Panda used in LBNE computing effort
Project description
LBNE stands for Long Baseline Neutrino Experiment. It is part of a
larger project with near detectors and a neutrino beam at Fermilab, and far
detectors likely at the NSF DUSEL site in the Homestake Mine in South
Dakota. BNL is a participant in LBNE, houses the project office
for the Water Cherenkov far detector, and is committed to providing
computing support for this effort. Currently the group has deployed 10
multicore machines integrated into RACF, and expects to grow that by
roughly 10 machines per year for the next 2 years or so.
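The growth estimate above can be turned into a rough capacity projection. The sketch below is only illustrative: it takes the 5.5 TB per machine quoted in the 11/23/2009 meeting notes further down this page and assumes, without confirmation from the source, that future machines carry the same amount of disk. The function names are hypothetical.

```python
def projected_machines(current, per_year, years):
    """Number of machines after `years` of adding `per_year` machines each year."""
    return current + per_year * years


def total_disk_tb(machines, tb_per_machine):
    """Aggregate disk space in TB, assuming every machine has the same disk size."""
    return machines * tb_per_machine


# Figures from this page: 10 machines now, roughly 10 more per year for 2 years.
machines_now = 10
machines_later = projected_machines(machines_now, per_year=10, years=2)

# ASSUMPTION: new machines have the same 5.5 TB of disk as the current ones.
TB_PER_MACHINE = 5.5

print(f"machines now:   {machines_now}, disk: {total_disk_tb(machines_now, TB_PER_MACHINE)} TB")
print(f"machines later: {machines_later}, disk: {total_disk_tb(machines_later, TB_PER_MACHINE)} TB")
```

Since the growth rate is itself stated as "roughly 10 machines per year", the projected figures should be read as an order-of-magnitude estimate.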
Status report
LBNE/PanDA report 11/12/2010:
PowerPoint Slides
Hands-on Panda Demonstration
Job Submission
wget http://www.usatlas.bnl.gov/~caballer/panda/demo/sendjobs.tar
tar xvf sendjobs.tar
cd sendjobs
source setup.sh
./sendJob.py --njobs 4 --site TEST2 --joburl http://www.usatlas.bnl.gov/~caballer/panda/transformations/fake.py --label user --jobparameters "a b c 1 2 3"
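When the submission is driven from a script rather than typed by hand, it can help to assemble the command as an argument list, so that the space-separated job parameters stay together as one argument. The following is a minimal sketch, assuming only the sendJob.py options shown in the command above; the helper name build_sendjob_command is hypothetical and is not part of the demo tarball.

```python
import shlex


def build_sendjob_command(njobs, site, joburl, label="user", jobparameters=""):
    """Return the sendJob.py invocation as an argv-style list.

    Only the options that appear in the demo command are used here.
    """
    if njobs < 1:
        raise ValueError("njobs must be at least 1")
    return [
        "./sendJob.py",
        "--njobs", str(njobs),
        "--site", site,
        "--joburl", joburl,
        "--label", label,
        "--jobparameters", jobparameters,
    ]


cmd = build_sendjob_command(
    njobs=4,
    site="TEST2",
    joburl="http://www.usatlas.bnl.gov/~caballer/panda/transformations/fake.py",
    jobparameters="a b c 1 2 3",
)

# Print a shell-safe rendering of the command for inspection.
print(shlex.join(cmd))
```

The list can be passed directly to subprocess.run(cmd, check=True) from within the sendjobs directory, after setup.sh has been sourced in the calling shell.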
Pilot Submission
wget http://www.usatlas.bnl.gov/~caballer/panda/demo/sendpilots.tar
tar xvf sendpilots.tar
cd sendpilots
source setup.sh
./pilotScheduler.py --queue=TEST2 --pandasite=TEST2 --pilot=default --single
Access to log files
Log files reside at /usatlas/prodjob/share/schedlogs/ and are currently served by an Apache instance on osgdev.racf.bnl.gov:23000
Technical notes
Machine details
The RACF machine names for Daya Bay and LBNE, respectively, follow these patterns:
- daya000N, with N = 1, 2, 3
- lbne00NN, with NN = 01 through 10
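For scripts that need to loop over these nodes, the naming patterns above can be expanded programmatically. This is a small sketch: the helper racf_hostnames is hypothetical, and it produces short host names only, since the page does not state the domain suffix.

```python
def racf_hostnames(prefix, first, last, width=4):
    """Expand a pattern such as daya000N into a list of zero-padded host names."""
    return [f"{prefix}{n:0{width}d}" for n in range(first, last + 1)]


# daya000N with N = 1, 2, 3
DAYA_BAY_NODES = racf_hostnames("daya", 1, 3)

# lbne00NN with NN = 01 through 10
LBNE_NODES = racf_hostnames("lbne", 1, 10)

for host in DAYA_BAY_NODES + LBNE_NODES:
    print(host)
```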
Condor Usage
Typical Condor submission file
# GAMMA_205_206_WET_DayaBay_Gamma
# --- begin basic.condor
#### djaffe Daya Bay condor script 19nov09
# description file commands
Universe = Vanilla
Getenv = True
notification = Error
notify_user = djaffe@bnl.gov
# Requirements = (CPU_Speed >= 1 && TotalDisk > 0 ) && (CPU_Experiment == "dayabay")
Requirements = (CPU_Speed >= 1 && TotalDisk > 0 )
Input = /dev/null
Rank = (State == "Unclaimed")
Image_Size = 100 Meg
+Job_Type = "cas"
+Experiment = "dayabay"
Output = temporary/condor/out/$(filename).out
Error = temporary/condor/err/$(filename).err
Log = temporary/condor/log/$(filename).log
# variables
jobsite = RACF
rootoutputdir = temporary/output
# ---- end basic.condor
Executable = /afs/rhic.bnl.gov/x8664_sl5/opt/dayabay/offline-opt/NuWa-trunk//dybgaudi/InstallArea/scripts/nuwa.py
Arguments = " -n 1000 -G /afs/rhic.bnl.gov/x8664_sl5/opt/dayabay/offline-opt/NuWa-trunk//dybgaudi/Detector/XmlDetDesc/DDDB/dayabay.xml -o $(rootoutputdir)/$(filename).root -R $(runnumber) -m 'SpadeSvc -S gamma' -m 'MDC09b.runGamma 6.' "
# definitions of variables
MDCtype = MDC09b
ADstate = WET
sourcetype =
macroname = $(MDCtype).run$(jobtype)
#
filename = MDC$(jobtype)_$(jobsite)_$(ADstate)_D$(runnumber)
runnumber = 205
runtime = 2008-01-01T02:00:02
Queue
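The submission file relies on $(name) macros: for example, Output, Error, Log, and the -o argument are all built from $(filename), which is itself defined in terms of other macros. The sketch below is a simplified model of that substitution, written to show how the file names come out; it is not HTCondor's actual parser. Note that $(jobtype) is not defined anywhere in the file as shown, so the value "Gamma" used here is an assumption made only for illustration.

```python
import re

# Matches Condor-style macro references such as $(filename).
MACRO = re.compile(r"\$\(([A-Za-z_][A-Za-z0-9_]*)\)")


def expand(value, macros, depth=0):
    """Recursively replace $(name) references using the `macros` dictionary.

    Names are looked up in lower case; unknown macros are left untouched.
    """
    if depth > 20:
        raise ValueError("macro expansion too deep (possible cycle)")

    def substitute(match):
        name = match.group(1).lower()
        if name not in macros:
            return match.group(0)
        return expand(macros[name], macros, depth + 1)

    return MACRO.sub(substitute, value)


macros = {
    "jobsite": "RACF",
    "rootoutputdir": "temporary/output",
    "mdctype": "MDC09b",
    "adstate": "WET",
    "runnumber": "205",
    "jobtype": "Gamma",  # ASSUMPTION: not defined in the submission file shown
    "filename": "MDC$(jobtype)_$(jobsite)_$(ADstate)_D$(runnumber)",
}

print(expand("temporary/condor/out/$(filename).out", macros))
print(expand("$(rootoutputdir)/$(filename).root", macros))
```

Running a check like this before submission makes it easy to spot an undefined macro, which in this model simply stays in the output as a literal $(name).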
OSG stack
To make use of X.509 components (which one needs to do in order to access Panda), parts of the OSG software stack are needed.
These can be accessed by sourcing one of these scripts:
/afs/usatlas.bnl.gov/osg/client/current/setup.sh
/afs/usatlas/osg/client/@sys/current/setup.sh
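Because two candidate locations are listed, a wrapper script may want to check which one is actually readable on the current machine before sourcing it. The following is a minimal sketch under that assumption; the helper first_readable is hypothetical, and the sourcing itself still has to happen in the user's shell.

```python
import os

# The two setup scripts listed above, in the order given on this page.
CANDIDATE_SETUP_SCRIPTS = [
    "/afs/usatlas.bnl.gov/osg/client/current/setup.sh",
    "/afs/usatlas/osg/client/@sys/current/setup.sh",
]


def first_readable(paths):
    """Return the first path that is an existing, readable file, or None."""
    for path in paths:
        if os.path.isfile(path) and os.access(path, os.R_OK):
            return path
    return None


script = first_readable(CANDIDATE_SETUP_SCRIPTS)
if script is None:
    print("No OSG client setup script found; check that AFS is mounted.")
else:
    # Print the command to run in the shell that will submit jobs or pilots.
    print(f"source {script}")
```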
Archive: Meeting notes
OSG/LBNE meeting on 11/23/2009
Present: Jose Caballero, David Jaffe, Maxim Potekhin, Brett Viren
- Fact sheet:
- Currently, the software used by the LBNE group depends on GEANT4 and ROOT. In about a year, the group plans to migrate to a Gaudi-based framework.
- Each machine maintained by LBNE at BNL has a total of 5.5TB of disk space spanning 8 drives, for a grand total of 55TB across the 10 machines.
- In the current stage of testing, jobs are submitted via Condor as vanilla universe jobs.
- Issues/questions:
- -------------------------------------
- Q: Do we need glexec to assure transparency of LBNE users' identities?
- A: Not in the short and medium term. David is almost exclusively responsible for running production jobs; user analysis will be addressed later.
- -------------------------------------
- Q: Does LBNE have a central file catalog?
- A: No, but there should be one.
- -------------------------------------
- Q: What shall be used to distribute data to jobs?
- A: LBNE is considering xrootd, with something else used in the interim as a stopgap.
- -------------------------------------
- Q: What do we use in the short to medium term to get going and gain experience with LBNE job execution in the Grid environment?
- A: We can possibly use Datahost to cache the data, with pilots uploading and downloading the requisite files from our existing Datahost.
- -------------------------------------
- Action items
- LBNE software packaging and distribution, ETA January 2010 (B.Viren). The ROOT dependency needs to be clarified, and ROOT built if necessary.
- Need to find out more about Condor policies governing access to the LBNE farm (need to get a Condor submission file from David; contact J.Hover and M.Ernst if necessary)
- Hands-on Panda tutorial scheduled on 12/10/2009 (J.Caballero)
- Follow up on LBNE VO membership (M.Potekhin)
- Set up Datahost accounts for Brett and others (M.Potekhin)
- Set up Panda queues for LBNE (liaison: J.Caballero), start pilot submission.
Kick-off PANDA/PAS/LBNE meeting on 08/06/2009
- Plans to grow local computing resources to a few dozen boxes
- Michael Ernst will provide support under the aegis of RACF, on a best-effort basis
- Intention is to keep ALL data staged on disk at all times
Major updates:
-- TWikiAdminGroup - 23 May 2012
Attachments
PAS_20101112_2.ppt (939.0K) | MaximPotekhin, 11 Nov 2010 - 21:33 | LBNE/PanDA report 20101112