r10 - 12 Nov 2010 - 10:47:37 - MaximPotekhin

Use of Panda in the LBNE computing effort

Project description

LBNE stands for Long Baseline Neutrino Experiment. It is part of a larger project with near detectors and a neutrino beam at Fermilab, and far detectors likely to be located at the NSF DUSEL site in the Homestake Mine in South Dakota. BNL is a participant in LBNE, houses the project office for the Water Cherenkov far detector, and is committed to providing computing support for this effort. The group currently has 10 multicore machines deployed and integrated into RACF, and expects to add roughly 10 machines per year for the next 2 years or so.

Status report

LBNE/PanDA report 11/12/2010: PowerPoint Slides

Hands-on Panda Demonstration

Job Submission

wget http://www.usatlas.bnl.gov/~caballer/panda/demo/sendjobs.tar
tar xvf sendjobs.tar
cd sendjobs
source setup.sh
./sendJob.py --njobs 4 --site TEST2 --joburl http://www.usatlas.bnl.gov/~caballer/panda/transformations/fake.py --label user --jobparameters "a b c 1 2 3"

Pilot Submission

wget http://www.usatlas.bnl.gov/~caballer/panda/demo/sendpilots.tar
tar xvf sendpilots.tar
cd sendpilots
source setup.sh
./pilotScheduler.py --queue=TEST2 --pandasite=TEST2 --pilot=default --single

Access to log files

Log files reside in /usatlas/prodjob/share/schedlogs/ and are currently served by an Apache instance on osgdev.racf.bnl.gov:23000.
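To turn a log path on disk into a link that can be opened in a browser or fetched with wget, a small helper is enough. This is a minimal bash sketch. It assumes that the Apache instance on osgdev.racf.bnl.gov:23000 uses /usatlas/prodjob/share/schedlogs/ as its document root. That mapping is not stated on this page, so check it against the server first. The function name log_url is hypothetical.

```shell
# Hypothetical helper: map a local log path to the URL it would be served under.
# ASSUMPTION: the Apache document root on osgdev.racf.bnl.gov:23000 is
# /usatlas/prodjob/share/schedlogs/ (not confirmed by this page).
LOG_ROOT="/usatlas/prodjob/share/schedlogs"
LOG_SERVER="http://osgdev.racf.bnl.gov:23000"

log_url() {
    local path="$1"
    case "$path" in
        "$LOG_ROOT"/*)
            # Strip the local prefix and append the remainder to the server URL
            printf '%s/%s\n' "$LOG_SERVER" "${path#"$LOG_ROOT"/}"
            ;;
        *)
            # Paths outside the log area are not served; report and fail
            echo "log_url: $path is not under $LOG_ROOT" >&2
            return 1
            ;;
    esac
}
```

For example, `wget "$(log_url /usatlas/prodjob/share/schedlogs/<subdir>/<logfile>)"` would download a single log, where `<subdir>` and `<logfile>` are placeholders.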

Technical notes

Machine details

The RACF machine names for Daya Bay and LBNE, respectively, are:
  • daya000N, N = 1, 2, 3
  • lbne00NN, NN = 01-10
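Scripts that need to run across the farm can expand the naming patterns above into concrete host names. The bash sketch below does this by following the patterns exactly as listed. The function name racf_hosts is hypothetical.

```shell
# Hypothetical helper: expand the RACF naming patterns listed above.
#   daya000N, N = 1..3    -> daya0001 .. daya0003
#   lbne00NN, NN = 01..10 -> lbne0001 .. lbne0010
racf_hosts() {
    local n
    for n in 1 2 3; do
        printf 'daya%04d\n' "$n"     # zero-pad to four digits
    done
    n=1
    while [ "$n" -le 10 ]; do
        printf 'lbne%04d\n' "$n"
        n=$((n + 1))
    done
}
```

For instance, `racf_hosts | grep '^lbne'` lists only the LBNE nodes. That list can feed a loop such as `for h in $(racf_hosts | grep '^lbne'); do ssh "$h" uptime; done`, assuming you have ssh access to the nodes.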

Condor Usage

Typical Condor submission file

# GAMMA_205_206_WET_DayaBay_Gamma
# --- begin basic.condor 
#### djaffe Daya Bay condor script 19nov09
# description file commands
Universe    = Vanilla
Getenv      = True
notification    = Error
notify_user    = djaffe@bnl.gov
# Requirements    = (CPU_Speed >= 1 && TotalDisk > 0 ) && (CPU_Experiment == "dayabay")
Requirements    = (CPU_Speed >= 1 && TotalDisk > 0 ) 
Input           = /dev/null
Rank      = (State == "Unclaimed")
Image_Size      = 100 Meg
+Job_Type    = "cas"
+Experiment     = "dayabay"
Output      = temporary/condor/out/$(filename).out
Error      = temporary/condor/err/$(filename).err
Log      = temporary/condor/log/$(filename).log
# variables
jobsite      = RACF
rootoutputdir   = temporary/output
# ---- end basic.condor
Executable = /afs/rhic.bnl.gov/x8664_sl5/opt/dayabay/offline-opt/NuWa-trunk//dybgaudi/InstallArea/scripts/nuwa.py
Arguments = " -n 1000 -G /afs/rhic.bnl.gov/x8664_sl5/opt/dayabay/offline-opt/NuWa-trunk//dybgaudi/Detector/XmlDetDesc/DDDB/dayabay.xml -o $(rootoutputdir)/$(filename).root -R $(runnumber) -m 'SpadeSvc -S gamma' -m 'MDC09b.runGamma  6.' "
# definitions of variables
MDCtype = MDC09b
ADstate = WET
sourcetype = 
macroname = $(MDCtype).run$(jobtype)
filename = MDC$(jobtype)_$(jobsite)_$(ADstate)_D$(runnumber)
runnumber = 205
runtime   = 2008-01-01T02:00:02
# submit one job with the settings above (without a Queue statement no job is submitted)
Queue
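Condor generally does not create the parent directories named in the Output, Error and Log commands. The job also writes its ROOT output under rootoutputdir. These directories should therefore exist, relative to the directory condor_submit is run from, before the job is submitted. The bash sketch below creates them. The function name prepare_condor_dirs and the file name gamma.condor are hypothetical.

```shell
# Hypothetical helper: create the directories referenced by the submit file.
# Paths are relative to the submission directory and mirror the Output,
# Error and Log commands and the rootoutputdir variable shown above.
prepare_condor_dirs() {
    mkdir -p temporary/condor/out \
             temporary/condor/err \
             temporary/condor/log \
             temporary/output
}

# Usage (gamma.condor is a hypothetical name for the file shown above):
#   prepare_condor_dirs && condor_submit gamma.condor
```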

OSG stack

Accessing Panda requires X.509 credentials, and using them requires parts of the OSG software stack. These can be set up by sourcing one of these scripts:

Archive: Meeting notes

OSG/LBNE meeting on 11/23/2009

Present: Jose Caballero, David Jaffe, Maxim Potekhin, Brett Viren

  • Fact sheet:
    • Currently, the software used by the LBNE group depends on GEANT4 and ROOT. In about a year, the group plans to migrate to a Gaudi-based framework.
    • Each machine maintained by LBNE at BNL has a total of 5.5TB of disk space spread over 8 drives, for a grand total of 55TB across the 10 machines.
    • In the current stage of testing, jobs are submitted via Condor as vanilla-universe jobs.

  • Issues/questions:
    • Q: Do we need glexec to ensure transparency of LBNE users' identities?
    • A: Not in the short and medium term. David is almost exclusively responsible for running production jobs; user analysis will be addressed later.
    • Q: Does LBNE have a central file catalog?
    • A: No, but there should be one.
    • Q: What should be used to distribute data to jobs?
    • A: LBNE is considering xrootd, with something else used in the interim as a stopgap.
    • Q: What should we use in the short to medium term to get going and gain experience with LBNE job execution in the Grid environment?
    • A: We could use our existing DataHost to cache the data, with pilots handling the upload and download of the requisite files.
  • Action items
    • LBNE software packaging and distribution, ETA January 2010 (B. Viren). The ROOT dependency needs to be clarified, and ROOT built if necessary.
    • Find out more about the Condor policies governing access to the LBNE farm (get the Condor submission file from David; contact J. Hover and M. Ernst if necessary).
    • Hands-on Panda tutorial scheduled for 12/10/2009 (J. Caballero).
    • Follow up on LBNE VO membership (M. Potekhin).
    • Set up DataHost accounts for Brett and others (M. Potekhin).
    • Set up Panda queues for LBNE (liaison: J. Caballero) and start pilot submission.

Kick-off PANDA/PAS/LBNE meeting on 08/06/2009

  • Plans to grow local computing resources to a few dozen machines.
  • Michael Ernst will provide support under the aegis of RACF, on a best-effort basis.
  • The intention is to keep ALL data staged on disk at all times.

Major updates:
-- TWikiAdminGroup - 18 Jan 2018

About This Site

Please note that this site is a content mirror of the BNL US ATLAS TWiki.


ppt PAS_20101112_2.ppt (939.0K) | MaximPotekhin, 11 Nov 2010 - 21:33 | LBNE/PanDA report 20101112
Copyright © by the contributing authors. All material on this collaboration platform is the property of the contributing authors.