r11 - 10 Mar 2009 - 17:02:46 - RobertGardner



Meeting of the Facilities working group on analysis queue performance, March 10, 2009



  • Meeting attendees: Rik, Nurcan, Rob, Horst, Patrick, Jim, Akira

Recapping the working group program (Rob)

An overarching goal is to assess the facility's readiness for analysis workloads.

  • Measurement of analysis queue performance
    • In terms of scale - response to large numbers of I/O intensive jobs
    • In terms of stability and reliability of supporting services (gatekeepers, doors, etc)
    • In terms of physics-throughput efficiency


  • March 15: Specify and test a well-defined set of job archetypes representing likely user analysis workflows hitting the analysis queues.
  • March 31: Specify a set of queue metrics and make test measurements
  • April 15: Facility-wide set of metrics taken

ANALY Queue testing reports (Nurcan)

I propose the following jobs to be put in the suite of test jobs.
| Job type | Athena release | Input dataset | Notes |
| SusyValidation | 14.5.0 | mc08 AOD | we will start with this as daily tests |
| D3PD making with TopPhysTools | 14.5.0 | Top-mixing samples AOD | will be used at the BNL Jamboree in March |
| TAG selection | 14.5.0 | FDR2-c TAG | TAGs have not yet been made for the reprocessed dataset (see below) |
| AthenaRootAccess | 14.5.0 | mc08 AOD | can also read DPDs made from AODtoDPD |
| Data reprocessing | | data08_cos*, data08_1beam* DPD | also uses the dbRelease option |
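A daily batch of these test jobs would be submitted with pathena from panda-client. A minimal sketch follows; the job-options file, dataset names, and output name are illustrative placeholders (the actual "scripts developed previously" are not shown here), and the exact options should be checked against `pathena --help` for the installed panda-client version.

```python
# Sketch of assembling one ANALY-queue test submission with pathena.
# All names below are placeholders, not the real test-suite values.

def build_pathena_cmd(job_options, in_ds, out_ds, site):
    """Assemble a pathena command line for one batch of test jobs."""
    return [
        "pathena", job_options,
        "--inDS", in_ds,     # input dataset, e.g. an mc08 AOD container
        "--outDS", out_ds,   # user output dataset name
        "--site", site,      # target ANALY queue
    ]

cmd = build_pathena_cmd(
    "SusyValidation_jobOptions.py",        # hypothetical job options
    "mc08.SusyValidation.AOD",             # placeholder input dataset
    "user09.TestUser.susyval.daily01",     # placeholder output dataset
    "ANALY_MWT2",
)
print(" ".join(cmd))
```

In practice a cron-driven wrapper would build one such command per day and per site, then hand it to the shell.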

  • Would like to run jobs on queues continuously. Every day we'll submit 1000 SUSY validation jobs. Use scripts developed previously.
  • n-tuple making (tertiary DPD) for Top
  • TAG files not yet available for reprocessed data
  • Reprocessing jobs required dBRelease for conditions data
  • Monitoring - there is a development version in the ARDA dashboard with a view of all analysis tools (Benjamin working on this). Will use HammerCloud metrics. Job timings will be uploaded to a database: queue time, input staging time, wall time.
  • Question about how to more
  • Thinking of 100M events
  • Need a table of ANALY queue features summary

Analysis job submissions - stress testing plans (Rik)

  • ASC has a list of people who can submit jobs in a chaotic (uncoordinated) fashion. This can be used to gauge the readiness of a site.
  • Use these as a run-up to the stress tests in May
  • Two weeks in late May
  • Start with the top-mixing sample - it mimics data, but we want a larger scale.
  • Generate and simulate prior to the stress test. Rough estimate is up to 100 TB.
  • Stress test to use QCD backgrounds.
  • See summary below.

dCache performance measurements for analy jobs (Charles)

  • We had a problem with a large number of analysis jobs stressing dCache; it turned out to be a trivial network misconfiguration.
  • Taking next step to look at scale.

TAG selection job problems (Nurcan, Marco)

  • last meeting
    • Report from Nurcan on jobs using panda-client-0.1.8 and inDS=fdr08_run2.0052283.physics_Egamma.merge.TAG.o3_f8_m10:
    • TAG selection jobs work at AGLT2, SLAC, and BNL, but do not work at UTA, OU, NET2, and MWT2. The jobs at the latter sites still end with a status of finished; however, PFC.xml is empty, so no associated AODs are inserted, meaning TAG selection was not actually run. These jobs should have a status of failed.
    • Paul and Tadashi discussing this. Should Athena fail this, rather than the pilot?
    • See Marco's report below - there may be an option in the job options to throw a failure if a file is missing. Marco will follow-up regarding NET2.
    • UTA - is the pilot back-navigating into xrootd (via root-URL, like SLAC)?
    • Error at UTA, OU, NET2 (PandaIDs 26835256, 26835254, 26835252): ImportError: /atlasgrid/osg-wn-1.0.0/lcg/lib64/python/_lfc.so: cannot open shared object file: No such file or directory. Comment from Tadashi: the message means that the 32bit Python in Athena failed to import lfc.py due to lib64/python/_lfc.so, and the native 64bit Python was then used. The plugin was tried with the following LD_LIBRARY_PATH but failed. I have the impression that the 64bit osg-wn runtime was broken by something in 32bit Athena. Tadashi reported that this works at CERN, TRIUMF, etc., which use a 64bit OS. What needs to be done by Tadashi: check whether the shared object file exists, and then check whether lfc.py can be imported in the 64bit wn-client runtime and the 32bit Athena runtime.
    • Error at MWT2: no error as above is found; it only prints: RuntimeWarning: Python C API version mismatch for module _lfc: This Python has API version 1013, module _lfc has version 1012. PFC.xml is empty. PandaID=26835630.
    • Probably AOD dataset missing at the site - FDR files. Charles will re-subscribe.
    • Error at ANALY_BNL_test: send2nsd: NS002 - send error : client_establish_context: Could not find or use a credential ERROR : LFC access failure - Bad credentials. The voms proxy (valid for one day) has expired before the pilot (new pilots sent by Paul) picked up the job.
    • Not sure what is causing this - whether it's transient or not. Works at ANALY_BNL_ATLAS_1.
    • Note: at ANALY_BNL_ATLAS_1 the ESD files are also inserted into PFC.xml together with the AODs, any idea why?
      • Seems like a transformation "bug", but only happens at one site? Marco believes it only happens if the ESDs are at the site. TAG group is working with Tadashi on limiting the list of files included.
  • this meeting
    • Patrick: discussed the OSG worker-node client's setup of LD_LIBRARY_PATH and what CMT does to it. The XPATH library is removed by CMT; this can be repaired in a couple of ways. On-going discussions between Marco, Tadashi, and Patrick.
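The two checks Tadashi proposes above (does the shared object exist, and can the LFC binding be imported in a given Python runtime?) can be sketched as a small diagnostic. This is illustrative only; the path is the one from the error report and is specific to that installation, and the `lfc` module is the LFC client binding, which is only present where lcg-utils is installed.

```python
# Sketch of a two-step lfc import diagnostic: check that the _lfc shared
# object exists on disk, then attempt to import the lfc Python module in
# the current runtime and capture any failure.
import importlib
import os

def check_lfc(so_path):
    report = {"so_exists": os.path.exists(so_path)}
    try:
        importlib.import_module("lfc")   # LFC client binding, if installed
        report["import_ok"] = True
    except Exception as exc:             # ImportError or ABI/API mismatch
        report["import_ok"] = False
        report["error"] = str(exc)
    return report

# Path taken from the UTA/OU/NET2 error message above.
print(check_lfc("/atlasgrid/osg-wn-1.0.0/lcg/lib64/python/_lfc.so"))
```

Running this under both the 32bit Athena runtime and the 64bit wn-client runtime would show which environment fails and why.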

BNL queue scheduling issue (Nurcan, Xin)

  • last time
    • https://rt-racf.bnl.gov/rt/Ticket/Display.html?id=11880
    • The long and short queues have 420 slots allocated each. We are not occupying all slots; it is a problem with Condor. Pilots are stuck waiting for a slot for a long period of time (currently 40 minutes), as last reported on 2/16.
    • A proxy is needed, so we had to switch to pilot submission.
    • Xin believes there are too many stage-out jobs running simultaneously; the Condor submit host was moved to a more powerful machine on Feb 16, and the problem has not been seen since.
    • Looks like this morning things are fine, but we need to check regularly.
    • API is available to get the number of running jobs.
  • this time
    • Xin gave an update this morning: new hardware for the submit host. Status unknown.
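The "API to get the number of running jobs" mentioned above can be as simple as tallying Condor JobStatus codes (1 = idle, 2 = running). A sketch, using sample data rather than a live `condor_q` call (in practice the codes would come from condor_q output or its Python bindings):

```python
# Sketch: count idle vs. running jobs from Condor JobStatus codes.
# The sample list stands in for values obtained from condor_q.
from collections import Counter

STATUS = {1: "idle", 2: "running", 3: "removed", 4: "completed", 5: "held"}

def summarize(job_statuses):
    """Map JobStatus codes to names and count jobs in each state."""
    return dict(Counter(STATUS.get(s, "other") for s in job_statuses))

sample = [2, 2, 1, 2, 5, 1, 2]   # JobStatus codes for 7 pilots
print(summarize(sample))         # {'running': 4, 'idle': 2, 'held': 1}
```

A regular check of the "running" count against the 420 allocated slots would make the slot-occupancy problem visible without waiting for user reports.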


  • Next meeting is March 31, 2009

Addendum to stress test discussion (Akira)

Hi all

Below, I summarized what we discussed about the stress test. More input welcome. We are still looking for contributors for preparing this exercise, if you know anyone, please let me know.

Cheers Akira

====================================================

Ideas for stress test: This exercise will stress-test the analysis queues at the T2 sites with analysis jobs that are as realistic as possible in both volume and quality. We would like to make sure that the T2 sites are ready to accept real data and the analysis jobs needed to analyze it. The stress test will be organized sometime near the end of May.

Basic outline of the exercise: To make the exercise more useful and interesting, we will generate and simulate (Atlfast-II) a large mixed sample at the T2s. We are currently trying to define the job for this, and we expect it to be finalized after the BNL jamboree next week. The mixed sample is a blind mix of all SM processes, which we call "data" in this exercise. For the one-day stress test, we will invite people with existing analyses to try to analyze the data using T2 resources only. It was suggested to compile a list of people who are able to participate.

Estimates of data volume: A very rough estimate of the data volume is 100M-1B events. Assuming 100 kB/event (realistic considering no truth info and no trigger info), this sets an upper limit of 100 TB in total. It was mentioned that this is probably an upper limit given the current availability of USER/GROUP disk at the T2s (which is in addition to MC/DATA/PROD and CALIB disk), but this needs to be checked.
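The 100 TB upper limit follows directly from the numbers in the paragraph above:

```python
# Cross-check of the data-volume estimate: 1B events at ~100 kB/event.
events = 1_000_000_000            # upper end of the 100M-1B range
event_size_bytes = 100 * 1_000    # ~100 kB/event (no truth, no trigger info)

total_tb = events * event_size_bytes / 1e12   # decimal terabytes
print(total_tb)                               # 100.0 TB upper limit
```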

Estimate of computing capability: Right now there are "plenty" of machines assigned to analysis though the current load of analysis queue is rather low. The computing nodes are usually shared between production and analysis and typically configured with upper limit and priority. For example MWT2 has 1200 cores and setup to run analysis jobs with priority with an upper limit of 400 cores. If production jobs are not coming in, the number of running analysis jobs can exceed this limit.
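The MWT2 configuration described above (analysis prioritized up to a 400-core cap out of 1200, with spill-over into idle production cores) amounts to a soft-cap allocation rule. A toy sketch, where the numbers come from this paragraph and the function itself is illustrative, not the actual scheduler logic:

```python
# Sketch of a soft analysis-core cap with spill-over: analysis gets
# priority up to its cap, and may use production cores left idle.
def analysis_slots(total, cap, prod_running, analy_demand):
    """Cores granted to analysis under a soft cap with spill-over."""
    base = min(analy_demand, cap)                  # prioritized share
    spare = max(0, total - prod_running - base)    # idle cores left over
    spill = min(max(0, analy_demand - base), spare)
    return base + spill

print(analysis_slots(1200, 400, 800, 600))  # production busy: held to 400
print(analysis_slots(1200, 400, 0, 600))    # production idle: grows to 600
```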

Site configuration: Site configuration varies among the T2 sites. For this exercise, it is useful to identify which configuration is most efficient in processing analysis jobs. It was suggested that a table be compiled showing basic settings of the analysis queues for each analysis queue.

Pre-stress-test test: To make the most of the exercise and not stumble over trivial issues during the stress test, a pre-stress-test exercise was suggested. It was requested that, before launching a large number of jobs, the responsible people at each site be notified.

To do:
  • Data generation/simulation job to be defined by Akira
  • List of possible participants to be compiled by Rik
  • A table of site configuration to be produced by Rob
  • Someone to define the pre-stress-test test routine

============

-- RobertGardner - 05 Mar 2009
