r3 - 12 May 2009 - 12:33:27 - NurcanOzturk

FacilityWGAPMinutesMay12

Introduction

Meeting of the Facilities working group on analysis queue performance, May 12, 2009

Links

Attending

  • Meeting attendees: Patrick, Saul, Rob, Paul, Horst, Rik, Nurcan, Jim
  • Apologies: none

Forum for US sites

Revised STEP09 schedule & plan

ATLAS-wide STEP09 coordination

Analysis Queue testing reports (Nurcan)

  • AnalysisQueueJobTests
  • last meeting
    • See site certification table for the status of stress testing of queues with different job types, AnalysisSiteCertification
    • Will still need to run SUSYValidation job at AGLT2 (91% success rate from last run, failures with "error code 256" were being investigated)
    • Stress testing with the DPD making job is almost done; used a container dataset (mc08.105200.T1_McAtNlo_Jimmy.recon.AOD.e357_s462_r579/). MWT2 still needs to be tested (dCache-related problems last time; site will run a test). AGLT2 jobs hit failures with "error code 256" in the second run (on the container dataset).
    • Rik defined a TAG selection job, will use it in my testing
    • Data reprocessing job was to be defined by Mark Slater of the Ganga/HammerCloud team; will check the status.
    • HammerCloud is now running in the US cloud; we need to discuss how often we would like to run it and at what scale. I asked Mark Slater to add the SUSYValidation and DPD making jobs to HammerCloud, which is currently running a simple muon analysis (calculating the invariant mass of dimuons).

  • this week
    • TAG selection jobs are now running at NET2 and SWT2. Submitted manual tests again. NET2 ~100% success. Stage-in problems at SWT2, xrootd system had an issue. Will submit new jobs today.
    • SUSYValidation, D3PD making and TAG selection jobs are now integrated into HammerCloud, see https://twiki.cern.ch/twiki/bin/view/Atlas/StressTestJobs
    • ARA - exclude from HammerCloud test
    • Reprocessing job - to run on reprocessed DPD datasets. Currently defined jobs produce large output files - not suitable for stress testing. Also uses DB access.
    • Finding a job that requires DB access is now a high priority, so that it can be put into HammerCloud. Tried to run the D3PD making job on Atlfast AODs; the job failed, unable to get a connection to the COOL conditions database. Sent a message to Sasha. If this job is successful, it can be used to test this database access.
    • Will remote DB access be required?
    • ntuple making from AOD - popular use-case
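
For reference, the simple muon analysis that HammerCloud was running (invariant mass of dimuons) boils down to the calculation sketched below. This is only an illustration in plain Python, not the actual HammerCloud job: the function names and the (pt, eta, phi) input convention are assumptions made here, and the real analysis runs inside Athena on AOD muon collections.

```python
import math

# Muon rest mass in GeV (standard value, rounded).
MUON_MASS_GEV = 0.105658


def four_vector(pt, eta, phi, mass=MUON_MASS_GEV):
    """Return (E, px, py, pz) in GeV for a particle given pt, eta, phi.

    Hypothetical helper: the input convention is an assumption of this sketch.
    """
    px = pt * math.cos(phi)
    py = pt * math.sin(phi)
    pz = pt * math.sinh(eta)
    energy = math.sqrt(px * px + py * py + pz * pz + mass * mass)
    return energy, px, py, pz


def dimuon_mass(mu1, mu2):
    """Invariant mass of two muons, each given as a (pt, eta, phi) tuple."""
    e1, px1, py1, pz1 = four_vector(*mu1)
    e2, px2, py2, pz2 = four_vector(*mu2)
    energy = e1 + e2
    px, py, pz = px1 + px2, py1 + py2, pz1 + pz2
    # m^2 = E^2 - |p|^2; clamp at zero to absorb floating-point round-off.
    m_squared = energy * energy - (px * px + py * py + pz * pz)
    return math.sqrt(max(m_squared, 0.0))
```

Two back-to-back muons with pt = 45 GeV at eta = 0, for example, give a mass of about 90 GeV, close to the Z peak.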

Additional ANALY queue jobs (Rik)

  • last time
    • TAG selection job defined.
    • BNL, SLAC, NET2 - no problems
    • Two job errors at MWT2. Poolfilecatalog.xml failures. Follow-up with Marco.
    • Will start making it systematic, clean up and make as a regular test job.
    • Akira communicated with Kaushik about subscribing ESDs to Tier 2s - not sure if they're available yet.

  • this meeting
    • Top MC ESD - Rik will find users to submit such jobs at Tier 2s

Site readiness (Rob)

  • AnalysisSiteCertification
  • Dataset placement
  • What other site-level preparations in advance of June testing?
  • dq2-get fetching metrics - organize pre-stress testing activity?
    • Volumes: 1 GB / 10 GB / 100 GB / 1 TB
    • Numbers: 10 / 100 / 1K / 10K
    • Options
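
One possible reading of the volumes and numbers above is a grid of dq2-get fetch tests, one per (total volume, number of files) combination. The sketch below only enumerates such a grid and the implied average file size; it does not call dq2-get. Treating the two lists as a cross product, taking 1 TB as 1000 GB, and the function and field names are all assumptions made for illustration.

```python
from itertools import product

# Values taken from the list above; 1 TB is taken as 1000 GB (assumption).
VOLUMES_GB = [1, 10, 100, 1000]
FILE_COUNTS = [10, 100, 1000, 10000]


def fetch_test_grid(volumes_gb=VOLUMES_GB, file_counts=FILE_COUNTS):
    """Enumerate every (volume, file count) combination.

    Hypothetical helper: returns one dict per candidate fetch test with the
    average file size in MB that the combination implies.
    """
    grid = []
    for volume_gb, n_files in product(volumes_gb, file_counts):
        grid.append({
            "volume_gb": volume_gb,
            "n_files": n_files,
            "avg_file_mb": volume_gb * 1000.0 / n_files,
        })
    return grid


if __name__ == "__main__":
    for test in fetch_test_grid():
        print("%(volume_gb)5d GB in %(n_files)6d files: %(avg_file_mb)10.1f MB/file" % test)
```

The combinations with small average file sizes (for example 1 GB spread over 10K files) are the ones that would probe the small file problem noted under AOB.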

AOB

  • Jim will discuss with Torre a panda-mover scheme
  • Discussed small file problem - large number
  • Understanding HammerCloud errors
  • Next meeting: May 26.


-- RobertGardner - 11 May 2009
