r4 - 08 Jun 2009 - 13:52:16 - RobertGardner

FacilityWGAPMinutesJun2

Introduction

Meeting of the Facilities working group on analysis queue performance, June 2, 2009

Links

Attending

  • Meeting attendees: Nurcan, Patrick, Shawn
  • Apologies: Rob

Status of stress test jobs in US

  • The issue with the failing DB access job at SLAC and SWT2 is now understood. Input files were being copied correctly but were failing on read. Tadashi found the problem: PyUtils/AthFile.py failed to check the file, because it supports only plain (os) and rfio access, whereas it also needs to support root:// (and dcap://). Sebastien Binet provided a new tag, PyUtils-00-06-17. Nurcan tried to check out the Tools/PyUtils-00-06-17 package locally from SVN so that it would be compiled on the worker nodes with pathena; however, the package currently does not compile locally. Investigating.
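The failure mode can be illustrated with a small Python sketch. This is not the actual PyUtils/AthFile.py code; the names OLD_SCHEMES, NEW_SCHEMES, can_check and the example paths are hypothetical. It only shows why a checker that dispatches on the access protocol, and knows only plain (os) paths and rfio, rejects root:// and dcap:// inputs even when the file itself is fine.

```python
from urllib.parse import urlparse

# Hypothetical sketch, NOT the real PyUtils/AthFile.py logic: a file checker
# that decides how to open an input by looking at its access protocol.
OLD_SCHEMES = frozenset({"", "file", "rfio"})  # plain os-level paths and rfio only
NEW_SCHEMES = frozenset({"root", "dcap"})      # xrootd and dCache direct access


def scheme_of(path: str) -> str:
    """Return the URL scheme of a path ('' for a plain local path)."""
    return urlparse(path).scheme


def can_check(path: str, supported: frozenset = OLD_SCHEMES) -> bool:
    """True if the checker has a handler for this path's access protocol."""
    return scheme_of(path) in supported


if __name__ == "__main__":
    # Hypothetical example inputs, one per access protocol.
    paths = [
        "/scratch/AOD.pool.root",
        "rfio:/castor/cern.ch/atlas/AOD.pool.root",
        "root://xrootd.example.org//atlas/AOD.pool.root",
        "dcap://dcache.example.org:22125/pnfs/atlas/AOD.pool.root",
    ]
    for p in paths:
        print(f"{scheme_of(p) or 'posix':6} old={can_check(p)} "
              f"new={can_check(p, OLD_SCHEMES | NEW_SCHEMES)}")
```

With only the old handlers, the root:// and dcap:// inputs are reported as uncheckable; adding those schemes to the supported set is the kind of change the new tag was meant to provide.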

STEP09 - Hammer cloud issues

  • The first STEP09 analysis stress-test jobs were started on 6/2.
  • http://gangarobot.cern.ch/st/
  • Ours: tests 434-437. Start: 6/2, end: 6/9.
  • The US cloud is doing fine. Jobs are mostly in the submitted state because other user jobs are already in the system and HammerCloud jobs have lower priority. We expect them to finish within a week.
  • SLAC: 100% failure on test-436, to be investigated.
  • SWT2: 6% failure on test-434 and 7% failure on test-437. Patrick reported a problem with direct reading and is investigating.
  • A note from Bob in preparation for STEP09 jobs:
    We have switched our copytoolin from none to dccp at AGLT2.
    Local tests reveal the strange behavior previously noted for lcg-cp OUT of dCache to local disk,
    while lcg-cp IN to dCache seems to proceed just fine, with peaks of over 30 MB/s. dccp copying
    out of dCache to local disk proceeded at over 30 MB/s in the tests. Let us hope that our HC tests
    now have a greater success rate.

STEP09 - Real user jobs

  • The US is planning a user analysis challenge on June 12th; J. Cochran will announce it. About 20 users are expected to submit their jobs first on a 10M-event sample, step09.00000010.jetStream_pretest.recon.AOD.a84/ (available at the Tier 2s), then on a 100M-event sample that will be ready by June 12th.
  • No activity is planned in the UK and DE clouds, as reported by Graeme and Johannes. They have asked their users to submit jobs during STEP09 and to report on their experience: latency in running, failure rate, etc.

Data copy Tier 2-Tier 3

  • Rik reported:
    We've started a test of dq2_get performance in transferring data from T2
    and T1 to T3. Eventually the plan is to do a stress test of multiple
    dq2_get's. The first tests, however, were made by transferring ~120 GB
    of data to ANL ASC with a single dq2_get command from MWT2, NET2, BNL,
    SLAC and SWT2 (AGLT2 had an incomplete copy of the chosen dataset).
    The tests indicate that an MWT2-ANL copy can be made at ~380 Mbps, enabling
    a copy of several TB in one day. The speed from the other T2s and the T1
    varies from 80 Mbps at best (NET2) to 30 Mbps at worst (SLAC).
    Copy errors were at the level of 1.5% (11/790 files had zero length).
    
    The details of the tests can be found at:
    http://atlaswww.hep.anl.gov/twiki/bin/view/ASC/Dq2_getStressTest
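As a rough cross-check of the quoted rates, here is a small Python sketch that converts a sustained rate in Mbps into data volume per day and into the time needed for the ~120 GB test sample. It assumes decimal units (1 GB = 10^9 bytes, 1 TB = 10^12 bytes) and a constant rate over the whole transfer, which a real dq2_get transfer will not sustain exactly; the function names are illustrative only.

```python
SECONDS_PER_DAY = 86_400


def tb_per_day(mbps: float) -> float:
    """Decimal terabytes moved in one day at a sustained rate in Mbit/s."""
    return mbps * 1e6 * SECONDS_PER_DAY / 8 / 1e12


def hours_for(gigabytes: float, mbps: float) -> float:
    """Hours needed to move `gigabytes` (decimal GB) at a sustained Mbit/s rate."""
    return gigabytes * 1e9 * 8 / (mbps * 1e6) / 3600


if __name__ == "__main__":
    # Rates quoted in the report: MWT2, NET2 (best of the others), SLAC (worst).
    for site, rate in [("MWT2", 380), ("NET2", 80), ("SLAC", 30)]:
        print(f"{site}: {tb_per_day(rate):.2f} TB/day, "
              f"120 GB in {hours_for(120, rate):.1f} h")
    # Fraction of zero-length files reported.
    print(f"zero-length files: {11 / 790:.1%}")
```

At ~380 Mbps the MWT2-ANL link works out to about 4.1 TB/day, consistent with "several TB in one day"; at the 30 Mbps SLAC rate the same day moves only about 0.3 TB. The zero-length fraction 11/790 comes out to about 1.4%, in line with the ~1.5% quoted.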


-- RobertGardner - 01 Jun 2009
