r4 - 31 Mar 2009 - 16:59:57 - KaushikDe



Minutes of the US ATLAS Data Management meeting, Mar 31, 2009
  • Previous meetings and background: IntegrationProgram
  • Coordinates: Tuesdays, 3:00pm Central
    • (309) 946-5300, Access code: 735188; Dial *6 to mute/un-mute.


  • Meeting attendees: Wei, John, Wensheng, Saul, Pedro, Armen, Hiro
  • Apologies: Alexei, Charles
  • Guests: None

Topics for this week

  • Reprocessing plans - Kaushik
    • Start this week
    • From tape only - everyone needs to keep an eye on systems
    • Begin with BNL and SLAC, add other Tier 2's later, if necessary
    • Michael/I will send email when it starts
    • Once we start, I will arrange a short daily meeting to keep track of reprocessing in the US
  • BNL staging/dcache - Pedro
    • Kaushik - factor of 20 lower throughput because of small files (MC HITS files are 30 MB)
    • Hiro is not sure if this number (20) is right - we need clear monitoring info from HPSS (Pedro will follow up)
    • Heavy load issues with pnfs
      • Single pnfs server (single point of failure!) serving all requests (pandamover, srm, dq2...)
      • Complained to CERN about high traffic (which crashed the server last week) - no clear response yet
      • May not be solved with Chimera
      • Need to pursue with dcache developers - must have redundancy to spread/control load
      • New monitoring is being added to find out where the requests are coming from
  • Tier 2 space cleanup - Bob
    • Big mess on MCDISK, all kinds of consistency problems
    • Useful URL: http://dq2.aglt2.org/ust2_st_dsn.html
    • Shawn developing consistency tools
    • Hiro - big mess, strange files, cleaning M4, M5, v12 and user data (transfer to user area) to start with
    • Need to learn from AGLT2 experience for other Tier 2's
  • DQ2 adler32 plugin - Hiro
    • Running fine in passive mode, no corrupted files found yet
  • Hot issues
    • File corruption at NET2 - Saul
      • Wensheng, Hiro, Shawn helped - found a hardware problem with a network card
      • Problem hit large files hardest - 50% corruption rate for files above 2 GB
      • Fixed hardware problem - no new problems
      • Scanning all files to identify and fix corrupted files on MCDISK, DATADISK
      • New version of Bestman running
      • Local user having a dq2 problem (dq2-get of pathena output) - filed Savannah bug, no response, will try the listserv
      • Could be ACL problem in LFC - Hiro will help
  • AOB
    • Next week, meeting will start at noon CDT
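
For reference, the kind of adler32 check discussed above (the DQ2 plugin item and the NET2 scan of MCDISK and DATADISK) can be sketched roughly as follows. This is only an illustration and not the actual DQ2 plugin or the NET2 scan script: the function names and the path-to-checksum dictionary are hypothetical, and it assumes the expected checksums are available as 8-digit hex strings.

```python
import zlib


def adler32_of_file(path, chunk_size=1024 * 1024):
    """Return the Adler-32 checksum of a file as an 8-digit lowercase hex string."""
    value = 1  # Adler-32 starting value used by zlib
    with open(path, "rb") as f:
        while True:
            # Read in chunks so multi-GB files do not need to fit in memory.
            chunk = f.read(chunk_size)
            if not chunk:
                break
            value = zlib.adler32(chunk, value)
    return format(value & 0xFFFFFFFF, "08x")


def find_mismatches(expected):
    """Return the paths whose on-disk checksum differs from the expected one.

    `expected` is a hypothetical dict mapping file path -> expected hex
    checksum (for example, a dump taken from a catalog).
    """
    return [
        path
        for path, want in sorted(expected.items())
        if adler32_of_file(path) != want.lower()
    ]
```

A scan would call find_mismatches() on the catalog dump for a space token and treat every returned path as a candidate for re-transfer. Reading in chunks matters here because the files hit hardest were the ones above 2 GB.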

-- KaushikDe - 31 Mar 2009

pdf 20090324_31_HPSS_General_Read_Performance.pdf (67.8K) | Main.psalgado, 31 Mar 2009 - 11:51 | HPSS Read Performance (24-31 Mar09)
pdf 20090317-24_HPSS_General_Read_Performance.pdf (56.2K) | Main.psalgado, 31 Mar 2009 - 16:18 | HPSS Read Performance (17-24 Mar09)