US ATLAS Facility Plan Requirements Discussion


These are notes from the facilities planning discussion at the Tier3/Tier2 Workshop at Indiana, http://indico.cern.ch/conferenceDisplay.py?confId=15523.

General production goals

  • Kaushik notes that we need to triple our production capacity by December 2007. This might be achieved by doubling the installed core capacity, since we are already producing 1/3 more than the US share of the overall ATLAS quota.
  • To meet this goal we also need to maximize our efficiency, within the context of the Integration Program. Our target for 2007 is ~3000 cores (see Michael's table; this assumes a factor of 1.3 between cores and Si2K, which still needs confirmation).
  • Kaushik notes that about 1200 jobs are running on average. For storage, the target is 1 PB of capacity at the Tier2s by the end of 2007.
  • Beyond capacity, performance (e.g., I/O and disk access) is a concern. This may depend on the access method, e.g., TAG-based analysis that reads into files for data skimming.

Scaling problems

  • Scaling issues arising from the increase in CPUs over the past few months have mostly involved data transfers. No Panda scaling issues have been observed.
  • Large numbers of files are backlogged at the Tier2 centers. Will DQ2 0.3 solve these problems?
  • We need to better understand which pieces of the infrastructure introduce latency, for example, the SRM, especially for small files.
  • At BNL, about 1.5 PB of storage will be available within a month.

Storage management at Tier2s

  • For scalability, we need a managed entry point into the backend storage systems at the Tier2s.
  • We therefore need an SRM at each Tier2, with load balancing across the GridFTP servers behind it. A single GridFTP door per site will not be sufficient.

DDM issues

  • Alexei reported on an investigation into high latency in a single transfer, which turned out to be a dCache problem. Wei: tests with a 1000-file subscription to SLAC.
  • Data distribution issues: schedule and timeline? Alexei would like to do the functional tests as soon as possible.
  • By next Wednesday all sites should be upgraded to DQ2 0.3 so that functional tests and AOD replication may begin.

Analysis requirements and decisions

  • A complete copy of the AODs is to be replicated at each Tier2.
  • Separate analysis queues need to be set up at the sites. Kaushik will provide the recipe on the integration page (see AnalysisQueueP1).
  • Produced DPDs are to be replicated back to BNL and CERN.

-- RobertGardner - 22 Jun 2007

