r2 - 28 Jun 2010 - 03:39:19 - RichardMount

RAC Minutes, May 28, 2010

Members (*=present)

*Richard Mount, *Kevin Black (Physics Forum Chair), Jim Cochran (Analysis Support Manager), Alexei Klimentov (ATLAS ADC), Ian Hinchliffe (Physics Advisor), *Rik Yoshida (Tier3 Coordinator), *Michael Ernst (U.S. Facilities Manager), Rob Gardner (Integration Coordinator), *Kaushik De (U.S. Operations Manager), *Armen Vartapetian (U.S. Operations Deputy)

Ex-Officio: *Torre Wenaus, Stephane Willocq, *Mike Tuts, *Howard Gordon

Correction/approval of minutes of previous RAC meeting and core team meetings.

All were approved.

Report on CREM meetings (Richard)

The CREM is working in a constructive and collaborative spirit. Meetings are open, and concerns or ideas are welcomed from all. The bad news is that the CREM is increasingly occupied with making detailed decisions about data distribution and deletion, and it is clear to all CREM participants that much more automation has to be applied. The concerns of the US are substantially echoed by the rest of ATLAS. For example, at the previous day's CREM meeting Eric Lancon had reported a French concern of "how can we survive the next weekend without running out of disk space?"

Group/additional production pilot (Kevin)

The additional high-mass Drell-Yan production jobs were running. Getting to this point had been slower than we would like, but the delays were not due to the production system itself. It had taken time to get agreed job options and overall approvals from the group, followed by some confusion about the detailed steps needed to get the requests into Borut's spreadsheet.

Kevin agreed that a workbook-style documentation of the steps would be the best way to communicate the many details that he had discovered in this production pilot.

Kaushik noted that Borut had significantly sped up the process by creating an EVGEN tarball instead of waiting for a new ATLAS release. Production had suffered 4 or 5 days of technical problems that were not specific to additional production. It was also notable that Borut did not assign the production to US resources only. The US resources had been offered, but seeing spare capacity widely available, Borut decided not to restrict jobs to US sites.

US Operational Issues (Kaushik)

"A busy and painful week!" Last week MWT2 disks became full. This week NET2 ran out of space. SLAC and the other T2s were on a "war footing." Kaushik echoed the report from the CREM meeting – we are struggling to survive the weekend. All SpaceTokens are being flooded with ADC-managed data while centrally managed deletions are slow and unreliable.

Deletion appears slower than distribution, and data are being distributed without any knowledge of what will really be accessed.

We have about 4 PB of space at US T2s – unquestionably adequate if we can implement a more usage-based distribution and efficient deletion.

Kaushik proposed "let PANDA decide what data goes to T2s". Distribution would be based on PANDA's data on the number of jobs using a dataset, and deletion (at T2s) could be based on popularity, keeping the storage 80% to 90% full. Kaushik believed that this could be implemented very rapidly. It could be billed as a demo project to provide input to the June 16-18 meeting in Amsterdam. As a demo project it should also be acceptable to ATLAS computing management. Torre noted that the PANDA developer concerned still had to be convinced that this was a good long-term solution.
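For illustration only, the popularity-based deletion idea could be sketched roughly as follows. This is not PANDA code and was not presented at the meeting. The `Replica` record, the `select_for_deletion` function, and the choice to start deleting above 90% full and stop at 80% are all assumptions made to give the 80% to 90% target a concrete form.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    """Hypothetical record for one dataset replica held at a T2."""
    name: str
    size_tb: float
    recent_jobs: int  # assumed popularity measure: jobs that used the dataset


def select_for_deletion(replicas, capacity_tb, high_mark=0.90, low_mark=0.80):
    """Return names of replicas to delete, least-used first.

    Nothing is deleted until usage exceeds high_mark * capacity_tb.
    Once it does, replicas are removed in order of increasing job count
    (larger replicas first among ties) until usage is at or below
    low_mark * capacity_tb.
    """
    used = sum(r.size_tb for r in replicas)
    if used <= high_mark * capacity_tb:
        return []

    victims = []
    for r in sorted(replicas, key=lambda r: (r.recent_jobs, -r.size_tb)):
        if used <= low_mark * capacity_tb:
            break
        victims.append(r.name)
        used -= r.size_tb
    return victims
```

In this sketch a replica that no job has touched is always removed before one in active use, which is the behaviour the proposal relies on. How popularity would actually be measured, and over what time window, is left open here.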

The RAC encouraged this demonstration – in essence committing the US T1+T2 complex to operating in this mode for long enough to understand its benefits and drawbacks.


My notes show that the question "Can we [be officially approved to] delete locally based on log files?" was raised, likely by Armen.

Action Items

  1. 5/7/2010: Richard, Create a web page summarizing the dataset distribution targets in the US (In progress)
  2. 4/9/2010: Kevin, Create first version of a Twiki guiding US physicists on requesting Additional Production. (In progress - a workbook-like step-by-step process will be described)
  3. 3/26/2010: All but especially core team, Find time during the ADC Workshop next week to identify a point-of-contact for Valid US Regional Production (on hold pending a better definition of what this task should be following successful completion of some Exotics Additional Production).
