Minutes of the Tier2/Tier3 workshop at SLAC, November 28-30, 2007

LHC and ATLAS status (Jim Shank)

  • See slides - overview of LHC machine and ATLAS status
  • Beam expected by July 2008, but a large number of sector tests (cryogenic interconnects, leaks, power-up, ...) are scheduled right up to the turn-on date. No room for slippage.
  • Leaks and shorts are the big problem - recovering from one requires a warm-up/cool-down cycle that takes 2 months.
  • Problems resolved:
    • Triplet problem - welding and cold masses
    • Problems in the plug-in modules (in the bellows that interconnect the magnets): the fingers get mangled.
  • Four sectors will be cooled by the end of the year, so the machine is on schedule for July.
  • ATLAS - schedule is also very tight.
  • Small wheel and other services are on the critical path.
  • Endcap toroid problem: during tests last Saturday the toroid slid during power-up and hit parts of the LAr calibration system. Damage is being assessed now; if it is minimal, the repair can be done without warming up the LAr.
  • ATLAS computing
  • SRM v2.2 testing, throughput testing
  • FDR preparation in December (bytestream production, output to the SFOs at CERN); FDR-2 will be Jan-Mar 2008, a big simulation production.
  • CCRC - Common Computing Readiness Challenge - a WLCG exercise.
  • M6 coming up in April. There may be an M7, depending on schedule.
  • Combine FDR and CCRC? Discussions on-going.
  • See FDR definition - Dave Charlton.
  • Putting data at the SFOs, emulating real data collection: raw data for physics streams, calibration streams, and the express line.
  • It is possible there will be multiple reconstruction runs on the express streams.
  • Want to get ES data to BNL/Tier1
  • Data processing as a function of tier.
  • AODs copied to Tier1s and Tier2s immediately; DPDs as requested.
  • Exercise the concept of a DPD train: a small group of physics group coordinators place their algorithms on a common pass over the datasets.
  • There are questions about where the DPD production chain runs - we may run it throughout the Tiers.
  • Discussion of the new computing organization within ATLAS, ADC (ATLAS Distributed Computing). Two areas - development and operations. First meeting this morning. Will begin pushing this very hard in January.
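The DPD train idea can be illustrated with a minimal Python sketch, assuming a toy event model (events as plain dictionaries) and hypothetical physics group selections; it is not the ATLAS framework. The point is that the input is read once while each group's algorithm (a "carriage") fills its own output.

```python
from typing import Callable, Dict, Iterable, List

Event = Dict[str, float]            # toy event: reconstructed quantities by name
Selector = Callable[[Event], bool]  # one physics group's selection ("carriage")


def run_dpd_train(events: Iterable[Event],
                  carriages: Dict[str, Selector]) -> Dict[str, List[Event]]:
    """Read the input once; every carriage sees every event and keeps its own output."""
    outputs: Dict[str, List[Event]] = {name: [] for name in carriages}
    for event in events:  # single pass over the dataset
        for name, select in carriages.items():
            if select(event):
                outputs[name].append(event)
    return outputs


if __name__ == "__main__":
    events = [
        {"n_muons": 2, "met": 15.0},
        {"n_muons": 0, "met": 80.0},
        {"n_muons": 1, "met": 45.0},
    ]
    carriages = {
        "dimuon": lambda e: e["n_muons"] >= 2,  # hypothetical selections
        "high_met": lambda e: e["met"] > 40.0,
    }
    dpds = run_dpd_train(events, carriages)
    print({name: len(sel) for name, sel in dpds.items()})  # {'dimuon': 1, 'high_met': 2}
```

Adding a group means adding one carriage, not another pass over the data.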

Facilities Status and Plans (Michael Ernst)

  • See slides giving overview of the computing organization in US ATLAS
  • Service model - need a U.S. ATLAS operations coordinator, working from Brookhaven
  • US contribution to worldwide production is substantial - about 33% overall.
  • Data location among tiers - a task group is working on the model for Tier3s; work is underway.
  • Resource allocation committee - chaired by Jim - user community can submit requests
  • Tier1 resources - additional funds coming from MR for LAN backbone upgrades and high performance disk, etc. Other resources for operations and integration
  • Data storage is a main challenge - upgrade to dCache 1.8, providing SRM 2.2 (which allows pinning), and evaluation of Chimera to replace the vulnerable PNFS.
  • See list of critical issues - too many files of US data on tape, long latencies for pathena users, need for more disk
  • Why tapes don't work well, especially for analysis.
  • Transitioning to operations - stability is the most important thing; manpower intensive.
  • New operational model (service based, not systems based) for the RACF. Implementing new SLAs, service coordinators oversee response and resolution of error conditions. Work in progress, see dependency matrix.
  • Challenges ahead: infrastructure to be built; more efficient use of the facility resources; integration of operations of the Tier1 with the whole facility.
  • Securing facility readiness - getting guidance from ATLAS milestones.
  • Questions
    • Jim: dCache SRM 2.2 and Chimera - any hands-on experience? Chimera: very clear that this will be a major improvement over PNFS, which uses file locking. SRM v2.2: pinning is completely missing in the current version and has to be integrated at the DDM level to track status; the current system pushes data out without control. Another problem, performance degradation with concurrent requests, has reportedly been resolved, and the new version should handle thousands of requests from DDM. Open question: what would be a viable alternative to dCache at the level of a Tier1?
    • Kaushik: Need layering of software and services, ATLAS-specific, on top of dcache.
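To make the pinning discussion concrete, here is a toy Python model, not the dCache or SRM interface: a disk pool in which a pinned file is protected from purging until its pin lifetime expires, while unpinned files can be purged at any time. The class and method names are hypothetical.

```python
import time
from typing import Dict, List, Optional, Set


class DiskPool:
    """Toy disk pool: unpinned files may be purged, pinned files may not."""

    def __init__(self) -> None:
        self.files: Set[str] = set()
        self.pin_expiry: Dict[str, float] = {}  # path -> time the pin runs out

    def stage(self, path: str) -> None:
        """File arrives on disk (e.g. recalled from tape or transferred in)."""
        self.files.add(path)

    def pin(self, path: str, lifetime_s: float, now: Optional[float] = None) -> None:
        """Protect a file for lifetime_s seconds; re-pinning keeps the later expiry."""
        now = time.time() if now is None else now
        self.pin_expiry[path] = max(self.pin_expiry.get(path, 0.0), now + lifetime_s)

    def release(self, path: str) -> None:
        """Drop the pin early, e.g. once the job has read the file."""
        self.pin_expiry.pop(path, None)

    def is_pinned(self, path: str, now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        return self.pin_expiry.get(path, 0.0) > now

    def purge_unpinned(self, now: Optional[float] = None) -> List[str]:
        """Free space by removing every file whose pin is absent or expired."""
        now = time.time() if now is None else now
        victims = sorted(p for p in self.files if not self.is_pinned(p, now))
        self.files.difference_update(victims)
        return victims


if __name__ == "__main__":
    pool = DiskPool()
    pool.stage("aod.1.root")
    pool.stage("aod.2.root")
    pool.pin("aod.1.root", lifetime_s=3600, now=0.0)
    print(pool.purge_unpinned(now=60.0))    # ['aod.2.root']: only the unpinned file goes
    print(pool.purge_unpinned(now=7200.0))  # ['aod.1.root']: its pin has expired
```

Without pins, a file recalled for a job can be purged before the job reads it; this is why DDM would need to track pin status.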

Facilities Integration Program (Rob Gardner)

  • See slides
  • Question about Tier3 center support and definition. There is a need to define this better.

Production and Analysis (Kaushik De)

  • Process and functional requirements at US ATLAS Tier2 and Tier3 centers including Panda Mover, Pathena, Autopilot, and DQ2
  • Going over high level issues:
    • MC production and processing simultaneously
    • How much to scale up?
    • How to integrate Tier3?
  • Major challenge will be the work associated with extending Panda to other clouds.
  • Another challenge is providing pinning functionality. srm v2.2 will be rolled out in mid-December at BNL. It also has to be incorporated into DQ2.
  • DQ2 critical issues
    • hierarchical datasets, lost file flag, tape handling
  • LFC - testing and evaluating for performance
  • DA usage rising rapidly, limited because of data availability; note there are 30K user jobs waiting to run.
  • Action item: replicate AODs to Tier2s. How many files/TB? Need a point-person for this in the US cloud.
  • Questions
    • Patrick: What about user datasets? At the moment they are left where they are produced. How long should they be kept precious at a Tier2? Expiration and quota system? This is a pressing issue.
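One possible shape for the expiration and quota policy raised above, as a hypothetical Python sketch. The lifetime, the quota, and the oldest-first rule are assumptions for illustration, not an agreed US ATLAS policy.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class UserDataset:
    name: str
    owner: str
    size_tb: float
    created_day: int  # day on which the dataset was produced at the Tier2


def datasets_to_delete(datasets: List[UserDataset], today: int,
                       lifetime_days: int, quota_tb: float) -> List[str]:
    """Hypothetical policy: expire old datasets, then trim each owner to quota, oldest first."""
    # Step 1: anything older than the lifetime is no longer kept precious.
    doomed = {d.name for d in datasets if today - d.created_day > lifetime_days}

    # Step 2: group the surviving datasets by owner.
    by_owner: Dict[str, List[UserDataset]] = {}
    for d in datasets:
        if d.name not in doomed:
            by_owner.setdefault(d.owner, []).append(d)

    # Step 3: remove each owner's oldest datasets until usage fits the quota.
    for owned in by_owner.values():
        used = sum(d.size_tb for d in owned)
        for d in sorted(owned, key=lambda ds: ds.created_day):
            if used <= quota_tb:
                break
            doomed.add(d.name)
            used -= d.size_tb
    return sorted(doomed)


if __name__ == "__main__":
    user_sets = [
        UserDataset("user.a.1", "a", 1.0, created_day=0),
        UserDataset("user.a.2", "a", 2.0, created_day=50),
        UserDataset("user.a.3", "a", 2.0, created_day=80),
        UserDataset("user.b.1", "b", 0.5, created_day=90),
    ]
    print(datasets_to_delete(user_sets, today=100, lifetime_days=60, quota_tb=3.0))
    # ['user.a.1', 'user.a.2']: a.1 expired; a.2 removed to bring owner 'a' under 3 TB
```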

Shift operations, troubleshooting, discussion (Mark Sosebee)

  • A discussion moderated by Mark Sosebee, including an introduction to the ELOG shift logging system
  • See eLog - http://atlas003.uta.edu:8080/
  • Goal will be to catalog shift info from last few years.
  • Can we post system alerts for central services that are down? Yes.

Fabric Services: Storage management - Xrootd (Ofer Rind)

  • See slides
  • Very appropriate for smaller sites like Tier3s. Integration with PROOF is most interesting. An Apache/Tomcat server can be used for remote PROOF sessions.
  • Operation and management: two people work on the test installation at BNL, with classic sysadmin tasks (Ofer) kept separate from file system tasks (Sergey). The file system tasks run without root, with sudo privileges for some operations, using an xrd admin account and the Tentacle cluster management tools.
  • Lots of monitoring, both native and Ganglia/Nagios integrations
  • XrdMon installation. Would like to see mature product available for this. Need a contribution!
  • Managing utilization and workloads, e.g. analysis trains. Need tools for data management tasks, all within the ATLAS framework.
  • Questions
    • Ratio of PROOF cores to xrootd server nodes: PROOF tries to use data from local sources. A big area of investigation.
    • What about data import, export issues and xrootd? - See Andy's talk below, regarding work on srm-xrootd.
    • Interest in FUSE - file system visibility

Proof and Xrootd for Tier3 centers (Bruce Mellado)

  • Packetizer - basically the job scheduler that defines how jobs are run on the nodes. This is where work needs to be done to optimize for Tier3-size facilities.
  • xrootd-proof tests at GLOW-ATLAS
  • Look at response of PROOF to varying file sizes and formats.
  • Result: I/O limited to 50 MB/s by the RAID5 cards. Considering going to 16-core systems.
  • Performance tests: learned that no more than 2 workers should be started on a node. The importance of pre-loading is still to be understood.
  • Issues at GLOW - avoid network traffic.
  • Looking at Condor job scheduling over a Tier3 site, integrating PROOF and Condor COD. Issue is supporting multiple users.
  • I/O queue tests - using ESD files, comparing locally resident data with data served from other nodes by xrdcp. The big difference appears when NFS is used.
  • Looking at xrootd file distribution, and reduction in elapsed time.
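A rough Python sketch of what a locality-aware packetizer does: split files into packets of entries, hand packets to workers on request, prefer files stored on the worker's own node, and fall back to remote reads only when local work runs out. This is a simplified illustration, not PROOF's implementation; the default of 2 workers per node mirrors the GLOW-ATLAS observation above, and all names are hypothetical.

```python
from collections import deque
from typing import Deque, Dict, List, Optional, Tuple

Packet = Tuple[str, int, int]  # (file name, first entry, number of entries)


def make_packets(entries_per_file: Dict[str, int], packet_size: int) -> List[Packet]:
    """Split every file into fixed-size packets of entries."""
    packets: List[Packet] = []
    for name, n_entries in entries_per_file.items():
        for first in range(0, n_entries, packet_size):
            packets.append((name, first, min(packet_size, n_entries - first)))
    return packets


class LocalityPacketizer:
    """Hand out packets on request, preferring files stored on the worker's node."""

    def __init__(self, packets: List[Packet], file_node: Dict[str, str]) -> None:
        self.queues: Dict[str, Deque[Packet]] = {}
        for p in packets:
            self.queues.setdefault(file_node[p[0]], deque()).append(p)

    def next_packet(self, node: str) -> Optional[Packet]:
        local = self.queues.get(node)
        if local:
            return local.popleft()  # local read: no network traffic
        # No local work left: take from the node with the most remaining packets (remote read).
        donor = max(self.queues.values(), key=len, default=None)
        return donor.popleft() if donor else None


def process(packetizer: LocalityPacketizer, nodes: List[str],
            workers_per_node: int = 2) -> Dict[str, int]:
    """Round-robin the workers until no packets remain; count entries done per node."""
    done = {n: 0 for n in nodes}
    workers = [n for n in nodes for _ in range(workers_per_node)]
    busy = True
    while busy:
        busy = False
        for node in workers:
            packet = packetizer.next_packet(node)
            if packet is not None:
                done[node] += packet[2]
                busy = True
    return done


if __name__ == "__main__":
    packets = make_packets({"f1.root": 1000, "f2.root": 500}, packet_size=250)
    pk = LocalityPacketizer(packets, {"f1.root": "nodeA", "f2.root": "nodeB"})
    print(process(pk, ["nodeA", "nodeB"]))  # {'nodeA': 1000, 'nodeB': 500}
```

The placement of files across nodes, together with the number of workers per node, decides how much of the work turns into network traffic.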

Next generation of xrootd (Andrew Hanushevsky)

  • Next generation xrootd/olbd and its performance, Posix filesystem interface for xrootd using FUSE
  • Scalla implements a distributed namespace.
  • cnsd - composite namespace server daemon. The client sees a composite namespace built from each server's namespace, hosted by a common xrootd. The namespace is replicated in the file system, so no external database is needed and the footprint is small.
  • FUSE - lets a file system be implemented in a user-space program.
  • Application client has Posix access via kernel to namespace server, etc.
  • Globalization: redirectors can simply affiliate with a meta-manager.
  • ALICE has found very good performance for WAN access using hints that effectively pre-read ROOT tree data, comparable to LAN performance.
  • Questions:
    • Namespace management in the meta-cluster configuration: a meta-manager would have to be set up.
    • Issue about client mounting the namespace and how data flows between client and server - how does srm-like load-balancing work.
    • FUSE - no quotas possible. Authentication is done Unix-like (eg. NFS).
    • More questions and good discussion.
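A toy Python model of the composite namespace idea: each data server reports file creations and removals, and a single aggregate maps every path to the servers holding it, so a client can list directories and locate files without an external database. The class and event names are hypothetical and do not reflect the actual cnsd protocol.

```python
from typing import Dict, List, Set


class CompositeNamespace:
    """Toy aggregate of per-server namespaces: path -> servers that hold the file."""

    def __init__(self) -> None:
        self.holders: Dict[str, Set[str]] = {}

    def report(self, server: str, event: str, path: str) -> None:
        """Apply a create/remove notification sent by one data server."""
        if event == "create":
            self.holders.setdefault(path, set()).add(server)
        elif event == "remove":
            servers = self.holders.get(path)
            if servers is not None:
                servers.discard(server)
                if not servers:  # last copy gone: drop the path from the namespace
                    del self.holders[path]
        else:
            raise ValueError(f"unknown event: {event!r}")

    def listdir(self, directory: str) -> List[str]:
        """Immediate children of a directory, merged across all servers."""
        prefix = directory.rstrip("/") + "/"
        return sorted({p[len(prefix):].split("/", 1)[0]
                       for p in self.holders if p.startswith(prefix)})

    def locate(self, path: str) -> List[str]:
        """Servers a redirector could send a client to for this file."""
        return sorted(self.holders.get(path, set()))


if __name__ == "__main__":
    ns = CompositeNamespace()
    ns.report("ds1", "create", "/atlas/aod/a.root")
    ns.report("ds2", "create", "/atlas/aod/b.root")
    ns.report("ds2", "create", "/atlas/aod/a.root")
    print(ns.listdir("/atlas/aod"))        # ['a.root', 'b.root']
    print(ns.locate("/atlas/aod/a.root"))  # ['ds1', 'ds2']
```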

Fabric Services: Analysis Queues (Mark Sosebee, Bob Ball)

  • (Condor, PBS) functionality and configuration examples for setting up analysis queues
  • Questions / issues to consider. Dedicated versus opportunistic CPUs. Long versus short queues.
  • So far, pathena jobs have run only at BNL - primarily due to location of AODs
  • Note the new mode: pathena --site xxx.
  • AGLT2 and OU sites (Condor sites) are set up.
  • On-going testing at UTA using PBS scheduler
  • Need to determine schedule
  • Easy to set up for Condor sites - Horst did this in an hour.
  • An issue is setting up analysis queues for Tier3s: can they fetch AODs or ESDs already at an affiliated Tier2?
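The data-locality constraint behind pathena jobs running only at BNL can be sketched as a simple brokerage rule. This assumes a hypothetical replica catalog that maps each dataset to the sites holding it. An explicitly requested site wins if it has an analysis queue; otherwise the job goes to a site that holds all of its input datasets. This is illustrative only, not Panda's brokerage algorithm.

```python
from typing import Dict, List, Optional, Set


def choose_site(input_datasets: List[str],
                replicas: Dict[str, Set[str]],
                analysis_queues: Set[str],
                requested_site: Optional[str] = None) -> Optional[str]:
    """Toy brokerage: honor an explicit site, else run where all the input data is."""
    if requested_site is not None:
        # Explicit choice (in the spirit of "pathena --site"); it must have an analysis queue.
        return requested_site if requested_site in analysis_queues else None
    candidates = set(analysis_queues)
    for ds in input_datasets:
        candidates &= replicas.get(ds, set())  # keep only sites holding this dataset
    # Deterministic tie-break for the sketch; a real broker would also consider site load.
    return min(candidates) if candidates else None


if __name__ == "__main__":
    replicas = {"aod.A": {"BNL", "AGLT2"}, "aod.B": {"BNL"}}
    queues = {"BNL", "AGLT2", "OU"}
    print(choose_site(["aod.A", "aod.B"], replicas, queues))  # BNL
    print(choose_site(["aod.A"], replicas, queues))           # AGLT2
```

Replicating AODs to more Tier2s enlarges the candidate set, which is what would let analysis jobs move beyond BNL.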

-- RobertGardner - 28 Nov 2007

