Running the Condor 6.9.4 job scheduler. Tier3 resources are used by Tier2 when idle, and vice versa.
Analysis queues are set up
Monitoring - APC for power, temperature, humidity
Using Cacti, syslog-ng, Nagios and Ganglia
Problems
Some gatekeeper crashes under heavy load; problem solved - related to the number of inodes on the XFS filesystem exported via NFS
NFS server crashes - fewer of these
Accounting issues have been fixed
Large memory jobs
MWT2
See slides
NET2
Tour of Egg - a generalized management framework that works well for Tier2 centers
Quickly add up SPECint ratings for all cores in the system
Examine PBS queues, gridmap file, etc.
Harvard starting up: 17 TB thumper; 10 servers
Tufts interested in Tier3
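As an illustration of the "add up SPECint for all cores" item, a minimal sketch in Python. Egg's actual interface is not described in these notes, so the `Node` record, the node names and the per-core ratings below are hypothetical placeholders, not real NET2 figures.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical inventory record; not Egg's real data model."""
    name: str
    cores: int
    specint_per_core: float  # per-core SPECint rating (illustrative value)


def total_specint(nodes):
    """Sum cores x per-core rating over every node in the inventory."""
    return sum(n.cores * n.specint_per_core for n in nodes)


# Made-up inventory purely to exercise the function.
inventory = [
    Node("node01", cores=8, specint_per_core=1.5),
    Node("node02", cores=4, specint_per_core=2.0),
]

print(total_specint(inventory))
```

With these placeholder values the total is 8 x 1.5 + 4 x 2.0 = 20.0.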
SWT2-OU
Added 23 dual quad-core PE1950s; online and in production
Shared resources - sizeable Tier3 facility
Expand to use campus Condor pool - 750 cores
OSCER running; some problems w/ dq2_get and dq2_put.
SWT2-UTA
New machine coming online - 200 cores
Negotiating for 400 cores of Opteron 2220, 210/180 TB MD1000 + 7 PE2970 to finish FY07
Networking is the biggest concern. SWT2 is off-campus, 1 Gbps
LEARN peering w/ I2 at 1G
Need to improve throughput from 50 MB/s to 100 MB/s
NLR and I2 boards are not working together; peering in Houston is the problem
Would prefer not to support interactive users
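To relate the SWT2-UTA throughput goal to link capacity, a small Python sketch converting MB/s to Mbps. It assumes decimal units (1 MB = 8 Mb), ignores protocol overhead, and assumes the off-campus link is 1 Gbps (the notes list LEARN peering with I2 at 1G).

```python
LINK_MBPS = 1000  # assumption: 1 Gbps off-campus link


def mbytes_to_mbits(mb_per_s):
    """Convert megabytes/s to megabits/s (decimal units, no overhead)."""
    return mb_per_s * 8


for target in (50, 100):
    mbps = mbytes_to_mbits(target)
    share = mbps / LINK_MBPS
    print(f"{target} MB/s = {mbps} Mbps = {share:.0%} of a 1 Gbps link")
```

This prints 400 Mbps (40%) for the current rate and 800 Mbps (80%) for the target, so the 100 MB/s goal already uses most of a single 1 Gbps path.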
W2
See slides
Negotiating for Thumpers w/ TB-sized disks
Will try to run BeStMan SRM on XrootdFS - we need to follow up w/ SRM experts about client access
Issue - how to implement analysis queues in a fair-share environment
Need SRM to do load balancing
New GUMS v1.2 - one-to-one mapping
Need to upgrade DQ2 to 0.4.1
Network tuning - up to 800 Mbps in both directions - but not stable (competing traffic)
Will upgrade to 10 G link - January
Plan to evaluate TeraPaths and QoS
Performance - utilization is less than 200 on average - related to lack of input datasets.
Good news: less unnecessary debugging since moving to PandaMover
Facility Planning (Michael)
Scope - next 6 months
Analysis at Tier2 centers - high priority
Site configuration by admins: December 15
AOD replication. Questions: how much space is needed, and which datasets to replicate. Must be complete by December 31. Who decides on datasets: physics coordinators, based on usage patterns. Kaushik will consult Alexei. Jim will talk w/ physics coordinators and report back to Kaushik, Alexei, Michael, and the Facility.
Interactive analysis
BNL PROOF farm for tests - completed by Jan 31 (Ofer Rind)
BNL PROOF farm into production, multi-user mode: March 31
Tier2 PROOF farms available?
Action item - plan for setting this up as part of the integration program; target June 30. Plan to be delivered by end of January (Bruce, Ofer, Patrick, Sergey, Rob)
Support setup of Tier3s
Immediately and ongoing. Doug/Duke, UTD/Justin, ... Need to contact Tier3s.
Evaluate pinning in SRM v2.2
How important is space reservation? Gabriele: totally linked.
Must do this on a short timescale - Gabriele: plan by December 31
Develop and deploy software necessary to manage pinned files. To be integrated into DQ2
Disk space reconfiguration according to the computing model
Kaushik - we need disk-only areas and should proactively have our own plan.
Development and deployment of disk-only management tools: what are the needs?
Available space and usage. Kaushik will provide a bulleted list of requirements appropriate for Panda
LFC
Test system deployed by 31 December (John); production-ready by 31 January.
Migration by end February.
US ATLAS data management
Storage quota system US ATLAS wide - to be handled within DQ2 - to bring up w/ Massimo
Data deletion system - need to collect capabilities and report to DDM operations (Alexei)
Complete DQ2 lost-file tagging; Kaushik will bring this to Operations
Develop policy for Tier3 data, to be discussed at RAC (Jim)
Need a model for lifetime management of AODs, ESDs, and DPDs at sites (Jim)
Incident tracking and communication: Elog deployed and operational (Mark), complete by December 15
Performance
Average 90% efficiency relative to the 2007 WLCG pledge; important for funding agency reviews
Many other issues not covered here.
Next meeting
US ATLAS Tier2/Tier3, jointly w/ OSG all-hands at RENCI / North Carolina, March 3-5, 2008
Propose US ATLAS talks in the plenary session
Format - TBD
US ATLAS Tier2/Tier3, last week of May 2008 - location: Ann Arbor