WBS 1.1 ATLAS releases, deployment method, tests: we continue to use Xin Zhao's framework for deploying ATLAS releases on OSG sites. Initial plans are in place to work with the Panda team to integrate and test Panda-based release installation jobs. This work has been deferred by the Panda development team.
WBS 1.2 DQ2 site services: work included the definition of, and plans for, a DQ2 integration testbed at BNL and UTA. The work was delayed by the slippage of DQ2 0.4.
WBS 1.3 OSG services: deployment of the OSG Integration Testbed software stack ITB 0.7 on three US ATLAS sites (BNL, UC, OU), including the provisioning of an OSG storage element at BNL. BNL and UC are providing operational assistance to OSG VO validating against the OSG software stack. ATLAS validation on ITB 0.7 (for the deployment release OSG 0.8) included full Panda tests running more than 20 complete production jobs over three days on the UC_ITB site.
WBS 1.4 Storage services: work during this period included wide-area gridftp memory-to-disk transfer load tests for globus-gridftp/NFS storage elements and dCache-gridftp storage elements. Local disk throughput optimization (including dCache optimization at Tier2 sites) remains to be done.
WBS 1.5 Monitoring services: work on the Nagios-based alarm infrastructure for the Facility continues, including initial integration work with OSG "RSV" probes, which are necessary for WLCG site availability monitoring.
WBS 1.6 Logging services: Facility-wide syslog-ng forwarding of DQ2 site services logfiles continues to be operated, and development of the troubleshooting console continues. No effort was identified to implement a security layer for the infrastructure, so that work has been deferred until the next phase.
WBS 1.7 Load tests: a control framework based on Monalisa has been implemented which provides regular, scheduled tests of data transfer operations of various types. Closely related to the load testing effort, we have launched in this phase an initiative to optimize various modes of throughput between BNL and the Tier2s, beginning with a systematic program of network optimization. This program is being led by Shawn McKee. At the time of this report, three Tier2 sites (AGLT2, MWT2_IU, MWT2_UC) have been optimized to Gigabit-level network capacity (ceilings >950 Mbps) and Gridftp throughput (>112 MB/s).
WBS 1.8 File Catalogs: an initial survey of options for a possible replacement for the local replica catalogs used by the sites has been made; a technology decision needs to be made by the Panda development team.
WBS 1.9 Accounting: the accounting infrastructure, comprising OSG-provided components (Gratia and a forwarding service to the EGEE APEL/web portal services), has been checked on a site-by-site basis. Reporting irregularities (caused by site-level VO-mapping problems, OSG/EGEE registration problems, etc.) have been discovered, and steps to eliminate them are being pursued.
WBS 1.10 Site certification Table: the organizational tool to track progress on tasks at the site level for each WBS area. Included in these tasks is a program to set up analysis queues at each site (this effort is led by Bob Ball from Michigan and Mark Sosebe from UTA), with configurations for both PBS and Condor job managers. We now have analysis queues deployed at two sites, AGLT2 and UTA.
WBS 1.11 Summary Report: this report.
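The network and Gridftp figures quoted under WBS 1.7 use different units (megabits versus megabytes per second). As a rough cross-check, the following sketch converts between them. The report does not say whether "MB/s" means decimal megabytes (10^6 bytes) or binary mebibytes (2^20 bytes), so both readings are shown; the function names are illustrative and are not part of the load testing framework.

```python
# Unit cross-check for the WBS 1.7 throughput figures.
# Assumption: "Mbps" means 10^6 bits per second. Whether "MB/s" is
# decimal (10^6 bytes) or binary (2^20 bytes) is not stated in the report.

BITS_PER_BYTE = 8


def mbps_to_mbytes_per_s(mbps: float) -> float:
    """Convert megabits/s to decimal megabytes/s."""
    return mbps / BITS_PER_BYTE


def mbytes_per_s_to_mbps(mbytes_per_s: float) -> float:
    """Convert decimal megabytes/s to megabits/s."""
    return mbytes_per_s * BITS_PER_BYTE


def mib_per_s_to_mbps(mib_per_s: float) -> float:
    """Convert binary mebibytes/s (2^20 bytes) to megabits/s."""
    return mib_per_s * (2 ** 20) * BITS_PER_BYTE / 1e6


# Figures quoted in the report.
network_ceiling_mbps = 950.0
gridftp_rate = 112.0

ceiling_in_mbytes = mbps_to_mbytes_per_s(network_ceiling_mbps)
gridftp_decimal_mbps = mbytes_per_s_to_mbps(gridftp_rate)
gridftp_binary_mbps = mib_per_s_to_mbps(gridftp_rate)

print(f"950 Mbps ceiling      = {ceiling_in_mbytes:.2f} MB/s")
print(f"112 MB/s (decimal)    = {gridftp_decimal_mbps:.1f} Mbps")
print(f"112 MiB/s (binary)    = {gridftp_binary_mbps:.1f} Mbps")
```

Under either reading, the quoted Gridftp rate falls below the quoted 950 Mbps network ceiling, so the two thresholds are consistent with each other.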
Procurement reports and capacity status
Procurements from Phase 1 were reported in SummaryReportP1.
Procurements during Phase 2 (Aug 15-Sep 30):
DDM issues continue to pose the most significant challenge to operational stability. Instabilities with DQ2 0.3 persisted at all sites, requiring frequent manual restarts.
The delay in the release of DQ2 0.4 prevented work on establishing a DDM testbed and on addressing the stability issues. At the end of this Phase, BNL and UTA had completed the first installations of DQ2 0.4.
Initial load testing and network performance measurements indicate that much work remains to tune hosts at the Tier2 centers. Most sites are well below (by up to a factor of 5) their theoretical ceiling, given the rated network capacity between the site and BNL.
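The "factor of 5" comparison above is a ratio of rated capacity to measured throughput. A minimal sketch of that calculation follows; the function name and the example numbers are hypothetical and are not measurements from this report.

```python
def shortfall_factor(rated_mbps: float, measured_mbps: float) -> float:
    """Return how many times lower the measured rate is than the rated one.

    A value of 1.0 means the site reaches its rated capacity; a value of
    5.0 means it achieves one fifth of it.
    """
    if rated_mbps <= 0 or measured_mbps <= 0:
        raise ValueError("rates must be positive")
    return rated_mbps / measured_mbps


# Hypothetical example: a 1 Gbps rated path delivering 200 Mbps.
example_rated_mbps = 1000.0
example_measured_mbps = 200.0
example_factor = shortfall_factor(example_rated_mbps, example_measured_mbps)
print(f"shortfall factor: {example_factor:.1f}")
```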
Carryover issues to next Phase
Functional and scalability tests of DQ2 0.4 in the testbed, collection of known problems, feedback to Miguel on documentation.
Full roll-out of DQ2 0.4 to all production services.
Continued development of load testing framework in terms of displays and test definitions. Weekly feedback during Wednesday meetings on load test performance metrics.
Complete network optimization at all Tier2 sites.
Begin work on storage throughput optimization, starting with dCache sites at Tier2 centers.