ATLAS PPDG quarterly report Q3'01 (Jul-Sep)
===========================================

Distributed data management
---------------------------

Development of the Magda (formerly DBYA) distributed data management system continued. Magda is being developed to fulfil the principal ATLAS PPDG deliverable for year 1, a production distributed data system deployed to users.

Several enhancements were made to the file and replica cataloging in Magda. A Globus replica catalog loader was developed to migrate the Magda replica catalog content to Globus for evaluation, but it remains to be tested. Scalability tests of Magda cataloging were done; catalog size was increased from the current stable count of 160k up to ~400k and then up to 1.5M cataloged files. After minor bugs were fixed, the system performed well at 1.5M files, with a lookup over the entire catalog taking ~30 sec. Input was given to the replication requirements document based on Magda and earlier experience. Support for several types of file collections was added.

Support for file replication between distributed sites was added to Magda. The Globus gsiftp tool is used for replication among US ATLAS grid testbed sites, while scp is used at the moment between CERN and BNL. In the most complex case, a multi-stage automated process moves a file collection out of a source-side mass store into a cache, over the network into a destination cache, and into a destination mass store. The system has so far been used to replicate ~100 GB of ATLAS simulation data between CERN and BNL, and small volumes have been replicated to other sites. Cataloging and replication were extended to support the Castor mass storage system at CERN.

An 'SQL accelerator' was developed and integrated into Magda to expedite processing of MySQL commands from remote client sites.
SQL commands are accumulated on the client side and dispatched in bulk to the database as an SQL text blob, which is processed on the server side by a script that the client triggers via HTTP. This eliminates per-command network latencies and speeds up bulk catalog operations over WANs by orders of magnitude. With the accelerator, cataloging 1.5M files over a WAN was shown to be practical.

Deployment of Magda was extended beyond BNL and CERN to ANL and LBNL, and partially to Boston University. Development plans for Magda were coordinated with PPDG, GriPhyN and the CS projects at GriPhyN and PPDG collaboration meetings in August. Jennifer Schopf now acts as liaison with the CS projects.

The description and documentation of the system were improved. Further information (the documentation page) and a talk are available at http://atlassw1.phy.bnl.gov/magda/info

The system itself is at http://atlassw1.phy.bnl.gov/magda/dyShowMain.pl

Near-term plans include completion of command-line tools providing a file access interface to production jobs; tools to monitor throughput and gather statistics in a production environment; ATLAS framework (Athena) integration; further integration of Globus tools (remote command execution, replica catalog); exploration of other data movers (GDMP, bbcp); and application and testing in the ATLAS Data Challenges commencing in December.

Discussions on the application of Magda within the ATLAS Data Challenges began during the period. Development of a DC production scenario for simulation data using Magda also began during the period.

Prototype scenarios for grid-enabled data access from Athena, the ATLAS experiment's control framework, were investigated. Two approaches in particular were explored: one involving registration of files containing event collections with the Globus replica catalog, the other involving use of GDMP 1.2.2.
The latter approach was exercised on EU Data Grid testbed nodes in Geneva and Milan by Silvia Resconi, using the ATLAS fast simulation program Atlfast running under Athena, with the object database product Objectivity/DB as the underlying storage technology. This work was described at the CHEP'01 conference in Beijing.

US ATLAS Grid Testbed
---------------------

GDMP 1.2.2 was installed and tested at the ANL-HEP node. Installation and testing of Globus DataGrid beta tools for gsiftp, the data replica catalog, and the data replica manager also took place at the ANL HEP gatekeeper. MDS 2 was installed at the ANL HEP gatekeeper.

Testing of the GRIPE account request management system continued; the system was found to be too immature for public deployment and will be further developed (at Indiana U) in light of feedback.

Testing of Objectivity servers was done on the ANL, BU, IU and BNL gateways. The testbed now contains 8 gatekeepers, at BNL, Boston U, Indiana U, LBNL, ANL, Oklahoma U, U Mich, and UT Arlington.

A PHP front end for the Tilecal Production and Testbeam SQL databases is in development. Their tables store metadata and replication information for Tilecal.

See http://www.usatlas.bnl.gov/computing/grid/ for more testbed information.

Monitoring
----------

A PPDG working group on instrumentation and monitoring was organized, co-chaired by Dantong Yu (BNL) and Jennifer Schopf (ANL, CS rep for ATLAS). Initial steps in organizing a monitoring effort were taken during this period. Monitoring tools and instruments which are available or under development were cataloged. Requirements from information consumers were collected and compared to existing capabilities to identify missing functionality. The essential services and resources to be monitored in the grid infrastructure were prioritized.
Distributed job management
--------------------------

A program of work and schedule was developed for the initiative (joint with GriPhyN) to study and test the capabilities of Condor to manage a hierarchical job management infrastructure incorporating the various tiers of grid sites. Discussions are underway towards possibly making this a PPDG project. See http://physics.bu.edu/~youssef/atlas/notes/ for more information on the Condor scheme being investigated.

Data signature
--------------

An enumeration was done of the information to be contained in a 'data signature' recording the history of a data set in sufficient detail and completeness that it could be reproduced. Design issues (such as a global identifier scheme to identify the history objects making up a data signature) have begun to be addressed. See http://www.usatlas.bnl.gov/~dladams/data_history for details about this work in progress.
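
As an illustration only, the data signature idea can be sketched in a few lines of Python. Nothing here is taken from the actual design, which is still being worked out: the class HistoryObject, its fields, the function data_signature, and the choice of a content hash as the global identifier are all assumptions made for this example. The sketch shows one way history objects could carry identifiers that are the same at every site, and how the chain of objects behind a data set could be collected in an order suitable for reproducing it.

```python
import hashlib
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class HistoryObject:
    """One processing step in a data set's history (hypothetical layout)."""
    transformation: str      # name of the program that produced the data
    version: str             # release of that program
    parameters: tuple = ()   # (key, value) pairs used for the run
    inputs: tuple = ()       # global ids of the parent history objects

    @property
    def global_id(self) -> str:
        # Assumption: the identifier is a hash of the step's content, so
        # two sites describing the same step derive the same id without
        # consulting a central authority.
        payload = json.dumps(
            [self.transformation, self.version,
             list(self.parameters), list(self.inputs)]
        )
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def data_signature(leaf_id, registry):
    """Return the ids of all history objects behind leaf_id.

    registry maps global id -> HistoryObject. Parents are listed before
    the steps that consume them, so replaying the list in order would
    rebuild the data set.
    """
    ordered, seen = [], set()

    def visit(obj_id):
        if obj_id in seen:
            return
        seen.add(obj_id)
        for parent_id in registry[obj_id].inputs:
            visit(parent_id)
        ordered.append(obj_id)

    visit(leaf_id)
    return ordered


# Hypothetical two-step history: event generation, then simulation.
gen = HistoryObject("generate", "1.0", (("seed", 42),))
sim = HistoryObject("simulate", "2.3", (), (gen.global_id,))
registry = {h.global_id: h for h in (gen, sim)}
signature = data_signature(sim.global_id, registry)
```

In this sketch a change to any parameter or input changes the identifier of that step and of every step downstream, which is one possible way to make a signature sensitive to the full history. Whether the real scheme uses content hashes, assigned names, or something else is not stated in this report.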