OSG extensions activity in storage area
Initial priorities
- dCache deployment to 15 sites (ATLAS and CMS Tier 2s)
- site functional tests as prerequisite
- based on SRM v2.2
- sites advertised via official OSG catalog
- installations documented on site twiki pages, with configuration exposed to support people in a standard way
- OSG-wide metrics monitored and posted: average and aggregate transfers per day, total space available/used, space available/used to general OSG community.
- based on billing DB deployed to all sites, and an OSG-standard way to gather and present the data
- agree on OSG support model. Ticketing system, deployment and ops support.
- 'tax' of 5% of site storage capacity to be used by OSG community; payment for OSG's support services
- hard partitioning, documented and available. Manage it how?
- operations tools
- file existence and integrity checking (e.g. pnfs checker)
- Alarm sensors for dCache. Only the sensors, built so that they plug into NGOP-, Nagios-, or MonALISA-based alarm systems.
- Are the sensors alone enough? Also require that they feed an OSG-standard system, possibly in addition to feeding a site's preferred in-house monitor.
- The lesson of Panda is that more and deeper info (as long as it is correct info) pays off for fast, effective diagnostics and for automation
- Validation, deployment of new Chimera namespace catalog when it is available?
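The per-day transfer metrics and the 5% community share listed above can be illustrated with a minimal sketch. The `Transfer` record shape and both function names are assumptions made for illustration; they are not the dCache billing DB schema or an existing OSG tool.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Transfer:
    """One completed transfer (hypothetical record shape, not the billing DB schema)."""
    site: str
    day: date
    bytes_moved: int


def daily_transfer_metrics(transfers):
    """Return {day: (count, aggregate_bytes, average_bytes)} across all sites."""
    totals = defaultdict(lambda: [0, 0])  # day -> [transfer count, total bytes]
    for t in transfers:
        totals[t.day][0] += 1
        totals[t.day][1] += t.bytes_moved
    return {day: (n, total, total / n) for day, (n, total) in totals.items()}


def osg_share_bytes(site_capacity_bytes, percent=5):
    """Space a site would set aside for the general OSG community (the 5% 'tax')."""
    # Integer arithmetic avoids floating-point rounding on large byte counts.
    return site_capacity_bytes * percent // 100
```

A real gatherer would read the records from each site's billing DB and publish the per-day tuples to whatever presentation layer OSG standardizes on; neither step is shown here.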
Text from Fermilab (Eileen) for effort there:
dCache is a distributed disk-based storage system that began as a
rate-adapting front-end cache to tape-based mass storage systems, supporting
POSIX I/O and FTP-oriented file access. dCache has evolved into a
full-featured storage system capable of very high data delivery rates,
optional internal data replication for increased robustness in disk-only
systems, configurable policies for automated management of internal data
flows, and a standard Grid interface using the SRM API.
Storage Resource Managers (SRMs) are middleware components managing shared
storage resources on the Grid with common application interfaces. SRMs
provide protocol negotiation, dynamic transfer URL allocation, advanced space
and file reservation, and reliable replication mechanisms. Use of the SRM
interface by OSG Storage Elements will simplify building an OSG Data Grid by
facilitating the reservation and sharing of storage resources on the Grid.
To succeed with their physics programs, the LHC experiments ATLAS and CMS
expect to accumulate 10 PB of data each in 2008, and to serve it from ~30 PB
of disk space to ~100 MSpecInt2000 of CPU power across ~100 computing centers
worldwide. A crucial challenge for the LHC physics program is to provision a
cost-effective, high-performance, feature-rich storage element that is
sufficiently easy to operate and support at the scale of these ~100 computing
centers. ATLAS and CMS identified dCache as the storage technology to satisfy
this need.
SRM-enabled storage systems such as dCache can be difficult to install,
configure, and support because of their distributed architecture and large
number of configuration and administration options. The OSG-specific
deployment framework and integration into existing OSG monitoring and
accounting mechanisms will greatly reduce the administration and support
overhead of each deployment. Integrating dCache/SRM authorization with OSG
Virtual Organization based authorization services will allow grid-wide
control of storage resources and will simplify user administration for site
storage administrators.
Because site network configurations and limitations vary (for example, the
presence of firewalls), configurations must be tailored to the specific needs
of each site. Integrating functional and end-to-end testing and
troubleshooting tools will allow early detection and fast resolution of
problems, improving the quality of the storage service at each installation
while maintaining a reasonable support load.
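As a rough illustration of what a functional or end-to-end test could verify, the sketch below writes a probe file, reads it back, and compares checksums. The `put` and `get` callables are hypothetical stand-ins for a site's real transfer commands and are not an SRM or dCache client API. Adler-32 is used only because it is available in the Python standard library.

```python
import zlib


def adler32_hex(data: bytes) -> str:
    """Adler-32 checksum of data as 8 hex digits."""
    return format(zlib.adler32(data) & 0xFFFFFFFF, "08x")


def round_trip_check(put, get, name: str, payload: bytes) -> dict:
    """Write payload, read it back, and compare checksums.

    put(name, data) and get(name) are caller-supplied stand-ins for a
    site's real transfer commands (hypothetical, not an SRM client API).
    """
    result = {"name": name, "expected": adler32_hex(payload)}
    try:
        put(name, payload)
        returned = get(name)
    except Exception as exc:  # any transfer failure counts as a failed test
        result.update(ok=False, error=repr(exc))
        return result
    result["actual"] = adler32_hex(returned)
    result["ok"] = result["actual"] == result["expected"]
    return result


if __name__ == "__main__":
    # In-memory store standing in for a storage element.
    store = {}
    print(round_trip_check(store.__setitem__, store.__getitem__, "probe", b"hello"))
```

Returning a result dictionary rather than raising keeps the outcome easy to forward to a monitoring system, including the error text when a transfer fails outright.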
Milestones:
Year 1
- Deploy, integrate, and provision SRM/dCache for general use on OSG, including storage authz, and SRM v2.2
- Integrate a suite of operations tools for all sites
- Integrate a first-version end-to-end troubleshooting tool for SRM/dCache transfers.
- Support two alternative deployment methods: VDT & ROCKS.
- Integrate a meaningful set of site functional tests for SRM/dCache
- Establish a support model
- Establish an education model
Year 2
- Enhance the end-to-end SRM/dCache troubleshooting tool.
- Establish SRM/dCache site test procedures for OSG 1.0
- Reintegration, deployment, and upgrade support for established tools.
- Integrate new operations tools
- Integrate new site functional tests
- Integrate a site configuration backup tool into SRM/dCache
- Integrate SRM/dCache OSG-specific monitoring values into an OSG-supported tool
- Provide a first-version integration of operational tools, site tests, and monitoring into a cohesive web-based display.
Major updates:
-- TorreWenaus - 25 Sep 2006