
Tier2 Storage and Data Services

Discussion

We will focus on Tier2 deliverables and milestones. Tier3 role is not well defined yet.

In any case, Tier3 will provide support for interactive users, including home directories. International ATLAS has about 2000 people, and in principle all of them could expect a home directory; a realistic expectation is ~50 active users per site, at 1 TB each (~50 TB per site). This is not a large space requirement, but no other mandatory use case requires a shared file system (with OSG 0.4.1 a CE can run without a shared FS). Interactive login is something Tier2s will offer out of good will (on a best-effort basis), so it is not included in these milestones and requirements. The milestones cover what is needed for the Service Challenges and for managed production/analysis (Panda). If interactive login is not supported, shared home directories are no longer a requirement.

The primary choice for managed storage (parallel file system) is dCache. Parallel file systems that could be alternatives to dCache are:

  • xrootd (priority to look at this - draw on SLAC expertise)
  • Ibrix
  • GPFS
  • Lustre (not likely)

Issues for xrootd:

  • need SRM interface
  • file copy to local disk (a staging sketch follows this list)
  • file ownership, organization of files by ownership (no directories in xrootd)
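As context for the file-copy item above, one way a job could stage an input file from an xrootd server onto local scratch disk is the standard xrdcp client. This is only an illustration; the server name and paths are placeholders.

#!/usr/bin/env python
# Stage one input file from an xrootd server to local scratch disk.
# The server name and file paths below are placeholders, not real endpoints.
import subprocess

SOURCE = "root://xrootd.example.org//example.AOD.pool.root"
DEST = "/scratch/example.AOD.pool.root"

# xrdcp is the standard xrootd copy client; check_call raises if the copy fails.
subprocess.check_call(["xrdcp", SOURCE, DEST])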

Some considerations about the data hosted at Tier2s:

  • AOD: produced elsewhere and arriving over the external network (through the channels exercised in SC4); this is the majority of the data. Each Tier 2 is to hold all AODs. Shared among the Tier2s in a cloud? For a nominal year: 200 TB of real-data AODs and 40 TB of simulated AODs. At least 465 TB per US Tier 2 is expected, so this is consistent (see the check after this list). (Nominal 2006 numbers: 130-200 TB per US Tier 2, but no site is close to this yet.)
  • managed production/analysis data
  • user data hosted in SE
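A quick consistency check of the nominal-year figures quoted above (all values in TB; a back-of-envelope sketch only):

# Nominal-year AOD volumes from the list above, in TB.
real_aod = 200
simu_aod = 40
tier2_capacity = 465                      # expected minimum per US Tier 2

aod_total = real_aod + simu_aod           # 240 TB for one complete AOD copy
remainder = tier2_capacity - aod_total    # ~225 TB left over
print(aod_total, remainder)               # -> 240 225

So a full AOD copy fits within the expected per-site capacity, leaving roughly half of it for managed production/analysis output and user data.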

Tasks and Deliverables

What is specified in the ATLAS Computing TDR and other documents has to be added to the following deliverables. Total space requirements (in TB) also have to be added and cross-referenced with the content of the TierTwoFacilities document.

Each T2 has to:

  • be able to connect to the Tier1 through an FTS channel (no work necessary; the channel is opened by the Tier1)
  • be able to connect to all other US Tier 2s through FTS as well (a trial program to test this is being proposed; see the transfer-test sketch after this list)
  • provide a SRM managed storage
    • Backend storage options:
      • dCache (SRM/dCache). There are concerns about dCache feasibility, in particular scaling and manageability. The only other credible solution is believed to be xrootd. The namespace catalog is a weak point, to be replaced with Chimera (late this year?).
      • xrootd. We want to look at this as an option, drawing on SLAC expertise. Andy Hanushevsky believes xrootd can be a good solution. Can Andy (or someone else at SLAC) work on this? SRM support is necessary to make xrootd useful in a grid environment. SLAC is looking at this; they have installed SRM/dCache and an SRM/Linux file system.
    • UTA is working on getting SRM/DRM working on top of Ibrix
    • A suggested SRM/dCache installation would require at least 2, preferably 3 or more, machines:
      • a core machine providing a file server serving at least 1 TB using PNFS. Possibly the DB and PNFS manager should be separated onto a powerful machine of their own, leaving all the other core daemons and doors on the other machine (BNL uses two 3.0 GHz Xeons and 4 GB of RAM for this).
      • one or more machines hosting the doors (gsiftp, dcap, SRM). You may want a separate machine for the SRM door.
      • provide some reader/writer nodes (at least one reader/writer)
  • provide a DQ2 site service
    • The DQ2 site should have a single, stable endpoint that changes only when the endpoint host changes (no site-internal path or partition info)
      • until we are using SRM everywhere, where partitioning is in use we need a scheme to manage the partitioning of production data areas internally to the site/Panda (e.g. a Panda-level site config), not by changing the ToA endpoint
      • sites should use an alias for the endpoint host so that machine changes do not require ToA changes
  • provide a US ATLAS standard LRC (currently the POOL File Catalog on MySQL with an HTTP service)
    • separation of LRC DB service from site service DB?
      • Possible need for different versions (eg LRC version update for long LFN support)
      • This separation is strongly recommended by DQ2 developers
        • LCG DQ2 cleanly separates DQ2 (site services) from LRC (LFC)
      • Agreed to be a good idea but may bring heartburn
  • support US ATLAS standard, and US ATLAS managed, space usage controls
    • so US ATLAS needs to provide these tools!
      • there are new DQ2 capabilities for dataset cleanup, including file deletion, that are LFC-based; we need to adapt/extend them for OSG so that we have a full set of admin tools
      • we need to use the new LRC info (last touch time, archive bit) to implement a high-water-mark space control daemon for deployment at US sites (a sketch follows this list)
        • But timestamp (and archive bit?) should be PFN-associated and not LFN-associated
  • support data access tools defined by US ATLAS that are needed for US access to LCG data (eg LFC access, dq2_get support using lcg-cp, etc)
  • support US ATLAS security policies and procedures for catalogs and data
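Two illustrative sketches follow. Both are non-authoritative: the endpoints, state names and helper functions are placeholders or assumptions, not agreed US ATLAS tools.

First, a possible shape for the proposed Tier2-to-Tier2 FTS channel trial: submit one transfer between two SRM endpoints and poll it to completion. It assumes the gLite FTS command-line clients (glite-transfer-submit, glite-transfer-status); the service URL, the SURLs and the set of terminal state names are assumptions.

#!/usr/bin/env python
# Sketch of a Tier2-to-Tier2 FTS channel test; all endpoints are placeholders.
import subprocess
import time

FTS_SERVICE = "https://fts.example.org:8443/glite-data-transfer-fts/services/FileTransfer"
SOURCE_SURL = "srm://se.tier2a.example.org/pnfs/example/channel-test/1GB.dat"
DEST_SURL = "srm://se.tier2b.example.org/pnfs/example/channel-test/1GB.dat"

def submit():
    """Submit a single-file FTS job and return the job id printed by the client."""
    out = subprocess.check_output(
        ["glite-transfer-submit", "-s", FTS_SERVICE, SOURCE_SURL, DEST_SURL])
    return out.strip().decode()

def wait(job_id, poll_seconds=60):
    """Poll the job until FTS reports a terminal state (state names assumed)."""
    while True:
        state = subprocess.check_output(
            ["glite-transfer-status", "-s", FTS_SERVICE, job_id]).strip().decode()
        print("job %s: %s" % (job_id, state))
        if state in ("Done", "Finished", "FinishedDirty", "Failed", "Canceled"):
            return state
        time.sleep(poll_seconds)

if __name__ == "__main__":
    print("final state: %s" % wait(submit()))

Second, a minimal sketch of the high-water-mark space control daemon mentioned above: when the SE rises above a high-water mark, delete the oldest non-archived replicas (ordered by LRC last-touch time) until usage falls below a low-water mark. The three helper functions are hypothetical and would have to be backed by the real LRC and SRM/dCache interfaces.

#!/usr/bin/env python
# Minimal high-water-mark cleanup loop; se_usage(), lrc_list_replicas() and
# delete_replica() are hypothetical hooks into the site's SE and LRC.
import time

HIGH_WATER = 0.90     # start cleaning when the SE is 90% full
LOW_WATER = 0.80      # stop once usage drops below 80%
POLL_SECONDS = 3600

def se_usage():
    """Return (used_bytes, total_bytes) for the local SE -- hypothetical."""
    raise NotImplementedError("query the SRM/dCache admin interface here")

def lrc_list_replicas():
    """Return [(pfn, last_touch_epoch, archive_bit), ...] -- hypothetical LRC query."""
    raise NotImplementedError("query the local replica catalog here")

def delete_replica(pfn):
    """Remove the physical file and its catalog entry -- hypothetical."""
    raise NotImplementedError("SRM delete plus catalog cleanup here")

def clean_once():
    used, total = se_usage()
    if used < HIGH_WATER * total:
        return
    # Oldest, non-archived replicas go first; note that the timestamp and
    # archive bit are per-PFN, as discussed in the list above.
    candidates = sorted((r for r in lrc_list_replicas() if not r[2]),
                        key=lambda r: r[1])
    for pfn, _last_touch, _archive in candidates:
        if used < LOW_WATER * total:
            break
        delete_replica(pfn)
        used, total = se_usage()

if __name__ == "__main__":
    while True:
        clean_once()
        time.sleep(POLL_SECONDS)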
