USAtlas Data Management and Database Support

ATLAS Data Organization

Simple Facts on Data Storage

  • Data will be stored in a tiered structure: Tier 0, 1, 2, and 3.
  • Data will be categorized by various properties:
    • Monte Carlo or Beam data
    • Stage of processing: RAW (bytestream), RDO, ESD, AOD, DPD, etc.
    • Stream: Photon, SUSY, etc.
    • Code Release
    • Conditions Tag/Version
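
The categories above can be pictured as one record per kind of data. The following is a minimal sketch, not an ATLAS API: the class names, the `select` helper, and the release and conditions strings ("rel-A", "cond-1", and so on) are placeholders invented for illustration, and the stage list is incomplete because the list above ends in "etc."

```python
from dataclasses import dataclass
from enum import Enum


class Origin(Enum):
    MONTE_CARLO = "Monte Carlo"
    BEAM = "Beam"


class Stage(Enum):
    # Only the stages named in the list above; others exist.
    RAW = "RAW"  # bytestream
    RDO = "RDO"
    ESD = "ESD"
    AOD = "AOD"
    DPD = "DPD"


@dataclass(frozen=True)
class DataCategory:
    origin: Origin
    stage: Stage
    stream: str          # e.g. "Photon", "SUSY"
    release: str         # code release (placeholder format)
    conditions_tag: str  # conditions tag/version (placeholder format)


def select(categories, **wanted):
    """Return the categories whose fields equal every keyword given."""
    return [c for c in categories
            if all(getattr(c, key) == value for key, value in wanted.items())]


# Hypothetical catalog entries, for illustration only.
catalog = [
    DataCategory(Origin.BEAM, Stage.RAW, "Photon", "rel-A", "cond-1"),
    DataCategory(Origin.BEAM, Stage.AOD, "Photon", "rel-A", "cond-1"),
    DataCategory(Origin.MONTE_CARLO, Stage.AOD, "SUSY", "rel-B", "cond-2"),
]

aod_entries = select(catalog, stage=Stage.AOD)
```

Here `select(catalog, stage=Stage.AOD, stream="SUSY")` would narrow the result to the single Monte Carlo entry, which mirrors how a search is typically refined one property at a time.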

Tier Properties

  • Data access at Tier 0/1 facilities will be restricted.
  • The majority of Tier 0/1 resources are slated for processing and/or reprocessing RAW data.
  • The majority of Tier 2 resources are slated for Monte Carlo production.

Data Properties

  • Data will be stored in files which will be grouped into datasets.
  • Event size decreases at later processing stages.
    • Concatenation occurs at the file level, so one might have 10,000 events per AOD file but only 1,000 events per RAW file.
  • Not all data properties will be known from file or dataset names.
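
To make the effect of file-level concatenation concrete, the sketch below counts the files needed to hold the same sample at two stages. The events-per-file figures are the illustrative ones quoted above, not fixed ATLAS values, and the sample size and function name are assumptions made for this example.

```python
# Events per file, taken from the illustrative figures in the list above.
EVENTS_PER_FILE = {"RAW": 1000, "AOD": 10000}


def files_needed(n_events, stage):
    """Number of files required to hold n_events at the given stage."""
    per_file = EVENTS_PER_FILE[stage]
    # Integer ceiling division: a partly filled file still counts as a file.
    return (n_events + per_file - 1) // per_file


sample = 1_000_000  # hypothetical sample size
raw_files = files_needed(sample, "RAW")
aod_files = files_needed(sample, "AOD")
print(raw_files, aod_files)
```

With these numbers a dataset at the AOD stage contains ten times fewer files than the same events at the RAW stage.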

North American Facilities

ATLAS Data Management

Data is distributed among the sites using the Distributed Data Management (DDM) system developed by ATLAS. The primary tool for navigating this system is Don Quijote, which is well integrated into pathena, the USAtlas distributed analysis environment. Because ATLAS datasets are large, users are strongly encouraged to use a distributed analysis environment, that is, to send the job to the data rather than move the data to the job.

If you do need to transfer data, the preferred method is to register a subscription at the site where you want the data. Help with registering and monitoring subscriptions can be found at DDM Operations.
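
The policy described in the two paragraphs above can be summarized as a small decision function. This is only a sketch of the reasoning: it does not call Don Quijote, pathena, or any DDM service, and the replica map, dataset name, and site names are hypothetical placeholders.

```python
def plan(dataset, replicas, required_site=None):
    """Decide how to proceed, preferring to run where the data already is.

    replicas: dict mapping dataset name -> set of site names (hypothetical).
    required_site: a site the job must run at, or None if any site will do.
    Returns ("run_at", site) or ("subscribe", site).
    """
    sites = replicas.get(dataset, set())
    if not sites:
        raise LookupError(f"no known replica of {dataset!r}")
    if required_site is None:
        # Send the job to the data: any site holding a replica will do.
        return ("run_at", sorted(sites)[0])
    if required_site in sites:
        return ("run_at", required_site)
    # The data must be moved: request a subscription at the wanted site.
    return ("subscribe", required_site)


# Hypothetical replica map, for illustration only.
replicas = {"example.dataset.AOD": {"SITE_B", "SITE_A"}}

print(plan("example.dataset.AOD", replicas))
print(plan("example.dataset.AOD", replicas, required_site="SITE_C"))
```

The transfer branch is reached only when a specific site is required and does not yet hold the data, which reflects the advice that moving data should be the exception.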

Don Quijote and the DDM infrastructure follow a release structure.

Tools for Finding Data

Tools for Finding Metadata and Configurations

Pointers to Tools for Accessing and Reading Data

Major updates:
TWikiAdminGroup - 17 Jan 2018

