Draft 1.2
  04 July 2001

Technical Annex 2 to ATLAS Software Agreement No. 2001/01: Data Dictionary


This document briefly describes the role, scope, and implementation of an "ATLAS Data Dictionary" (ADD) in the Athena Architecture, for the purposes of ATLAS Software Agreement No. 01/2001. The deliverables and milestones described herein are those deemed reasonable over a two-year period starting January 2001, given a commitment of approximately 2.5 FTE engineer effort, 0.3 FTE physicist effort, and 0.5 FTE computer scientist and management effort.

This document is a condensed snapshot of the more exhaustive and technical White Paper (see http://atlas.web.cern.ch/Atlas/GROUPS/SOFTWARE/OO/architecture/DataDictionary/Architecture/). Please see the White Paper for details and technical explanations of the issues and terms used in this annex.

Definition of Terms

The term data dictionary is being used in ATLAS to cover several related, but distinct concepts and techniques. Each of these concepts plays a different set of roles in an architecture dependent upon a data dictionary. We categorize these concepts into three general categories:

Purpose and Scope

The Data Dictionary provides a way of describing data objects and using the description as input to compiler-based utilities and as input to Athena services. These utilities and services then provide the authors of data object classes automatic access to a myriad of capabilities of the Athena framework, either through code generation, or through the use of a data object Reflection API.

The Data Dictionary ensures compatibility and coherence between disparate data objects and their derivatives by using a single source for the data object description.
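
As an illustration only, the idea of a single description feeding a reflection-style interface can be sketched in Python as follows. The names FieldDescription, ClassDescription, describe_object, and the TrackHit example are hypothetical and are not part of Athena or of the ADD design. The sketch only shows how a generic service might inspect a data object through its description rather than through hand-written, class-specific code.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class FieldDescription:
    """One data member of a described data object (hypothetical)."""
    name: str
    type_name: str


@dataclass
class ClassDescription:
    """Single-source description of a data object class (hypothetical)."""
    name: str
    fields: List[FieldDescription]


def describe_object(desc: ClassDescription, obj: Any) -> Dict[str, str]:
    """Generic service: render any described object without class-specific code."""
    return {
        f.name: f"{f.type_name}={getattr(obj, f.name)!r}"
        for f in desc.fields
    }


# Hypothetical description of a data object; in a real system this would be
# derived from the data dictionary rather than written by hand.
TRACK_HIT = ClassDescription(
    name="TrackHit",
    fields=[FieldDescription("layer", "int"), FieldDescription("energy", "double")],
)


class TrackHit:
    """Hypothetical data object matching the description above."""

    def __init__(self, layer: int, energy: float) -> None:
        self.layer = layer
        self.energy = energy


if __name__ == "__main__":
    print(describe_object(TRACK_HIT, TrackHit(layer=3, energy=1.5)))
```

Because describe_object depends only on the description, a new data object class needs no new service code, provided its description is available.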

The motivation for implementing a Data Dictionary in Athena can be illustrated by considering the potential roles that a Data Dictionary can play within the Architecture. These roles include:

One of the most visible implementation decisions of a Data Dictionary for ATLAS is the choice of the computer language used in the dictionary. Hereafter we will refer to this language as the ATLAS Definition Language (ADL). Although the choice of ADL is an important decision, it is arguably not a make-or-break decision. However, because such a choice is highly visible, the decision-making process must be very well documented and technically motivated.

The Data Dictionary will be used at multiple ATLAS sites by a large number of people. Scalability, distributability, and ease of use of the Data Dictionary are important design criteria. The ease of use of the Data Dictionary depends largely upon the target audience. For the typical physicist, the Data Dictionary must be easy to use and must have clear benefits or it will not be used. For more sophisticated users (e.g. Core Programmers), the burden of learning a new system or language must be outweighed by the long-term benefits.

The Data Dictionary affects many aspects of ATLAS computing, as indicated above, but its design and development are intimately connected to the Control Framework. Hence this design and development must be carried out within the context of the work of the Architecture Team, led by the Chief Architect.

The task is to develop a Data Dictionary for ATLAS, oversee its deployment, and provide documentation and user-support in a manner to be mutually agreed with those responsible for the ATLAS Quality Control and Documentation procedures. The task is to be carried out within the context of the ATLAS Computing Management under the overall co-ordination of the Computing Coordinator.

The deliverables for the ATLAS Data Dictionary fall into three categories:

  1. Data Object Description
    Language-based description of data objects.
  2. Compiler-Based Code Generators
    Standard 2-tier design with a single, grammar-driven Compiler Front End (CFE), an intermediate Compiler Object Representation (COR), and multiple Compiler Back Ends (CBEs).
  3. Reflection API & Athena Services
    A programmatic API similar to the COR and available through Athena Services at run-time.
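
The relationship between the three categories of deliverables can be sketched with a toy example in Python. Everything here is an assumption made for illustration: the two-keyword description language is not the ADL, ClassNode merely stands in for the COR, and the two back ends are arbitrary choices. The sketch shows one front end producing one intermediate representation that several independent back ends consume.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class ClassNode:
    """Intermediate representation of one described class (stands in for the COR)."""
    name: str
    members: List[Tuple[str, str]] = field(default_factory=list)  # (type, name)


def front_end(source: str) -> List[ClassNode]:
    """Single front end: parse the toy description language into ClassNode objects."""
    classes: List[ClassNode] = []
    for raw in source.splitlines():
        tokens = raw.split()
        if not tokens:
            continue  # skip blank lines
        if tokens[0] == "class" and len(tokens) == 2:
            classes.append(ClassNode(tokens[1]))
        elif len(tokens) == 2 and classes:
            classes[-1].members.append((tokens[0], tokens[1]))
        else:
            raise ValueError(f"cannot parse line: {raw!r}")
    return classes


def cpp_back_end(classes: List[ClassNode]) -> str:
    """Back end 1: emit C++ struct declarations."""
    out = []
    for c in classes:
        body = "".join(f"  {t} {n};\n" for t, n in c.members)
        out.append(f"struct {c.name} {{\n{body}}};\n")
    return "".join(out)


def doc_back_end(classes: List[ClassNode]) -> str:
    """Back end 2: emit a plain-text summary of the same classes."""
    return "".join(
        f"{c.name}: " + ", ".join(n for _, n in c.members) + "\n" for c in classes
    )


# Hypothetical description written in the toy language.
EXAMPLE = "class TrackHit\n  int layer\n  double energy\n"

if __name__ == "__main__":
    ir = front_end(EXAMPLE)
    print(cpp_back_end(ir))
    print(doc_back_end(ir))
```

Adding a further output format in this arrangement means writing another back-end function over the same ClassNode list; the front end and the description itself stay unchanged.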

The development should follow a cyclic and iterative model; intermediate versions, implementing increasing functionality, should be released to the Collaboration. The major releases foreseen as of December 2000, and the functionality expected, are given below. It should be noted that the functionality includes aspects that are not part of the Control Framework as defined here. The further development of this release strategy is part of the Data Dictionary task.

Schedule as of April 2001

Many of the deliverables are characterized as "prototype" or "deployed". The first means that there is a proof-of-principle demonstration that is available to a limited set of testers; the second means that it is available to all developers/users within the context of an ATLAS developers release. Deployment is not meant to imply that development ceases at that point. It is expected that many deliverables will have iterative development cycles extending well past their nominal deployment.

Deliverables can be classified as "Essential" (E) or "Desirable" (D). The Desirable classification can indicate either some flexibility in timeline, or a lower priority for those deliverables.

Two Year Time Line & Milestones

Possible Scope Expansion

In fact, the scope of the Data Dictionary project can be much wider than that described above. Given additional resources and/or time, or a realignment of ATLAS priorities, many other projects and deliverables are possible based on the ADL and ADD. A very incomplete list of examples can be found in the White Paper under the "Definition of Deliverables" section.