Event model: The event
D. Adams
27sep01 1300
Data in the event
There are two types of data associated with an event: environmental
information that describes the conditions under which the event was
created and the much larger physics data. Users will want to access
both types of data for single events and more often for collections
of events which meet some criteria. Example criteria might include:
- events from a particular run (time period),
- all events with missing Et above 150 GeV reported by a specified algorithm, or
- all data which has not been processed with a particular algorithm.
A typical usage pattern would be to iterate over such a collection and
apply the same set of algorithms to each event.
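The usage pattern above can be sketched as follows. This is only an illustration, not a proposed interface: the Event class, its field names and the helper functions are all hypothetical, and an algorithm is assumed here to be any callable that takes an event.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Iterable, Iterator, List


@dataclass
class Event:
    """Hypothetical stand-in for an event; field names are illustrative only."""
    run: int
    event_number: int
    # Missing Et in GeV, keyed by the name of the algorithm that reported it.
    missing_et: Dict[str, float] = field(default_factory=dict)
    # Names of the algorithms already applied to this event.
    processed_by: List[str] = field(default_factory=list)


Criterion = Callable[[Event], bool]


def from_run(run: int) -> Criterion:
    """Select events from a particular run."""
    return lambda ev: ev.run == run


def missing_et_above(algorithm: str, threshold_gev: float) -> Criterion:
    """Select events whose missing Et, as reported by `algorithm`, exceeds the threshold."""
    def criterion(ev: Event) -> bool:
        value = ev.missing_et.get(algorithm)
        return value is not None and value > threshold_gev
    return criterion


def not_processed_with(algorithm: str) -> Criterion:
    """Select events that have not been processed with `algorithm`."""
    return lambda ev: algorithm not in ev.processed_by


def select(events: Iterable[Event], criterion: Criterion) -> Iterator[Event]:
    """Yield only the events that meet the criterion."""
    return (ev for ev in events if criterion(ev))


def apply_all(events: Iterable[Event],
              algorithms: List[Callable[[Event], Any]]) -> List[List[Any]]:
    """Apply the same list of algorithms to each event, in order."""
    return [[algorithm(ev) for algorithm in algorithms] for ev in events]
```

A user would then write something like apply_all(select(events, from_run(1)), algorithms) to run a fixed set of algorithms over one run.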
Environmental data
The environmental information for an event includes
- event identifier (e.g. run and event numbers)
- time stamp
- trigger(s)
- luminosity
- ID for the processor(s) constructing the event
- for Monte Carlo data, the random number state(s)
We will not discuss the details of these here. We simply note that users
will want to access these data for particular events.
Physics data
Most of the physics data is derived. For Monte Carlo everything can be
derived from the initial random number state (e.g. seeds). For real data,
everything is derived from the raw data. Here we do not discuss the
physics content (clusters, tracks, jets, electrons, ...) of this data.
The physics data is immutable. Once written, the data is not modified.
This greatly simplifies the management of persistent data. However, the
data can always grow, either by applying new algorithms or by applying
old algorithms with different parameters or code.
We impose a condition of reproducibility. The event is required to
carry history information that is sufficient to reproduce its derived
data. Platform variability, such as differences in numeric representation
(especially floating point), may lead to different results on different
computers, and we do not (yet) impose the requirement that all machines
produce exactly the same data.
The derived data is created by a series of algorithms. The output of any
one algorithm is a piece (or pieces) of data derived from some
subset of the existing data.
The ATLAS unit of persistent data is the "data object".
We will refer to the piece produced by an algorithm as
an event data object or EDO.
An EDO will typically be a container of reconstructed objects,
e.g. tracks or electrons.
Our conditions of immutability and extensibility are achieved by
making EDO's truly immutable--they cannot be modified in any way
including expansion. However the event can be expanded by adding
more EDO's.
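A minimal sketch of this combination of immutability and extensibility is given below. The class names, the use of a name as the lookup key and the choice to reject duplicate names are assumptions made for illustration; they are not taken from the ATLAS design.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class EDO:
    """Hypothetical event data object: an immutable container of reconstructed objects."""
    name: str
    # A tuple is used so the container cannot be expanded after creation.
    objects: Tuple[str, ...]


class Event:
    """Hypothetical event that can change only by gaining a new EDO."""

    def __init__(self) -> None:
        self._edos: Dict[str, EDO] = {}

    def add(self, edo: EDO) -> None:
        # Replacing an existing EDO would amount to modifying written data.
        if edo.name in self._edos:
            raise KeyError(f"EDO {edo.name!r} already exists and cannot be replaced")
        self._edos[edo.name] = edo

    def get(self, name: str) -> EDO:
        return self._edos[name]

    def names(self) -> List[str]:
        return sorted(self._edos)
```

Rerunning a track finder with new parameters would then add a second EDO under a new name and leave the first one untouched.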
The requirement of reproducibility is met by requiring that the event
carry history information for each EDO. This history is sufficient
to reproduce its EDO. It includes:
- a unique algorithm identifier (e.g. name and version)
- the run time parameters used by the algorithm
- the calibration, alignment or other external data used by
the algorithm
- algorithm return status (success/fail or pass/fail for a filter)
- the list of EDO's used as input
- computer identifier
- time stamp
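The list above maps naturally onto a small record type. The sketch below is hypothetical: the field names and types are illustrative, and the recipe() helper encodes an assumption of this example, namely that the computer identifier, time stamp and return status are bookkeeping while the remaining fields are what must be replayed to reproduce the EDO (platform variability aside).

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class EDOHistory:
    """Hypothetical history record for one EDO."""
    algorithm_name: str
    algorithm_version: str
    # Run-time parameters as (name, value) pairs, kept in a tuple so the record is hashable.
    parameters: Tuple[Tuple[str, str], ...]
    # Identifiers of calibration, alignment or other external data.
    external_data: Tuple[str, ...]
    # Return status, e.g. "success"/"fail", or "pass"/"fail" for a filter.
    status: str
    # Identifiers of the EDOs used as input.
    input_edos: Tuple[str, ...]
    computer_id: str
    time_stamp: str

    def recipe(self) -> tuple:
        """Fields assumed (in this sketch) to determine the EDO content."""
        return (
            self.algorithm_name,
            self.algorithm_version,
            self.parameters,
            self.external_data,
            self.input_edos,
        )
```

Two histories with equal recipes describe the same derivation even if they were produced on different machines at different times.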
Data storage
The above describes the nature of the event information
that is available to users. It does not describe the user interface
(later sections) or the manner in which this data is stored either in
memory or on disk. Clearly much of the event environment and EDO history
is common to multiple events and need not be duplicated. Also it may be
desirable to defer the transfer of most or all event information until it
is required. This transfer may be from disk to memory, tape to disk or
between remote sites.
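Both ideas, sharing common records and deferring transfer, can be illustrated with a short sketch. Nothing here reflects an actual storage design: SharedStore and Deferred are hypothetical names, records are assumed to be hashable tuples, and the loader stands in for whatever disk, tape or network read is eventually used.

```python
from typing import Callable, Dict, Generic, List, Optional, TypeVar

T = TypeVar("T")


class SharedStore:
    """Stores each distinct record once and hands out a small integer key."""

    def __init__(self) -> None:
        self._keys: Dict[tuple, int] = {}
        self._records: List[tuple] = []

    def intern(self, record: tuple) -> int:
        """Return the key for `record`, storing it only if it is new."""
        key = self._keys.get(record)
        if key is None:
            key = len(self._records)
            self._keys[record] = key
            self._records.append(record)
        return key

    def lookup(self, key: int) -> tuple:
        return self._records[key]

    def __len__(self) -> int:
        return len(self._records)


class Deferred(Generic[T]):
    """Holds a loader and calls it only when the value is first requested."""

    def __init__(self, loader: Callable[[], T]) -> None:
        self._loader = loader
        self._loaded = False
        self._value: Optional[T] = None

    def get(self) -> T:
        if not self._loaded:
            self._value = self._loader()  # the transfer happens here, once
            self._loaded = True
        return self._value
```

Each event would hold only a key into the shared store for its common history, and a Deferred for any bulky data that may never be requested.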
Virtual data
The existence of the EDO history information enables the use of virtual
data where an EDO is generated (or regenerated) on-demand from the history
rather than being read in from disk. The choice of real or virtual data
can be determined by the relative availability of processing power and
storage capacity.
The problem of platform variability must be handled,
especially if data is to be regenerated. One step might be to identify
a category of platforms which produce identical results and restrict
production of a particular EDO to that category.
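The following sketch shows one way on-demand regeneration with a platform restriction could look. It is an illustration under stated assumptions: the History record is reduced to three hypothetical fields, algorithms are plain callables looked up by name, regenerated data is deliberately not cached, and platform_category is an invented label for a group of platforms known to give identical results.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Tuple


@dataclass(frozen=True)
class History:
    """Reduced, hypothetical history: just enough to regenerate an EDO."""
    algorithm: str
    inputs: Tuple[str, ...]
    platform_category: str


class VirtualEvent:
    """Returns stored data when present, otherwise regenerates it from history."""

    def __init__(self,
                 histories: Dict[str, History],
                 algorithms: Dict[str, Callable[..., Any]],
                 platform_category: str) -> None:
        self._stored: Dict[str, Any] = {}
        self._histories = histories
        self._algorithms = algorithms
        self._platform = platform_category

    def store(self, name: str, data: Any) -> None:
        self._stored[name] = data

    def get(self, name: str) -> Any:
        # Real data: read what is already stored.
        if name in self._stored:
            return self._stored[name]
        # Virtual data: rebuild from the history, recursing into the inputs.
        history = self._histories[name]
        if history.platform_category != self._platform:
            raise RuntimeError(
                f"{name!r} must be produced on platform category "
                f"{history.platform_category!r}, not {self._platform!r}"
            )
        inputs = [self.get(input_name) for input_name in history.inputs]
        return self._algorithms[history.algorithm](*inputs)
```

Whether a regenerated EDO should also be stored afterwards is exactly the trade-off between processing power and storage capacity noted above; this sketch leaves it virtual.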
dladams@bnl.gov