DIAL use cases
August 2, 2002
This document gives use cases for DIAL (Distributed Interactive
Analysis of Large datasets).
The DIAL home page is
A. Event data specification
A1. Dataset definition
User defines a dataset by specifying which events are included
(i.e. which event or beam crossing IDs) and which data are part
of each event. The data for any event do not depend on those in any
other event.
A2. Data version
The same type of data may be generated multiple times for a given
event as the code evolves. The user specification of which data
are included for each event includes this version.
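As an illustration of A1 and A2, a dataset specification could be
sketched as below. This is a minimal sketch, not the DIAL interface:
the names ContentItem, Dataset and define_dataset, and the data types
"tracks" and "clusters", are hypothetical and chosen only for the
example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContentItem:
    """One type of data stored per event, tagged with its version (hypothetical)."""
    data_type: str
    version: int


@dataclass(frozen=True)
class Dataset:
    """Which events are included and which versioned data each event carries."""
    event_ids: frozenset
    content: frozenset


def define_dataset(event_ids, content):
    """Build a dataset from event IDs and (data_type, version) pairs."""
    items = frozenset(ContentItem(t, v) for t, v in content)
    return Dataset(frozenset(event_ids), items)


# Three events, each carrying version 2 of the tracks and version 1 of the clusters.
ds = define_dataset([101, 102, 103], [("tracks", 2), ("clusters", 1)])
```

Because the version is part of the content specification, version 1
and version 2 of the same data type are distinct items.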
A3. Dataset content
A user processing data specifies that only a subset of the types of
data in any event will be required. The system is able to define a
dataset which includes only this subset and process only this
restricted dataset (to reduce the cost of data access) but report
back to the user that the full dataset has been processed.
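The restriction described in A3 might look like the following sketch.
The function names, the report layout and the dataset name "full_ds"
are assumptions made for illustration; content is modelled as a set
of (type, version) pairs.

```python
def restrict_content(content, required_types):
    """Keep only the (type, version) pairs whose type the user code needs."""
    return {(t, v) for (t, v) in content if t in required_types}


def process(dataset_name, content, required_types, run):
    """Run the user code on the restricted content, report on the full dataset."""
    restricted = restrict_content(content, required_types)
    result = run(restricted)
    # The report names the full dataset, not the restricted one that was read.
    return {
        "dataset": dataset_name,
        "content_read": sorted(restricted),
        "result": result,
    }


full_content = {("tracks", 2), ("clusters", 1), ("raw", 1)}
# Stand-in for user code: it simply counts the content items it was given.
report = process("full_ds", full_content, {"tracks"}, lambda c: len(c))
```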
A4. Dataset input
The user specifies a dataset as the input to any of the processing
options described below.
A5. Dataset persistence
Means are provided so that a user can record a dataset produced in
one process and use it as the basis for analysis in a later job.
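One possible way to record a dataset between jobs is sketched below.
JSON is used here only for illustration; the source does not say how
DIAL stores datasets, and save_dataset and load_dataset are
hypothetical names.

```python
import json
import os
import tempfile


def save_dataset(path, event_ids, content):
    """Write the event IDs and (type, version) pairs to a JSON file."""
    record = {"event_ids": sorted(event_ids), "content": sorted(content)}
    with open(path, "w") as f:
        json.dump(record, f)


def load_dataset(path):
    """Read a dataset written by save_dataset in an earlier job."""
    with open(path) as f:
        record = json.load(f)
    # JSON stores tuples as lists, so convert them back.
    return set(record["event_ids"]), {tuple(c) for c in record["content"]}


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "selected.json")
    save_dataset(path, {101, 103}, {("tracks", 2)})
    event_ids, content = load_dataset(path)
```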
B. Event loop processing
B1. Event selection
User provides code that is applied to each event independently
and returns whether the event is accepted or rejected. A new
dataset containing the accepted events is created from the input
dataset.
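A minimal sketch of event selection, assuming events are held in a
dictionary keyed by event ID. The function select_events and the
variable n_tracks are hypothetical.

```python
def select_events(events, accept):
    """Apply accept() to each event independently and keep the accepted ones.

    The input is left unchanged; a new mapping is returned.
    """
    return {eid: data for eid, data in events.items() if accept(data)}


events = {
    1: {"n_tracks": 3},
    2: {"n_tracks": 0},
    3: {"n_tracks": 5},
}
# User-provided selection code: accept events with at least three tracks.
selected = select_events(events, lambda ev: ev["n_tracks"] >= 3)
```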
B2. Fill histogram
User defines a histogram and provides code to fill it for each event
independently. The histogram is filled and returned to the user to
view and manipulate in the selected analysis tool, e.g. ROOT.
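The histogram use case can be sketched with a small stand-in
histogram class. This is an assumption-laden illustration: it does
not use ROOT, and the Histogram class, fill_histogram function and
the "pt" variable are hypothetical.

```python
class Histogram:
    """Fixed-width one-dimensional histogram with under- and overflow counts."""

    def __init__(self, nbins, low, high):
        self.nbins, self.low, self.high = nbins, low, high
        self.counts = [0] * nbins
        self.underflow = 0
        self.overflow = 0

    def fill(self, x):
        if x < self.low:
            self.underflow += 1
        elif x >= self.high:
            self.overflow += 1
        else:
            index = int((x - self.low) * self.nbins / (self.high - self.low))
            # Guard against floating-point rounding at the upper edge.
            self.counts[min(index, self.nbins - 1)] += 1


def fill_histogram(events, hist, filler):
    """Call the user's filler once per event, independently of other events."""
    for event in events:
        filler(event, hist)
    return hist


events = [{"pt": 5.0}, {"pt": 15.0}, {"pt": 15.5}, {"pt": 120.0}]
hist = fill_histogram(events, Histogram(10, 0.0, 100.0),
                      lambda ev, h: h.fill(ev["pt"]))
```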
B3. Fill tuple
User defines a tuple (collection of named variables) and provides
code to add any number of entries to this tuple for each event
independently. The tuple is filled for each event and returned to the
user for examination in the selected analysis tool.
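A sketch of the tuple use case follows. The Ntuple class and the
variables event_id and pt are hypothetical; the point illustrated is
that the user code may add zero, one or many entries per event.

```python
class Ntuple:
    """A collection of named variables, stored as one row per entry."""

    def __init__(self, names):
        self.names = tuple(names)
        self.rows = []

    def add(self, **values):
        if set(values) != set(self.names):
            raise ValueError("entry must set exactly: " + ", ".join(self.names))
        self.rows.append(tuple(values[n] for n in self.names))


def fill_tuple(events, ntuple, filler):
    """Call the user's filler once per event, independently of other events."""
    for event in events:
        filler(event, ntuple)
    return ntuple


def track_filler(event, ntuple):
    # User-provided code: one entry per track, so an event may add any number.
    for pt in event["track_pt"]:
        ntuple.add(event_id=event["id"], pt=pt)


events = [
    {"id": 1, "track_pt": [10.0, 22.5]},
    {"id": 2, "track_pt": []},
    {"id": 3, "track_pt": [7.5]},
]
nt = fill_tuple(events, Ntuple(["event_id", "pt"]), track_filler)
```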
C. Single event processing
User specifies an event (by ID) and all the data associated with that
event are returned. Alternatively, the returned data may be limited to
a predefined subset of the data in the event.
User specifies a view for an event and provides code to fill that
view from the event. The view is filled by running that code on a
specified event (selected by ID).
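Both single-event cases can be sketched as below, assuming events are
held in a dictionary keyed by event ID. The functions get_event and
fill_view and the data type names are hypothetical.

```python
def get_event(events, event_id, subset=None):
    """Return all data for one event, or only the named subset of it."""
    data = events[event_id]
    if subset is None:
        return dict(data)
    return {key: data[key] for key in subset}


def fill_view(events, event_id, filler):
    """Create an empty view and let the user's code fill it from one event."""
    view = {}
    filler(events[event_id], view)
    return view


events = {7: {"tracks": [1.0, 2.0], "clusters": [5.0], "raw": [0, 1, 1]}}

everything = get_event(events, 7)
tracks_only = get_event(events, 7, subset=["tracks"])


def summary_filler(event, view):
    # User-provided code: the view holds a few derived quantities.
    view["n_tracks"] = len(event["tracks"])
    view["n_clusters"] = len(event["clusters"])


view = fill_view(events, 7, summary_filler)
```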
D. Distributed processing
D1. Remote processing
In any of the above, the data are located on a machine different
from that of the user. Working on the local machine, the user
describes the job to be run; the job is created and run on the remote
machine and the results are returned to the local machine.
D2. Parallel processing
As in the previous use case, except that the dataset is divided into
multiple datasets, each containing a subset of the events in the
original dataset. Each dataset is processed in a separate process
(or thread) and the results are combined and returned to the user.
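The split, process and combine steps can be sketched as follows.
Threads from the Python standard library stand in for the separate
processes; the functions split, count_accepted and process_parallel
are hypothetical, and the per-event code is a trivial placeholder.

```python
from concurrent.futures import ThreadPoolExecutor


def split(event_ids, n_parts):
    """Divide the event IDs into n_parts disjoint sub-datasets."""
    ids = sorted(event_ids)
    return [ids[i::n_parts] for i in range(n_parts)]


def count_accepted(event_ids):
    """Placeholder for the user's per-event code: count even event IDs."""
    return sum(1 for event_id in event_ids if event_id % 2 == 0)


def process_parallel(event_ids, n_parts, task, combine):
    """Process each sub-dataset in its own thread, then combine the results."""
    parts = split(event_ids, n_parts)
    with ThreadPoolExecutor(max_workers=n_parts) as pool:
        partial_results = list(pool.map(task, parts))
    return combine(partial_results)


total = process_parallel(range(10), 3, count_accepted, sum)
```

Because each event is processed independently, the combined result
does not depend on how the events are divided among the sub-datasets.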
D3. Multi-node processing
As in the previous except the processing jobs are distributed over
multiple compute nodes.
D4. Multi-site processing
Same as the previous except the distribution is over different sites.
D5. GRID processing
Special case of the previous where job specification, submission and
authentication are all done in the GRID framework.