DIAL design

David Adams
07jul03 1530 EDT


For more information, see the list of DIAL talks.

Datasets will provide the interface to event data. See the dataset page for more information on these.


Overview

The major components and their interactions are shown in the DIAL overview collaboration diagram (jpg, pdf).

The user interacts with DIAL through the user analysis framework. At present, the only supported framework is ROOT. The user specifies an application by name and version. This application will be used to process the data. The user also supplies a task, which specifies how to configure the application for this job. Finally, the user identifies an input dataset, which defines the data to be processed. The application specification, task and dataset are submitted as a job to the scheduler. The scheduler splits the dataset into sub-datasets, creates and runs a job for each sub-dataset, and then concatenates the results of the sub-jobs to create the overall result.
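
The flow described above can be sketched in a few lines of C++. This is only an illustration under stated assumptions: every name in the sketch namespace (Application, Task, Dataset, Result, split, run_sub_job, submit) is a hypothetical stand-in rather than a DIAL class, a dataset is reduced to a list of event ids, and a result is reduced to an event count.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

namespace sketch {

// Hypothetical stand-ins; these are not the DIAL classes.
struct Application { std::string name; std::string version; };
struct Task { std::string configuration; };
using Dataset = std::vector<int>;        // a dataset reduced to event ids
struct Result { std::size_t events_processed = 0; };

// Split a dataset into consecutive sub-datasets of at most max_events events.
inline std::vector<Dataset> split(const Dataset& input, std::size_t max_events) {
  assert(max_events > 0);
  std::vector<Dataset> parts;
  for (std::size_t i = 0; i < input.size(); i += max_events) {
    const std::size_t end = std::min(input.size(), i + max_events);
    parts.emplace_back(input.begin() + static_cast<std::ptrdiff_t>(i),
                       input.begin() + static_cast<std::ptrdiff_t>(end));
  }
  return parts;
}

// Stand-in for running the configured application on one sub-dataset.
inline Result run_sub_job(const Application&, const Task&, const Dataset& part) {
  Result r;
  r.events_processed = part.size();
  return r;
}

// Submit a job: split, run one sub-job per sub-dataset, concatenate the results.
inline Result submit(const Application& app, const Task& task,
                     const Dataset& input, std::size_t max_events) {
  Result total;
  for (const Dataset& part : split(input, max_events)) {
    const Result partial = run_sub_job(app, task, part);
    total.events_processed += partial.events_processed;
  }
  return total;
}

}  // namespace sketch
```

In this sketch the sub-jobs run one after another in a single process; nothing here says how or where DIAL actually runs them.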


Sequence diagrams

Sequence diagrams illustrate some details of these interactions. The system dependency diagram (jpg, pdf, sdr) shows the physical dependencies between the relevant systems and the color coding used in the sequence diagrams.


Components

Major DIAL components are described in the following sections. Many of these are abstract interfaces.

User analysis framework

The user analysis framework provides the user interface and the usual suite of analysis tools. DIAL does not provide these tools or the interactive framework. Instead, it is intended to be used as an extension of an existing analysis framework such as ROOT.

Scheduler

The scheduler is the heart of DIAL, and its interface may be thought of as a high-level job definition language. The scheduler is given a dataset, a task to perform on the dataset and the specification of an application to perform the task. These elements define a job. The scheduler either runs this job directly, passes it along to another scheduler, or splits the dataset into sub-datasets, creates a job for each of these and then concatenates the results. Each job produces a result, and the result of the original submission is available to the user.
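
The three choices open to a scheduler (run directly, pass along, or split) map naturally onto a common interface with three implementations. The following is a minimal sketch of that idea and not the DIAL Scheduler header: the class names DirectScheduler, ForwardingScheduler and SplittingScheduler are assumptions, and the task and application arguments are left out to keep the example short.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

namespace sketch {

using Dataset = std::vector<int>;        // a dataset reduced to event ids
struct Result { std::size_t events = 0; std::size_t jobs_run = 0; };

// Hypothetical interface; not the abstract base class from the DIAL header.
class Scheduler {
 public:
  virtual ~Scheduler() = default;
  virtual Result submit(const Dataset& data) = 0;
};

// Runs the job directly.
class DirectScheduler : public Scheduler {
 public:
  Result submit(const Dataset& data) override {
    Result r;
    r.events = data.size();
    r.jobs_run = 1;
    return r;
  }
};

// Passes the job along unchanged to another scheduler.
class ForwardingScheduler : public Scheduler {
 public:
  explicit ForwardingScheduler(std::shared_ptr<Scheduler> next)
      : next_(std::move(next)) { assert(next_); }
  Result submit(const Dataset& data) override { return next_->submit(data); }
 private:
  std::shared_ptr<Scheduler> next_;
};

// Splits the dataset into one share per child and concatenates the results.
class SplittingScheduler : public Scheduler {
 public:
  explicit SplittingScheduler(std::vector<std::shared_ptr<Scheduler>> children)
      : children_(std::move(children)) {}
  Result submit(const Dataset& data) override {
    Result total;
    const std::size_t n = children_.size();
    for (std::size_t i = 0; i < n; ++i) {
      // Child i receives the events in [begin, end).
      const std::size_t begin = data.size() * i / n;
      const std::size_t end = data.size() * (i + 1) / n;
      if (begin == end) continue;  // empty share, no sub-job needed
      const Dataset part(data.begin() + static_cast<std::ptrdiff_t>(begin),
                         data.begin() + static_cast<std::ptrdiff_t>(end));
      const Result r = children_[i]->submit(part);
      total.events += r.events;
      total.jobs_run += r.jobs_run;
    }
    return total;
  }
 private:
  std::vector<std::shared_ptr<Scheduler>> children_;
};

}  // namespace sketch
```

Because a child may itself be a splitting or forwarding scheduler, the same interface also covers a hierarchy of schedulers.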

In the initial incarnation of DIAL, the scheduler is a single local object. In future versions, the scheduler will be distributed, and it will be the scheduler that deals with catalogs, resource brokers and other grid tools to determine how to parcel out tasks and monitor their progress.

A collaboration diagram (jpg, pdf, sdr) shows how a hierarchy of schedulers might communicate with one another in a grid environment.

Here is a recent version of the header file for the abstract base class Scheduler.

A specific interface between an application and its parent scheduler is described in the ChildScheduler header.

Dataset

The data to be processed and access to that data are provided by a dataset. Datasets are described at http://www.usatlas.bnl.gov/~dladams/dataset. In the case of an event dataset, the dataset specifies which events are included, the content for each event, the location (typically logical files) where the data can be found and the mapping of event and content to location.
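
For an event dataset, the bookkeeping described above (which events, which content, and where each piece lives) can be pictured as a map from an (event, content) pair to a logical file name. The sketch below assumes exactly that and nothing more; EventDataset and its member functions are hypothetical names, not the interface in the Dataset header.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <utility>

namespace sketch {

using EventId = long;

// Hypothetical event dataset: records events, content and their locations.
class EventDataset {
 public:
  // Record that `content` for `event` is stored in logical file `lfn`.
  void add(EventId event, const std::string& content, const std::string& lfn) {
    assert(!lfn.empty());
    events_.insert(event);
    content_.insert(content);
    location_[std::make_pair(event, content)] = lfn;
  }

  bool has_event(EventId event) const { return events_.count(event) != 0; }
  const std::set<std::string>& content() const { return content_; }

  // Logical file holding `content` for `event`, or an empty string if unknown.
  std::string location(EventId event, const std::string& content) const {
    const auto it = location_.find(std::make_pair(event, content));
    return it == location_.end() ? std::string() : it->second;
  }

  // All distinct logical files referenced by the dataset.
  std::set<std::string> files() const {
    std::set<std::string> out;
    for (const auto& entry : location_) out.insert(entry.second);
    return out;
  }

 private:
  std::set<EventId> events_;
  std::set<std::string> content_;
  std::map<std::pair<EventId, std::string>, std::string> location_;
};

}  // namespace sketch
```

The sketch only locates data; it does not read it.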

Here is a recent version of the header file for the abstract base class Dataset.

Despite what this header indicates, the dataset is not required to provide direct access to each piece of data. Instead, the data is typically accessed in a manner natural to its format. The application processing the data must be capable of this means of access.

Job

The job class provides access to the status of a job, including its result if the job is complete and its partial result if not.
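
One way to picture a job that can report a partial result is to have it accumulate the results of its finished sub-jobs. The sketch below makes that assumption; Job, Status and Result here are hypothetical and much simpler than anything in the DIAL Job header.

```cpp
#include <cassert>
#include <cstddef>

namespace sketch {

// Hypothetical result: just a count of processed events.
struct Result { std::size_t events = 0; };

enum class Status { Running, Done };

// Hypothetical job that tracks a fixed number of sub-jobs.
class Job {
 public:
  explicit Job(std::size_t sub_jobs) : total_(sub_jobs) {}

  // Merge the result of one finished sub-job into the running total.
  void record_sub_job(const Result& r) {
    assert(finished_ < total_);
    result_.events += r.events;
    ++finished_;
  }

  Status status() const {
    return finished_ == total_ ? Status::Done : Status::Running;
  }

  // The full result once status() is Done, otherwise the partial result so far.
  const Result& result() const { return result_; }

 private:
  std::size_t total_;
  std::size_t finished_ = 0;
  Result result_;
};

}  // namespace sketch
```

A caller polls status() and may inspect result() at any time, which is what allows partial results to be displayed while processing continues.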

Here is a recent version of the header file for the abstract base class Job.

Application

The application class provides a specification of an application via a name and version. A scheduler connects this specification to an executable and run-time environment for processing the data.

A scheduler also uses the application to determine how to install the task.
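
The connection from a (name, version) specification to an executable and run-time environment can be pictured as a lookup table held by the scheduler. The following sketch assumes such a table; ApplicationRegistry, Runtime and the fields executable and setup_script are hypothetical names introduced for illustration only.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

namespace sketch {

// An application is specified by name and version only.
struct Application { std::string name; std::string version; };

// Hypothetical description of what a scheduler needs in order to run it.
struct Runtime { std::string executable; std::string setup_script; };

// Hypothetical table a scheduler might keep to resolve specifications.
class ApplicationRegistry {
 public:
  void add(const Application& app, const Runtime& runtime) {
    assert(!app.name.empty() && !app.version.empty());
    table_[std::make_pair(app.name, app.version)] = runtime;
  }

  // Returns the runtime for `app`, or nullptr if the scheduler cannot run it.
  const Runtime* find(const Application& app) const {
    const auto it = table_.find(std::make_pair(app.name, app.version));
    return it == table_.end() ? nullptr : &it->second;
  }

 private:
  std::map<std::pair<std::string, std::string>, Runtime> table_;
};

}  // namespace sketch
```

Keeping the specification free of paths means the same job description can be resolved differently by different schedulers.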

Here is a recent version of the header file for the concrete class Application.

Task

Tasks allow users to extend the application in a manner defined by the application. Such extensions might include a script such as a PAW or ROOT macro, a function to be dynamically linked, or a complete algorithm or chain of algorithms.

It may be necessary to compile part of the task before running, so the scheduler provides a means to install a task in advance of processing data.
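
Installing in advance amounts to doing the expensive preparation once per application and task, and skipping it on later requests. A minimal sketch of that bookkeeping follows. TaskInstaller is a hypothetical name, the task is identified by an assumed name field, and the compilation step itself is only a placeholder comment.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <tuple>

namespace sketch {

struct Application { std::string name; std::string version; };
struct Task { std::string name; std::string source; };  // hypothetical fields

// Hypothetical record of which tasks are installed for which application.
class TaskInstaller {
 public:
  // Returns true if the task was installed now, false if it already was.
  bool install(const Application& app, const Task& task) {
    assert(!task.name.empty());
    // A real scheduler would compile or link task.source here, in the way
    // the application prescribes, before recording the installation.
    return installed_.insert(key(app, task)).second;
  }

  bool is_installed(const Application& app, const Task& task) const {
    return installed_.count(key(app, task)) != 0;
  }

 private:
  using Key = std::tuple<std::string, std::string, std::string>;
  static Key key(const Application& app, const Task& task) {
    return std::make_tuple(app.name, app.version, task.name);
  }
  std::set<Key> installed_;
};

}  // namespace sketch
```

The key includes the application version because a task built for one version of an application need not work with another.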

Here is a recent version of the header file for the concrete class Task.

Result

A result is generated by event processing. It may include logical files, a dataset or individual objects such as histograms. Results provide a means for appending other results of the same type so that results from distributed processing may be concatenated.
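
For a histogram, appending a result of the same type means adding bin contents one by one. The sketch below shows this for a hypothetical fixed-binning Histogram class; it is not the DIAL Result class and not a ROOT histogram.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

namespace sketch {

// Hypothetical one-dimensional histogram with a fixed number of bins.
class Histogram {
 public:
  explicit Histogram(std::size_t nbins) : bins_(nbins, 0.0) {}

  // Add `weight` to one bin; throws std::out_of_range for a bad bin index.
  void fill(std::size_t bin, double weight = 1.0) { bins_.at(bin) += weight; }

  // Append another histogram of the same type: add its contents bin by bin.
  void append(const Histogram& other) {
    assert(other.bins_.size() == bins_.size());  // same binning required
    for (std::size_t i = 0; i < bins_.size(); ++i) bins_[i] += other.bins_[i];
  }

  double bin(std::size_t i) const { return bins_.at(i); }
  std::size_t size() const { return bins_.size(); }

 private:
  std::vector<double> bins_;
};

}  // namespace sketch
```

Since addition is commutative and associative, sub-job results of this kind can be appended in whatever order they arrive.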

Here is a recent version of the header file for the concrete class Result.

Exchange format

DIAL is implemented in C++, but each of the above data classes (Application, Task, Dataset, Result) provides an XML representation so that objects may be transported between processes, including those not based on C++.
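
As an illustration of what such a representation involves, the sketch below writes an application specification as a single XML element. The element name application and the attribute names name and version are assumptions made for this example; the actual DIAL XML layout is not given here. The one fixed point is that the five predefined XML entities must be escaped in attribute values.

```cpp
#include <cassert>
#include <string>

namespace sketch {

struct Application { std::string name; std::string version; };

// Replace the five characters that XML reserves with their predefined entities.
inline std::string escape_xml(const std::string& text) {
  std::string out;
  for (const char c : text) {
    switch (c) {
      case '&':  out += "&amp;";  break;
      case '<':  out += "&lt;";   break;
      case '>':  out += "&gt;";   break;
      case '"':  out += "&quot;"; break;
      case '\'': out += "&apos;"; break;
      default:   out += c;        break;
    }
  }
  return out;
}

// Write the specification as one element (hypothetical element and attributes).
inline std::string to_xml(const Application& app) {
  assert(!app.name.empty());
  return "<application name=\"" + escape_xml(app.name) +
         "\" version=\"" + escape_xml(app.version) + "\"/>";
}

}  // namespace sketch
```

A receiving process written in another language needs only an XML parser to recover the name and version.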


dladams@bnl.gov