Goals and scope
DIAL (Distributed Interactive Analysis of Large datasets) is a project that was originally conceived with the goal of demonstrating the feasibility of "interactive" analysis of large datasets. Interactive is taken to mean that the job completes or at least provides partial results before the user turns his or her attention to another task. In practice this means that results should start to flow back in a minute or so and the full job should complete in 5-10 minutes.
"Large" is deliberately left undefined; the goal can be taken to be determining how large a sample can be studied interactively. ADA (ATLAS Distributed Analysis) has demonstrated interactive analysis for the largest existing ATLAS AOD samples (300k events, 32GB). We expect to go well beyond this value at a single site and even further when more sites are added.
Another goal that developed as DIAL matured is to give users an easy means to submit and monitor jobs and to examine their results. DIAL is implemented in C++, and ROOT ACLiC is used to construct ROOT dictionaries that make all the DIAL classes available at the ROOT command line. Scripts are built on top of these, so distributed analysis is available from the same environment in which many HEP users already carry out their analyses.
Other important components of DIAL include datasets, AJDL (Abstract Job Definition Language) and DIALWS, the DIAL web service development framework. The latter is used as the basis for analysis and catalog services.
Datasets
Although many HEP projects agree that datasets should be the basis for describing HEP data for analysis, few define the term precisely and those that do disagree with one another. The note "Dataset for the Grid" identifies many properties of datasets and DIAL provides a C++ class interface (dset::Dataset) and generic XML schema to describe datasets.
AJDL
DIAL job requests are expressed as transformations acting on datasets, where a transformation includes an application, which contains the scripts that carry out the action, and a task, which carries the parameters, scripts, code, etc. used to configure the application. An application, task and dataset together define a job, which is run to produce a new dataset, the result. DIAL provides C++ class interfaces and XML schema that give a generic definition for each of these four types of objects. These, along with some supporting types, constitute AJDL.
A DIAL scheduler or analysis service takes an application, task and dataset as input, creates a job and then runs the job to produce an output dataset. Typically the job is split into subjobs, each of which has the same application and task running on its own sub-dataset. DIAL provides interfaces and simple implementations for splitting and merging datasets. The scheduler or analysis service need not have any dependency on the ATLAS software.
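The split, run and merge flow described above can be illustrated with a minimal C++ sketch. All of the types and functions here (Dataset, Task, Application, split, merge, runJob, selectAbove) are hypothetical stand-ins invented for illustration; they are not the DIAL or AJDL class interfaces, and a real dataset would describe files and event collections rather than a list of integers.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical stand-ins for the AJDL objects; these are NOT the DIAL classes.
struct Dataset {
    std::vector<int> events;  // event identifiers held by this dataset
};

struct Task {
    int threshold;  // illustrative configuration parameter
};

// An application maps (task, dataset) to a result dataset.
using Application = std::function<Dataset(const Task&, const Dataset&)>;

// Split a dataset into sub-datasets of at most maxEvents events each.
std::vector<Dataset> split(const Dataset& input, std::size_t maxEvents) {
    assert(maxEvents > 0);
    std::vector<Dataset> parts;
    for (std::size_t i = 0; i < input.events.size(); i += maxEvents) {
        const std::size_t end = std::min(i + maxEvents, input.events.size());
        parts.push_back(Dataset{std::vector<int>(input.events.begin() + i,
                                                 input.events.begin() + end)});
    }
    return parts;
}

// Merge sub-results by concatenating them in subjob order.
Dataset merge(const std::vector<Dataset>& parts) {
    Dataset out;
    for (const Dataset& p : parts)
        out.events.insert(out.events.end(), p.events.begin(), p.events.end());
    return out;
}

// Run the same application and task on every sub-dataset, then merge.
// The subjobs run sequentially here; a real service would run them in parallel.
Dataset runJob(const Application& app, const Task& task,
               const Dataset& input, std::size_t maxEvents) {
    std::vector<Dataset> results;
    for (const Dataset& sub : split(input, maxEvents))
        results.push_back(app(task, sub));
    return merge(results);
}

// Example application: keep events whose identifier is at least the threshold.
Dataset selectAbove(const Task& task, const Dataset& in) {
    Dataset out;
    for (int e : in.events)
        if (e >= task.threshold) out.events.push_back(e);
    return out;
}
```

Calling runJob(selectAbove, Task{5}, input, 3) on a seven-event input would create three subjobs and merge their outputs into one result dataset. Note that runJob knows nothing about what the application does, which mirrors the point that the scheduler need not depend on the experiment software.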
DIALWS
A typical DIAL job is submitted from one location and managed by an analysis service at another location. The service may forward part or all of the request to a third location. DIAL provides a C++ web service infrastructure for creating these services. The infrastructure is based on gSOAP, with the GSI-plugin used to add GSI security. It also provides credential forwarding so that the service receiving the request may make use of the calling user's proxy when invoking other services on that user's behalf. The delegated proxy is also made available for use with the local batch system or WMS (workload management system). A C++ client is automatically generated with each web service, and WSDL is generated so that clients can easily be generated for other programming languages.
Analysis services
DIAL provides an analysis service that runs one of the DIAL schedulers, which in turn uses one of the DIAL job classes to carry out processing. A typical deployment uses the MasterScheduler, which handles dataset splitting and merging and makes use of LocalScheduler to do job processing. LsfJob and CondorJob are used to submit to local batch systems. CondorJob may also be used to submit to the grid using Condor-G. ScriptedJob calls a script provided by the deployer and may be tailored to communicate with virtually any batch system or WMS.
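The idea of interchangeable job classes behind a common scheduler can be sketched as follows. This is only an illustration under stated assumptions: the class names (JobBackend, LsfBackend, CondorBackend, ScriptedBackend) and the planSubmissions function are hypothetical and are not the DIAL classes, and the use of the bsub and condor_submit commands is an assumption about how such back ends might submit work, not a description of what LsfJob and CondorJob actually do.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical interface; names are illustrative and not the DIAL API.
class JobBackend {
public:
    virtual ~JobBackend() = default;
    // Command line this back end would use to submit one subjob file.
    virtual std::string submitCommand(const std::string& jobFile) const = 0;
};

// Assumed LSF-style back end: hands the subjob script to bsub.
class LsfBackend : public JobBackend {
public:
    std::string submitCommand(const std::string& jobFile) const override {
        return "bsub " + jobFile;
    }
};

// Assumed Condor-style back end: hands a submit description file to condor_submit.
class CondorBackend : public JobBackend {
public:
    std::string submitCommand(const std::string& jobFile) const override {
        return "condor_submit " + jobFile;
    }
};

// Back end that delegates to a wrapper script supplied by the deployer.
class ScriptedBackend : public JobBackend {
public:
    explicit ScriptedBackend(std::string wrapper) : wrapper_(std::move(wrapper)) {
        assert(!wrapper_.empty());
    }
    std::string submitCommand(const std::string& jobFile) const override {
        return wrapper_ + " " + jobFile;
    }
private:
    std::string wrapper_;  // path to the deployer's submission script
};

// Local-scheduler stand-in: build one submission command per subjob.
// Nothing is executed; the commands are only returned for inspection.
std::vector<std::string> planSubmissions(const JobBackend& backend,
                                         const std::vector<std::string>& jobFiles) {
    std::vector<std::string> commands;
    for (const std::string& f : jobFiles)
        commands.push_back(backend.submitCommand(f));
    return commands;
}
```

Because the scheduler stand-in sees only the abstract interface, swapping LSF for Condor or for a site-specific wrapper script changes a single constructor call at deployment time and nothing in the scheduling logic.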
Services have been deployed for ATLAS using LSF, Condor and the gLite WMS. We plan to add a service which forwards requests to other services and use it to distribute jobs over multiple sites.
ATLAS
ATLAS is the primary customer for DIAL and most of DIAL development is directed toward the needs of that experiment.