If you have not already done so, you may wish to run one of the demos before starting your own analysis. See the demo page for information on running the demos. See the getting started page for instructions on installing, setting up and running DIAL. In the examples below, we make use of the root interface.
The catalogs change with time: the number and size of Rome datasets are increasing as data is produced, and the transformations are being improved. Any queries you make may return results different from those shown below.
A job is specified by defining a transformation and selecting a dataset to process with this transformation. The transformation is specified by an application and a task. The application carries the scripts that do the processing and the task carries user configuration data.
The application, task and dataset objects may be created or archived objects may be extracted from the corresponding repository. The latter requires knowledge of the object ID. Objects of general interest are published in selection catalogs. Entries in these catalogs are identified by a name and include an object ID and metadata to aid in object selection.
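The two-step lookup described above (name to ID in a selection catalog, then ID to object in a repository) can be pictured with two maps. This is only a conceptual sketch in plain C++: the SelectionCatalog and Repository types and the lookup function are hypothetical and are not part of the DIAL interface. The name and ID are taken from the atlasopt example later on this page.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-ins, not DIAL classes:
// a selection catalog maps a published name to an object ID,
// a repository maps an object ID to the archived object
// (represented here by a short description string).
typedef std::map<std::string, std::string> SelectionCatalog;
typedef std::map<std::string, std::string> Repository;

// Resolve a published name to its object in two steps.
// Returns an empty string if either step fails.
std::string lookup(const SelectionCatalog& cat, const Repository& rep,
                   const std::string& name) {
  SelectionCatalog::const_iterator icat = cat.find(name);
  if (icat == cat.end()) return "";   // name is not published
  Repository::const_iterator irep = rep.find(icat->second);
  if (irep == rep.end()) return "";   // ID is not archived
  return irep->second;
}
```

With a catalog entry "atlasopt" pointing at ID 10201-640, lookup returns whatever the repository holds under that ID. An unknown name gives an empty result without touching the repository.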
The demos identify objects by name, extract the corresponding ID from a selection catalog and use this ID to extract the object from a repository. See any of the demo scripts, e.g. that for demo 6. The following examples go beyond the demos: they use queries to make the selections and then create a new task using the selected task as a starting point. This is expected to be a typical use of the system.
A job is created by submitting an application, task and dataset to a scheduler. In the examples below, this scheduler is a remote analysis service. As with the demos, results are available when processing is complete and partial results may be examined during processing. In the current transformations, these results are obtained by merging the histograms and ntuples from the completed subjobs.
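To make the merging step concrete, here is a minimal sketch of combining histograms from completed subjobs by adding bin contents. It assumes each histogram is reduced to a plain vector of bin contents with identical binning. The names Histogram, merge_into and merge_all are ours for illustration, and the analysis service does the real merging for you.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// A histogram reduced to its bin contents (illustrative type only).
typedef std::vector<double> Histogram;

// Add the contents of one subjob histogram into the running result.
// Both histograms must have the same number of bins.
void merge_into(Histogram& result, const Histogram& subjob) {
  assert(result.size() == subjob.size());
  for (std::size_t i = 0; i < result.size(); ++i) result[i] += subjob[i];
}

// Merge the histograms of all completed subjobs into one result
// with nbins bins. With no completed subjobs the result is all zeros.
Histogram merge_all(const std::vector<Histogram>& completed, std::size_t nbins) {
  Histogram result(nbins, 0.0);
  for (std::size_t j = 0; j < completed.size(); ++j) {
    merge_into(result, completed[j]);
  }
  return result;
}
```

Because addition does not depend on order, a partial result built from the subjobs finished so far can simply be extended as more subjobs complete.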
(Skip to the next section if you know your transformation and dataset and just want to submit a job.)
Having started root with the dialroot command, we begin by displaying the status of all catalogs to verify our connection and see the size of each:
root [10] show_catalogs()
dr - Dataset repository has 206976 entries
dfc - Dataset file catalog has 113934 entries and 15 columns
dsc - Dataset selection catalog has 5604 entries and 15 columns
ar - Application repository has 31 entries
asc - Application selection catalog has 6 entries and 5 columns
tr - Task repository has 32 entries
tsc - Task selection catalog has 23 entries and 5 columns
jr - JobRepository has 0 entries
We begin by querying for a dataset:
root [27] print(dsc.query("level = 'TOP' and name like 'rome%recov10%SU%AOD-bnl'", 100))
List has 12 entries:
rome.004401.recov10.SU1_Jimmy_coann.AOD-bnl
rome.004402.recov10.SU2_Jimmy_focus.AOD-bnl
rome.004403.recov10.SU3_Jimmy_bulk.AOD-bnl
rome.004404.recov10.SU6_Jimmy_funnel.AOD-bnl
rome.004406.recov10.SU4_Jimmy_lowmass.AOD-bnl
rome.004410.recov10.SU51_Jimmy_scan.AOD-bnl
rome.004411.recov10.SU52_Jimmy_scan.AOD-bnl
rome.004412.recov10.SU53_Jimmy_scan.AOD-bnl
rome.004421.recov10.SU1_Jimmy_coann.AOD-bnl
rome.004423.recov10.SU3_Jimmy_bulk.AOD-bnl
rome.004424.recov10.SU6_Jimmy_funnel.AOD-bnl
rome.004426.recov10.SU4_Jimmy_lowmass.AOD-bnl
Note that we limited the query to 100 results and received 12, so we know we have
all matching datasets. The query restricts the selection to TOP level datasets, i.e.
complete samples intended for user access, and then uses the name to select Rome
samples with v10 reconstruction and SUSY data, using all AOD data available at BNL.
Replace AOD-bnl with AOD to get samples available at both CERN and BNL.
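To see why the name pattern in the query selects these datasets, the following sketch implements the usual SQL LIKE rules: % matches any run of characters (including none), _ matches exactly one character, and the pattern must cover the whole name. We assume the catalog follows these standard rules. The function like_match is ours and is not part of DIAL.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Return true if text matches an SQL LIKE pattern.
// '%' matches any sequence of characters, '_' matches one character.
// The comparison is case sensitive and must consume the whole text.
bool like_match(const std::string& pattern, const std::string& text) {
  const std::size_t npos = std::string::npos;
  std::size_t p = 0;        // current position in pattern
  std::size_t t = 0;        // current position in text
  std::size_t star = npos;  // position of the most recent '%'
  std::size_t mark = 0;     // text position where that '%' started matching
  while (t < text.size()) {
    if (p < pattern.size() && pattern[p] == '%') {
      star = p++;           // remember the wildcard, try matching nothing
      mark = t;
    } else if (p < pattern.size() &&
               (pattern[p] == '_' || pattern[p] == text[t])) {
      ++p;                  // single character matches
      ++t;
    } else if (star != npos) {
      p = star + 1;         // backtrack: let the last '%' absorb one more
      t = ++mark;
    } else {
      return false;
    }
  }
  // Only trailing '%' may remain in the pattern.
  while (p < pattern.size() && pattern[p] == '%') ++p;
  return p == pattern.size();
}
```

For example, like_match("rome%recov10%SU%AOD-bnl", "rome.004401.recov10.SU1_Jimmy_coann.AOD-bnl") is true, while the same pattern does not match a name ending in plain AOD, because the pattern has to match through to the end of the name.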
We can count datasets matching a query with the query_count method, e.g.
root [28] print(dsc.query_count("level = 'TOP' and name like 'rome%recov10%AOD-bnl'"))
141
We look at the schema and see if we can find a way to further refine our search:
root [30] print(dsc.schema())
List has 15 entries:
uid
name
level
owner
type
virtual
nevt
nfile
nsub
runmin
evtmin
runmax
evtmax
update_uid
modtime
Let's restrict the search to large samples:
root [45] print(dsc.query("level = 'TOP' and name like 'rome%recov10%SU%AOD-bnl' and nevt>100000"))
List has 2 entries:
rome.004401.recov10.SU1_Jimmy_coann.AOD-bnl
rome.004421.recov10.SU1_Jimmy_coann.AOD-bnl
Our SUSY friends tell us the 440* samples should not be used, so we list the attributes of
the 4421 dataset:
root [46] print(dsc.attributes("rome.004421.recov10.SU1_Jimmy_coann.AOD-bnl"))
Row has 15 entries:
evtmax = 0
evtmin = 0
level = TOP
modtime = 20050522110923
name = rome.004421.recov10.SU1_Jimmy_coann.AOD-bnl
nevt = 146678
nfile = 3026
nsub = 80
owner = rome
runmax = 0
runmin = 0
type = AOD
uid = 10013-170225
update_uid = 10013-170225
virtual = 0
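The dataset names used in the queries are dot-separated, which is what makes name patterns useful for selection. As a rough sketch, the function below splits a name into its fields. The reading of the fields (project, run number, reconstruction tag, physics sample, data type and site) is inferred from the examples on this page and is an assumption, not a documented naming rule. split_fields is our own helper.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Split a name into the fields between separator characters.
// An empty name yields a single empty field.
std::vector<std::string> split_fields(const std::string& name, char sep) {
  std::vector<std::string> fields;
  std::string::size_type start = 0;
  while (true) {
    std::string::size_type pos = name.find(sep, start);
    if (pos == std::string::npos) {
      fields.push_back(name.substr(start));  // last field
      break;
    }
    fields.push_back(name.substr(start, pos - start));
    start = pos + 1;                         // skip the separator
  }
  return fields;
}
```

Applied to rome.004421.recov10.SU1_Jimmy_coann.AOD-bnl with separator '.', this gives the five fields rome, 004421, recov10, SU1_Jimmy_coann and AOD-bnl.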
Record the ID and fetch the dataset
from the dataset repository:
root [51] did=dsc.id("rome.004421.recov10.SU1_Jimmy_coann.AOD-bnl")
(class DatasetId)(-1222959128)
root [52] print(did)
10013-170225
root [53] pdst = dr.extract(did);
root [54] pprint(pdst)
EventMergeDataset 10013-170225 with no parent is locked and not empty
Content includes 1 block:
Dataset content block:
Dataset type: AtlasPoolEventDataset
Content label: AOD
Content ID list has 36 entries:
type BJetContainer with with key BCandidates
type ElectronContainer with with key ElectronCollection
type INavigable4MomentumCollection with with key MuonboyTrackParticles
type INavigable4MomentumCollection with with key StacoTrackParticles
type INavigable4MomentumCollection with with key TrackParticleCandidate
type INavigable4MomentumCollection with with key TrackParticleCandidateXK
type JetTagContainer with with key BJetCollection
type McEventCollection with with key GEN_AOD
type MissingET with with key MET_Cryo
type MissingET with with key MET_Final
type MissingET with with key MET_Muon
type MissingET with with key MET_Topo
type MissingEtCalo with with key MET_Base
type MissingEtCalo with with key MET_Calib
type MissingEtTruth with with key MET_Truth
type MuonContainer with with key MuonCollection
type ParticleJetContainer with with key Cone4TowerParticleJets
type ParticleJetContainer with with key Cone4TruthParticleJets
type ParticleJetContainer with with key ConeTowerParticleJets
type ParticleJetContainer with with key ConeTruthParticleJets
type ParticleJetContainer with with key KtTowerParticleJets
type ParticleJetContainer with with key KtTruthParticleJets
type PhotonContainer with with key PhotonCollection
type Rec::TrackParticleContainer with with key MuidCombTrackParticles
type Rec::TrackParticleContainer with with key MuidCombTrackParticlesLowPt
type Rec::TrackParticleContainer with with key MuidMooreTrackParticles
type Rec::TrackParticleContainer with with key MuidMooreTrackParticlesLowPt
type Rec::TrackParticleContainer with with key MuidStandAloneTrackParticles
type Rec::TrackParticleContainer with with key MuidStandAloneTrackParticlesLowPt
type Rec::TrackParticleContainer with with key MuidiPatTrackParticles
type Rec::TrackParticleContainer with with key MuidiPatTrackParticlesLowPt
type TauJetContainer with with key TauJetCollection
type TrackParticleTruthCollection with with key TrackParticleTruthCollection
type TrackRecordCollection with with key MuonEntryRecordFilter
type TruthParticleContainer with with key SpclMC
type VxContainer with with key VxPrimaryCandidate
Event count is 146678
Location has 3026 files:
lfn://atlas/rome.004421.recov10.SU1_Jimmy_coann._00801.AOD.pool.root
lfn://atlas/rome.004421.recov10.SU1_Jimmy_coann._00802.AOD.pool.root
lfn://atlas/rome.004421.recov10.SU1_Jimmy_coann._00803.AOD.pool.root
lfn://atlas/rome.004421.recov10.SU1_Jimmy_coann._00806.AOD.pool.root
lfn://atlas/rome.004421.recov10.SU1_Jimmy_coann._00808.AOD.pool.root
...
lfn://atlas/rome.004421.recov10.SU1_Jimmy_coann._04000.AOD.pool.root
Dataset ID list has 80 entries:
First ID: 10013-159915
Last ID: 10013-170223
The last command displays the information stored in the dataset object which is distinct
from but has some overlap with the data published in the selection catalog.
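The overlap can be checked by hand: the catalog row gives nevt = 146678, nfile = 3026 and nsub = 80, and the dataset object reports the same event count, number of files and number of entries in its dataset ID list. The sketch below expresses that comparison. Both structs and the function consistent are hypothetical illustrations and do not correspond to DIAL classes.

```cpp
#include <cassert>

// Values published in the selection catalog (illustrative struct).
struct CatalogRow {
  long nevt;
  long nfile;
  long nsub;
};

// The matching values reported by the dataset object (illustrative struct).
struct DatasetInfo {
  long event_count;
  long file_count;
  long subdataset_count;
};

// True if the catalog metadata agrees with the dataset object.
bool consistent(const CatalogRow& row, const DatasetInfo& ds) {
  return row.nevt == ds.event_count &&
         row.nfile == ds.file_count &&
         row.nsub == ds.subdataset_count;
}
```

A mismatch would suggest that the catalog entry and the archived dataset are out of step, which is possible since the catalogs change with time.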
Next we select an application and task in a similar way:
root [55] print(asc.query("", 50))
List has 6 entries:
aodhisto
aodhisto-old
atlasdev
atlasdev-src
atlasopt
esd2aod
root [56] print(asc.attributes("atlasopt"))
Row has 5 entries:
modtime = 20050609104509
name = atlasopt
owner = dadams
task_interface = atlas_job_options
uid = 10201-640
root [57] print(tsc.query("task_interface = 'atlas_job_options'"))
List has 6 entries:
atlasopt_example_zll
demo6
atlasopt_example_zll-9.0.4
atlasopt_example_zll-10.0.1
atlasopt_example_zll-9.0.4-dev
atlasopt_example_zll-10.0.1-dev
root [58] aid = asc.id("atlasopt");
root [59] print(aid)
10201-640
root [60] papp = ar.extract(aid);
root [61] tid = tsc.id("atlasopt_example_zll");
root [62] print(tid)
10301-25
root [63] ptsk = tr.extract(tid);
root [64] pprint(papp)
Application 10201-640 has 4 files:
build_task
readme.txt
release_notes.txt
run
root [65] pprint(ptsk)
Task 10301-25 has 2 files:
atlas_release
jo.py
It is unlikely that you want to modify the application but very likely that
you would like to modify the task.
Extract the files from the task:
root [68] ptsk->write_files("mytask", true)
(const int)0
This extracts the files into the directory mytask. The second argument
indicates that this directory may be created if it does not exist.
We exit to a command shell and examine and modify the task files:
root [69] .sh
sh-2.05b$ l mytask
total 8
4 -rw-rw-r-- 1 dladams dladams 6 Jun 9 16:47 atlas_release
4 -rw-rw-r-- 1 dladams dladams 54 Jun 9 16:47 jo.py
sh-2.05b$ cat mytask/atlas_release
9.0.4
sh-2.05b$ cat mytask/jo.py
include ("AnalysisExamples/ZllExample_jobOptions.py")
sh-2.05b$ vi mytask/atlas_release
sh-2.05b$ cat mytask/atlas_release
10.0.1
sh-2.05b$ vi mytask/jo.py
sh-2.05b$ cat mytask/jo.py
include ("AnalysisExamples/ZeeZmmOnAODExample_jobOptions.py")
sh-2.05b$ exit
exit
We changed to a more recent version of the ATLAS release and updated
the job options accordingly.
See the share directory of the AnalysisExamples package for some example job options. You may include one of these (as in the example) or copy it to jo.py and modify as desired.
Note that, at present, the atlasopt application supports the output of histograms, ntuples or both but does not support the production of event data.
Build a new task from the modified files:
root [71] ptsk = new dial::Task("atlas_release jo.py", "mytask");
root [72] pprint(ptsk)
Task 10301-823 has 2 files:
atlas_release
jo.py
The list of files used to construct the task may be replaced with "*"
if you want all the files from the directory.
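The file list argument can be read as follows: either a space-separated list of file names or "*" for everything in the directory. The sketch below only illustrates that rule and is not the DIAL implementation. The function task_file_names is ours, and the directory contents are passed in by the caller so that the example does not touch the file system.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Expand a task file list.
// A list of "*" selects every name in dir_contents;
// otherwise the list is split on whitespace.
std::vector<std::string> task_file_names(
    const std::string& list, const std::vector<std::string>& dir_contents) {
  if (list == "*") return dir_contents;
  std::vector<std::string> names;
  std::istringstream in(list);
  std::string name;
  while (in >> name) names.push_back(name);
  return names;
}
```

For the mytask directory above, both "atlas_release jo.py" and "*" select the same two files.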
Now that papp, ptsk and pdst are defined appropriately, we can submit a job:
root [25] submit()
Application 10201-640
Task 10301-823
Dataset 10013-170225
*** Submitting job
*** Submitted job status:
CompoundJob 10501-35542 is running
Application: 10201-640
Task 10301-823
Dataset 10013-170225 with 146678 events
Job preferences ID 0-0
Owner: /DC=org/DC=doegrids/OU=People/CN=David Adams 407137
Credential: /DC=org/DC=doegrids/OU=People/CN=David Adams 407137
Run host: adial01.usatlas.bnl.gov
Job directory: /usatlas/u/dial/local/jobs/MasterScheduler/00/00/29/05/00/00/8a/d6
create time: 2005 June 09 17:03:02
start time: 2005 June 09 17:03:27 (25 sec elapsed)
update time: 2005 June 09 17:03:27 (25 sec elapsed)
There are 80 subjobs
49 running
0 done
0 failed
0 killed
0 included in result
Events processed: 0 (0%)
in result: 0 (0%)
The job does not have a result
(int)0
The job may be monitored using print(msch.job(jid)) or get_results() as
described on the demo page.
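When reading a status report like the one above, it helps to do the bookkeeping explicitly. The sketch below takes the subjob counts from the report and derives how many subjobs have not yet started, whether the job is finished, and the fraction of events processed. The struct and functions are our own illustration. Treating the subjobs that are neither running nor finished as "not yet started" is our interpretation of the report.

```cpp
#include <cassert>

// Subjob counts as printed in a job status report (illustrative struct).
struct SubjobCounts {
  int total;
  int running;
  int done;
  int failed;
  int killed;
};

// Subjobs that are neither running nor in a final state.
int not_started(const SubjobCounts& c) {
  return c.total - (c.running + c.done + c.failed + c.killed);
}

// True when every subjob has reached a final state.
bool all_finished(const SubjobCounts& c) {
  return c.done + c.failed + c.killed == c.total;
}

// Percentage of events processed; 0 if the dataset is empty.
double percent_processed(long processed, long total) {
  return total > 0 ? 100.0 * processed / total : 0.0;
}
```

For the report above (80 subjobs, 49 running, none done, failed or killed), not_started gives 31 and all_finished is false.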
Edit the top part of this script to specify the application, task and dataset of interest. The remainder of the script is used to extract the corresponding objects and store their pointers in papp, ptsk and pdst.
Start root (command dialroot), run the job definition script and submit and monitor as above:
root [0] .x jobdef.C
root [1] submit()
...
root [2] get_results()
...
The last command is repeated until the job is complete.
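Repeating a command until the job is complete is a polling loop. The sketch below shows the general shape in standard C++11. It does not call DIAL: the completion check is passed in as a function, and poll_until_done, max_polls and wait_ms are hypothetical names. In a real session the check would be whatever test you apply to the output of get_results().

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>

// Call is_done() up to max_polls times, waiting wait_ms milliseconds
// between unsuccessful polls.
// Returns the number of polls made when the job was found complete,
// or -1 if it was still unfinished after max_polls polls.
int poll_until_done(const std::function<bool()>& is_done,
                    int max_polls, int wait_ms) {
  for (int n = 1; n <= max_polls; ++n) {
    if (is_done()) return n;
    if (n < max_polls) {
      std::this_thread::sleep_for(std::chrono::milliseconds(wait_ms));
    }
  }
  return -1;
}
```

Capping the number of polls keeps the loop from running forever if a job stalls. A wait of tens of seconds between polls is a reasonable starting point for jobs of the size shown above.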
Aodhisto (developed by F. Fassi and T. Maeno) allows the user to supply code that is built inside the AnalysisExamples package. For information about that package, please see the tutorials linked from the ATLAS analysis tools page.
Follow the same procedure as in the atlasopt example above, except select the application "aodhisto" and start from one of the aodhisto example tasks. These may be found with the following query:
root [25] print(tsc.query("name like 'aodhisto%'"))
List has 3 entries:
aodhisto_big
aodhisto_zll_aod
aodhisto_zll_aod-esd
or query for all compatible tasks (i.e. those supplying the
atlas_simple_analysis interface expected by aodhisto) with
root [5] print(tsc.query("Task_interface='atlas_simple_analysis'"))
List has 4 entries:
aodhisto_big
demo4
aodhisto_zll_aod-esd
aodhisto_zll_aod
We hope to have more of these soon.
Here is an example job definition script that can be used to run a job using the example task or a task provided by the user. Execute this script and then use submit() to submit a job. It may take a couple of minutes to get a response after submission because the analysis service is compiling your code.
If the submission returns an invalid job ID, then it is likely that your code did not compile. Although it should, the scheduler does not provide a means to access the log files for the task build (compilation). If the scheduler is running at BNL, the log files may be found at
/usatlas/u/dial/local/tasks/AID/TID
where AID is the application ID and TID is the task ID.
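As a small convenience, the directory can be built from the two IDs. This sketch assumes that the IDs appear in the path exactly as printed by print(aid) and print(tid), for example 10201-640, which we have not verified. The function task_build_log_dir is our own helper.

```cpp
#include <cassert>
#include <string>

// Build the BNL task build directory from an application ID and a task ID.
// Assumes the IDs are used verbatim as directory names.
std::string task_build_log_dir(const std::string& aid, const std::string& tid) {
  return "/usatlas/u/dial/local/tasks/" + aid + "/" + tid;
}
```

Pass the ID strings of the application and task you submitted. If no such directory exists, the assumption about the ID format is probably wrong for your installation.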
To get started, query existing compatible tasks with
root [1] print(tsc.query("task_interface='atlas_developer_directory'"))
List has 5 entries:
atlasdev_example_zll-9.0.4
demo7
atlasdev_example_zll-9.0.4-dev
atlasdev_example_zll-nochange-10.0.1
atlasdev_example_zll-erescale-10.0.1
and select and modify any one as in the previous examples. Here is an
example job definition script. Again,
modify the script to select one of the above or a local task definition and
then, inside root, execute the script and use submit() to submit a job.
Note that, at present, atlasdev only works with development releases and not with release kits. Add the suffix "-dev" to the version in atlas_version to ensure that such a release is used.
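If you generate the version string in a script, a helper like the following avoids adding the suffix twice. It is a plain string manipulation sketch. The function name ensure_dev_suffix is ours, and writing the result into the task's version file is left to you.

```cpp
#include <cassert>
#include <string>

// Return the release version with "-dev" appended,
// unless it already ends in "-dev".
std::string ensure_dev_suffix(const std::string& version) {
  const std::string suffix = "-dev";
  if (version.size() >= suffix.size() &&
      version.compare(version.size() - suffix.size(), suffix.size(),
                      suffix) == 0) {
    return version;
  }
  return version + suffix;
}
```

For example, 10.0.1 becomes 10.0.1-dev, and 9.0.4-dev is returned unchanged.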