EventView, and made scalable to large datasets with the use of distributed analysis tools like pathena. The output of AOD-based analysis is generically called DPD. The DPD contents will vary depending on the purpose, but will often be shared among a physics group. Our most prolific technology for DPD currently is the Athena-Aware Ntuple, and the contents range from CBNT, to HighPtView, to TopView, to customized EventView-based ntuples. For release 13, we are developing a technology to access the transient classes used in Athena within ROOT, called AthenaROOTAccess. So far AthenaROOTAccess has only been exercised on AOD files, but we expect its main use to be at the second stage of analysis on DPD.
Even with our "flat" AANT we have seen that ROOT-based DPD analysis is not scalable. In the context of VBFHiggsToTauTau, the analysis code can take more than a day to run over all the signal and background datasets. For comparison, the pathena step to produce the AANT takes only a few hours, demonstrating that pathena is an excellent tool for Athena-based AOD analysis. Because it is clear that ROOT-based analysis on a single machine will not scale to large datasets, we must investigate ROOT-based distributed analysis tools that provide an interactive environment. It has been pointed out that pathena can be made to run ROOT macros on AANT files; however, this does not provide the interactive environment a user expects (e.g. the ROOT command prompt). PROOF does provide this interactive environment and is an obvious candidate to consider.
In June 2006, a PROOF testbed was set up at BNL, taking advantage of the existing xrootd testbed. Preliminary results showed the expected speed improvements, dropping a 7-minute wait for a command on a local machine to less than 40 seconds on the testbed. These results and the potential role of PROOF at the Tier2s and Tier3s (see Torre's Talk at Indiana) have motivated a more coordinated and organized assessment of PROOF and its potential role in the analysis model. This page has been created to centralize further studies of PROOF for DPD analysis.
The ultimate goal for these studies is to demonstrate and assess the feasibility of PROOF for ROOT-based analysis of DPD.
Create a file SetupROOTforPROOF.sh with these contents
#This is correct as of December 14, 2007
#### setup ROOT ####
# instructions are for bash, modify appropriately for tcsh, etc
#
export ROOTSYS=/afs/usatlas.bnl.gov/cernsw/lcg/external/root/5.17.08/slc4_ia32_gcc34/root
export PYTHONDIR=/afs/usatlas.bnl.gov/cernsw/lcg/external/Python/2.4.2/slc4_ia32_gcc34
export PATH=$ROOTSYS/bin:$PYTHONDIR/bin:$PATH
export LD_LIBRARY_PATH=$ROOTSYS/lib:$PYTHONDIR/lib:$LD_LIBRARY_PATH
export PYTHONPATH=$ROOTSYS/lib:$PYTHONPATH
and then source it. The version of ROOT that comes with athena releases is quite old, and you should use 5.14 instead. I (Kyle) had some problems getting things to work unless I removed the references to the dq2 clients, dcache clients, and lcg grid_env.sh scripts from my .bash_profile.
TProof::Open("acas0420.usatlas.bnl.gov");
// You can use either TDSet or TChain. TDSet will take advantage of PROOF automatically,
// while for TChain you have to enable PROOF manually via TChain::SetProof.
TDSet *d = new TDSet("TTree","FullRec0");
// or
// TChain *d = new TChain("FullRec0");
// d->SetProof();
// adding files to the TDSet is similar to the TChain
d->Add("root://acas0420.usatlas.bnl.gov//data/cache/HPTV/user.TARRADEFabien.trig1_misal1_csc11.005013.J4_pythia_jetjet.Athena_12.0.6.GroupArea_12.0.6.6.Jamboree_II-HightPtView-00-00-30.AAN.AANT0._00065.root");
//To add all files from a dataset, you can use DQ2IF as described in the section below
// you may want to make a little macro to add many files, TDSet doesn't seem to like wild cards.
//addFilesToDSet3(d); // adds all 4588 HPTV files! Included in BasicProofCommands.C attached to this wiki page
TStopwatch t;
t.Start();
d->Draw("Jet_C4_p_T","Jet_C4_N>0");
// if you have a TSelector, you would do this
//d->Process("ProofTest.C+");
t.Stop();
t.Print();
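Since TDSet does not seem to accept wildcards, one way to add many files is to generate the explicit xrootd URLs first. The following is only a minimal sketch in plain Python: it assumes the files differ solely by a zero-padded five-digit sequence number (as in _00065 above), and the helper name numbered_urls and the index range are hypothetical. It builds strings only; it does not check which files actually exist on the server.

```python
def numbered_urls(prefix, stem, indices, width=5, suffix=".root"):
    """Build explicit file URLs of the form <prefix><stem>._NNNNN<suffix>.

    Assumption: the files differ only by a zero-padded sequence number.
    """
    return [prefix + stem + "._" + str(i).zfill(width) + suffix for i in indices]


prefix = "root://acas0420.usatlas.bnl.gov//data/cache/HPTV/"
stem = ("user.TARRADEFabien.trig1_misal1_csc11.005013.J4_pythia_jetjet."
        "Athena_12.0.6.GroupArea_12.0.6.6.Jamboree_II-HightPtView-00-00-30."
        "AAN.AANT0")

# Hypothetical index range, chosen only for illustration.
urls = numbered_urls(prefix, stem, range(65, 68))
for url in urls:
    print(url)
```

In a PyROOT session, each generated URL could then be passed to d.Add(url) in a loop.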
Get the Example Script that was used to make the plots at the end of this page.
// replace "cranmer" with your username
TProof::Reset("cranmer@acas0420") ;
Use DQ2IF.py to directly add files to a TDSet or TChain using DQ2IF.getFiles.
Copy the DQ2IF.py file to your local directory (included in PhysicsAnalysis/DistributedAnalysis/PandaTools):
cp /usatlas/u/maeno/offline/12.0.6/PhysicsAnalysis/DistributedAnalysis/PandaTools/python/DQ2IF.py .
DQ2IF can override the path to the physical files as stored in the catalog and take a user-defined path instead. It also supports some pattern matching, so that you can add only the files with "AANT0" in their name (which also excludes the log files). Here's an example using DQ2IF to add all the files in a DQ2 dataset that have "AANT0" in their name, prepending the xrootd path explicitly:
TDSet *d = new TDSet("TTree","FullRec0");
//TChain *d = new TChain("FullRec0"); // also works with TChain, but PROOF won't be used.
TPython::LoadMacro("DQ2IF.py");
DQ2IF dq2;
TPython::Bind(d,"myTDSet");
dq2.getFiles("user.TARRADEFabien.trig1_misal1_mc12.008331.PythiaVBFH120tautaulh.Athena_12.0.6.GroupArea_12.0.6.6.Jamboree_II-HightPtView-00-00-30.AAN","myTDSet" ,"root://acas0420.usatlas.bnl.gov//data/cache/HPTV/","AANT0");
This module should work with PyROOT too. Once DPDs are registered in DQ2 and LRC, one can easily use that information in PROOF sessions.
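The effect of the last two arguments to getFiles (the path prefix and the "AANT0" pattern) can be illustrated with a small stand-alone sketch. This is not the DQ2IF implementation: the function select_files and the file names in the list are hypothetical, and a simple substring match is assumed, which may differ from the matching DQ2IF actually performs.

```python
def select_files(lfns, prefix, pattern):
    """Keep the file names that contain `pattern` and prepend `prefix`."""
    return [prefix + name for name in lfns if pattern in name]


# Hypothetical catalogue listing: two ntuple files and one log file.
lfns = [
    "sample.AAN.AANT0._00001.root",
    "sample.AAN.AANT0._00002.root",
    "sample.AAN.log._00001.tgz",
]

urls = select_files(lfns, "root://acas0420.usatlas.bnl.gov//data/cache/HPTV/", "AANT0")
for url in urls:
    print(url)
```

Only the two ntuple files survive the filter, and each is addressed through the xrootd prefix rather than the path stored in the catalog.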
TTree::MakeClass). After the skeleton is made you can add your own code. To add a single histogram to the Selector called "ProofTest", follow these steps. Add #include <TH1.h> at the top. Under public: add TH1F * myHist;
/*tree*/ so that it looks like this: void ProofTest::SlaveBegin(TTree * tree)
Init(tree);
myHist = new TH1F("myHist","test histogram", 100, 0, 3000000);
fOutput->Add(myHist);
fChain->GetTree()->GetEntry(entry);
for(int i=0; i<Jet_C4_N; ++i) myHist->Fill(Jet_C4_p_T->at(i));
myHist=dynamic_cast<TH1F*>(fOutput->FindObject("myHist"));
myHist->Draw();
d->Process("ProofTest.C+") command. Note: if you do not specify which branches you want to read, the processing speed will drop dramatically. Initial tests with HPTV reading all data show about 2.6k evts/sec, while the Draw command ran at 158k evts/sec.
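To put the two quoted rates in perspective, here is a back-of-the-envelope calculation. The rates are the ones quoted above; the sample size of one million events is a hypothetical figure chosen only for illustration.

```python
rate_all_branches = 2.6e3  # events/s, TSelector reading all data (quoted above)
rate_draw = 158e3          # events/s, Draw command (quoted above)

n_events = 1000000         # hypothetical sample size, for illustration only

slowdown = rate_draw / rate_all_branches
t_all = n_events / rate_all_branches
t_draw = n_events / rate_draw

print("slowdown factor: %.0f" % slowdown)
print("time reading all branches: %.0f s" % t_all)
print("time for Draw: %.1f s" % t_draw)
```

Reading every branch is thus roughly 60 times slower than the Draw command, which is why restricting the selector to the branches it needs matters.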
An example TSelector is attached to this wiki page (ProofTest.h, ProofTest.C).
More detailed documentation can be found here:
$ cat PyTest.py
import ROOT
class PyTest(ROOT.TSelector):
    def __init__(self):
        ROOT.TSelector.__init__(self)
        self.fChain = None
    def Init(self,tree):
        print "in Init"
        self.fChain = tree
    def Process(self,entry):
        print "in Process"
        self.fChain.GetTree().GetEntry(entry)
        print self.fChain.EventNumber
==============================================
$ cat Wrapper.h
#ifndef Wrapper_h
#define Wrapper_h
#ifndef __CINT__
#if defined(linux)
#include <stdio.h>
#ifdef _POSIX_C_SOURCE
#undef _POSIX_C_SOURCE
#endif
#ifdef _FILE_OFFSET_BITS
#undef _FILE_OFFSET_BITS
#endif
#endif // defined(linux)
#include "Python.h"
#else // __CINT__
struct _object;
typedef _object PyObject;
#endif // !__CINT__
#include "TSelector.h"
class Wrapper : public TSelector
{
public :
Wrapper(TTree * /*tree*/ =0) : m_self(NULL) { }
virtual ~Wrapper() { }
virtual Int_t Version() const { return 2; }
virtual void Init(TTree *tree);
virtual Bool_t Process(Long64_t entry);
private:
PyObject* m_self;
};
#endif
==============================================
$ cat Wrapper.C
#include "Wrapper.h"
#include "TPython.h"
#define MY_SELECTOR "PyTest"
void Wrapper::Init(TTree *tree)
{
if (!tree) return;
// just to initialize Python/TPython stuff
TPython::Exec("import ROOT");
if (m_self == NULL)
{
PyObject* module = PyImport_ImportModule(MY_SELECTOR);
if (module != NULL)
{
PyObject* pclass = PyObject_GetAttrString(module,MY_SELECTOR);
if (pclass != NULL)
{
PyObject * args = PyTuple_New(0);
m_self = PyObject_Call(pclass,args,NULL);
Py_DECREF(args);
if (m_self != NULL)
{
/*
TClass* klass = tree->IsA();
PyObject* ptree = PyROOT::BindRootObject((void*)tree,klass );
*/
char * ptree_name = "_py_tree";
TPython::Bind((TObject *)tree, ptree_name);
PyObject* pmain = PyImport_AddModule("__main__");
PyObject* ptree = PyObject_GetAttrString(pmain, ptree_name);
PyObject* result = PyObject_CallMethod(m_self, "Init", "O", ptree);
if (result != NULL)
Py_DECREF(result);
else
PyErr_Print();
Py_DECREF(ptree);
}
else
PyErr_Print();
Py_DECREF(pclass);
}
else
PyErr_Print();
Py_DECREF(module);
}
else
PyErr_Print();
}
}
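Bool_t Wrapper::Process(Long64_t entry)
{
if (m_self == NULL) return kFALSE;
PyObject* result = PyObject_CallMethod( m_self, "Process", "L", entry );
// PyObject_CallMethod returns NULL if the Python call raised an exception
if (result == NULL)
{
PyErr_Print();
return kFALSE;
}
Py_DECREF(result);
return kTRUE;
}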
==============================================
You may add more methods like SlaveBegin() and may replace PyTest with your algorithm name.
$ root.exe
// open session
TProof::Open("acas0420.usatlas.bnl.gov");
// add Python to LD_LIBRARY_PATH
gProof->AddDynamicPath("/afs/usatlas.bnl.gov/cernsw/lcg/external/Python/2.4.2/slc4_ia32_gcc34/lib")
// need this one because libpython2.4.so is not loaded automatically
gProof->Exec("gSystem->Load(\"libpython2.4.so\")")
// add Python to include path
gProof->AddIncludePath("/afs/usatlas.bnl.gov/cernsw/lcg/external/Python/2.4.2/slc4_ia32_gcc34/include/python2.4")
// set PYTHONPATH
gProof->Exec("gSystem->Setenv(\"PYTHONPATH\",\"/usatlas/u/maeno/proof:/afs/usatlas.bnl.gov/cernsw/lcg/external/root/5.14.00/slc4_ia32_gcc34/root/lib\");")
Replace /usatlas/u/maeno/proof with the directory where your python algorithm exists.
// create dataset
TDSet *d = new TDSet("TTree","CollectionTree")
d->Add("root://acas0420.usatlas.bnl.gov//data/cache/HPTV/user.TARRADEFabien.trig1_misal1_csc11.005013.J4_pythia_jetjet.Athena_12.0.6.GroupArea_12.0.6.6.Jamboree_II-HightPtView-00-00-30.AAN.AANT0._00065.root");
// execute the Python algorithm via the wrapper
d->Process("Wrapper.C+")
TSelector::GetSelector("PyTest") returns NULL due to G__ClassInfo("PyTest").IsBase("TSelector")==0 and G__ClassInfo("PyTest").New()==NULL
$ root.exe
root [0] TPython::LoadMacro("PyTest.py")
root [1] ....
root [N] TDSet *d = ...
root [M] d->Process("PyTest")
AthenaROOTAccess-00-00-38-06 in your testarea.
Please visit the simple setup or the detailed setup for instructions on setting up release 13.0.30 at BNL.
After the release setup, please run the following to save 3 environment variables into a file which will be used later:
touch setenv2.sh
echo export ROOTSYS=$ROOTSYS >> setenv2.sh
echo export LD_LIBRARY_PATH=$LD_LIBRARY_PATH >> setenv2.sh
echo export PYTHONPATH=$PYTHONPATH >> setenv2.sh
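The same file could also be produced from Python. The sketch below is an assumption-laden alternative, not part of the original recipe: the function render_setenv is hypothetical, the example values are placeholders, and it relies on shlex.quote, so it needs a modern Python 3 interpreter rather than the Python 2.4 used elsewhere on this page. Its one advantage over the echo lines is that values containing spaces are quoted.

```python
import shlex


def render_setenv(env, names=("ROOTSYS", "LD_LIBRARY_PATH", "PYTHONPATH")):
    """Return one 'export NAME=value' line per requested variable.

    Values are shell-quoted so that paths containing spaces survive sourcing.
    Variables missing from `env` are exported as empty strings.
    """
    return ["export %s=%s" % (name, shlex.quote(env.get(name, ""))) for name in names]


# Placeholder values for illustration only.
example_env = {
    "ROOTSYS": "/opt/root",
    "LD_LIBRARY_PATH": "/opt/root/lib",
    "PYTHONPATH": "/opt/root/lib",
}

lines = render_setenv(example_env)
print("\n".join(lines))
```

In practice one would pass os.environ instead of example_env and write the resulting lines to setenv2.sh.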
As of Winter 08, the ROOT version linked by ARA is 5.14; please make sure that your $ROOTSYS points to the correct ROOT version (5.14).
This will change in ATLAS release 14. We'll keep you posted.
If you are working with release 14 data (FDR2, etc.), please note that the recommended ROOT version is 5.18.00d. Make sure that your $ROOTSYS variable points to the correct version.
$ root -l
root [0] TPython::Exec("import PyCintex,AthenaROOTAccess.transientTree");
root [1] TFile* myFile = TFile::Open("/usatlas/workarea/yesw2000/root/Data/test-12.AOD.root");
root [2] TPython::Bind(myFile,"myFile");
root [3] TPython::LoadMacro("/usatlas/workarea/yesw2000/root/Proof/13.0.30/open-file.py");
root [4] TTree* tree_trans = gROOT->Get("CollectionTree_trans");
root [5] tree_trans->MakeSelector("mySelectAna");
The TSelector files, "mySelectAna.h" and "mySelectAna.C", will be created.
The file open-file.py in the above example reads:
global myFile
myFileName = myFile.GetName()
print myFileName
branchNames = {}
CollectionTree = ROOT.AthenaROOTAccess.TChainROOTAccess('CollectionTree')
CollectionTree.Add(myFileName)
tt = AthenaROOTAccess.transientTree.makeTree(CollectionTree,branchNames=branchNames)
print tt
TTree* m_TreeTran;
and add members for pointers to the histograms you want to create, something like:
TH1F *h1_Gam_phi, *h1_mu_n, *h1_mu_m2mu;
TFile *myFile = tree->GetCurrentFile();
TPython::Bind(myFile,"myFile");
TString str_TreeTran(tree->GetName());
str_TreeTran += "_trans";
TObject* obj = gROOT->FindObject(str_TreeTran.Data());
if (obj) obj->Delete();
TPython::LoadMacro("/usatlas/workarea/yesw2000/root/Proof/13.0.30/open-file.py");
m_TreeTran = (TTree*)gDirectory->Get(str_TreeTran.Data());
Next you need to edit "mySelectAna.C":
1. In the SlaveBegin function, define your histograms
h1_Gam_phi = new TH1F("h1_Gam_phi","#phi(#gamma)", 100, -3.1416, 3.1416);
fOutput->Add(h1_Gam_phi);
h1_mu_n = new TH1F("h1_mu_n","N(#mu)",10,0,10);
fOutput->Add(h1_mu_n);
h1_mu_m2mu = new TH1F("h1_mu_m2mu","M(#mu#mu)",100,0,200.);
fOutput->Add(h1_mu_m2mu);
// b_PhotonAODCollection->GetEntry(entry);
// b_StreamingEventInfo->GetEntry(entry);
// b_MuidMuonCollection->GetEntry(entry);
m_TreeTran->GetEntry(entry);
... put your analysis code and fill histograms here ...
TIter next(fOutput);
TObject *obj;
TH1F* hist;
while ((obj = next())) {
hist = dynamic_cast<TH1F*>(obj);
if (!hist) continue; // skip output objects that are not TH1F
hist->Draw();
hist->Print();
}
$ root -l
root [0] TProof::AddEnvVar("PROOF_INITCMD","source /usatlas/workarea/yesw2000/root/Proof/13.0.30/setenv2.sh");
root [1] TProof* p = TProof::Open("acas0420");
root [2] p->Exec("TPython::Exec(\"import PyCintex,AthenaROOTAccess.transientTree\")");
ROOTSYS, LD_LIBRARY_PATH and PYTHONPATH.
root [3] TChain* chain = new TChain("CollectionTree");
root [4] chain->Add("/usatlas/workarea/yesw2000/root/Data/user*.root");
root [5] chain->SetProof(kTRUE,kTRUE);
root [6] chain->Process("mySelectAna.C");
/direct/usatlas+workarea/yesw2000/root/Proof/13.0.30/Example:
test2.C: the main ROOT macro that runs the test
setenv2.sh: setup script for proof worker nodes
open-file.py: python script to open AOD files and create transient TTree
mySelectAna.h: header file of my TSelector
mySelectAna.C: source code of my TSelector, where you put your analysis code.
--destSE. E.g. pathena --destSE SLACXRD. The output will then be copied to the SLAC xrootd SE via subscription.
gProof->AddDynamicPath(const char *libpath)