r10 - 07 Aug 2009 - 19:05:03 - CharlesWaldman



Phase2 schedule

  • Complete Condor priority and PBS queue configuration suggestions and templates.
  • Set up analysis queues in Panda
  • Demonstrate ability to run test analysis jobs

Steps to set up Condor analysis queues (Bob Ball)

  • First, a new entry is needed in the siteinfo.py file for the analysis queue. We derived the analysis version from our existing AGLT2 entry by modifying 3 fields: site_name, special_par and nodes. Nodes is no longer used, but I felt a reasonable value should still be included.
    • Change site_name from "AGLT2" to "ANALY_AGLT2", consistent with the naming of other analysis entries.
    • Change special_par from the empty dictionary "{}" to "{'queue':"analy"}". We can then search submitted jobs for this queue name.
    • Set the nodes value to the number of job slots we will make available, i.e., from '118' to '4'.
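The three-field change above can be sketched in Python. The dict layout here is purely illustrative (real siteinfo.py entries are positional lists, as the UTA example later on this page shows); only the three field names and values come from the steps above:

```python
# Illustrative sketch of the siteinfo.py change described above.
# The dict layout is hypothetical; only site_name, special_par and
# nodes (and their old/new values) are taken from the instructions.
production_entry = {
    "site_name": "AGLT2",
    "special_par": {},   # no extra Globus parameters for production
    "nodes": 118,        # historical slot count; field no longer used
}

# Derive the analysis entry by changing exactly three fields.
analysis_entry = dict(production_entry)
analysis_entry["site_name"] = "ANALY_AGLT2"         # ANALY_ naming convention
analysis_entry["special_par"] = {"queue": "analy"}  # searchable queue name
analysis_entry["nodes"] = 4                         # slots made available

print(analysis_entry["site_name"])
```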
  • Next, the file $OSG_INSTALL_DIR/globus/lib/perl/Globus/GRAM/JobManager/condor.pm must be modified to make use of the queue name and adjust the Condor submit file that will be generated. Around line 325 of this file, in the submit function, the code as distributed contains these lines:
    print SCRIPT_FILE "#Extra attributes specified by client\n";
    print SCRIPT_FILE "$submit_attrs_string\n";

    for (my $i = 0; $i < $description->count(); $i++) {
        if ($multi_output) {
  • Prior to the "for" loop, insert something like the following statements.
# Change submitted condor job file to establish this is a job for an analysis queue
# Two additional lines will appear, and be propagated to the Job Class Ads
    my $queue = $description->queue;
    my $setThisQ;
    my $queName;

    if ($queue eq 'analy')   {
        $setThisQ = 'True';
        $queName = 'Analysis';
    } else                   {
        $setThisQ = 'False';
        $queName = 'Default';
    }

    print SCRIPT_FILE "+IsAnalyJob = $setThisQ\n";
    print SCRIPT_FILE "+localQue = \"$queName\"\n";

# End change
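For clarity, the queue-to-ClassAd mapping that the Perl insertion implements can be sketched in Python (the function name is mine; the two output lines mirror the print statements above):

```python
def extra_classads(queue):
    """Mirror of the Perl logic above: map the Globus queue name to the
    two extra submit-file lines that tag analysis jobs."""
    if queue == "analy":
        set_this_q, que_name = "True", "Analysis"
    else:
        set_this_q, que_name = "False", "Default"
    return [f"+IsAnalyJob = {set_this_q}",
            f'+localQue = "{que_name}"']

print(extra_classads("analy"))
```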

  • Finally, set up the condor_config.local file on the compute nodes to use these new Class Ad entries. We have dedicated a single, 4-core compute node to analysis jobs. This node is configured differently from the remainder, but every node must be configured. ONLY analysis queue jobs will run on a machine set up as follows:
IsUserAnalyJob = ( TARGET.IsAnalyJob =?= True )
StartVMA = ( $(IsUserAnalyJob) )
START      = ((VirtualMachineID == 1) && ($(StartVMA)))   || \
             ((VirtualMachineID == 2) && ($(StartVMA)))   || \
             ((VirtualMachineID == 3) && ($(StartVMA)))   || \
             ((VirtualMachineID == 4) && ($(StartVMA)))

  • However, this does not preclude analysis jobs from running on other nodes. To EXCLUDE them from running elsewhere, the condor_config.local file on all other nodes should be set up in a way similar to the following:
IsNotUserAnalyJob = ( TARGET.IsAnalyJob =!= True )
StartVMA = ( $(IsNotUserAnalyJob) )
START      = ((VirtualMachineID == 1) && ($(StartVMA)) && ($(START)))   || \
             ((VirtualMachineID == 2) && ($(StartVMA)) && ($(START)))   || \
             ((VirtualMachineID == 3) && ($(StartVMA)) && ($(START)))   || \
             ((VirtualMachineID == 4) && ($(StartVMA)) && ($(START)))
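A rough Python model of how these two START expressions behave. Here meta_equal approximates the ClassAd =?= (meta-equal) operator, which, unlike ==, yields a defined result even when the attribute is missing from the job ad; the function names are mine:

```python
def meta_equal(a, b):
    """Approximation of ClassAd =?= : True only when both values are
    present (not None) and equal; never 'undefined'."""
    return a is not None and b is not None and a == b

def dedicated_node_start(is_analy_job):
    # Dedicated node: IsUserAnalyJob = ( TARGET.IsAnalyJob =?= True )
    # Only jobs that explicitly carry +IsAnalyJob = True may start.
    return meta_equal(is_analy_job, True)

def regular_node_start(is_analy_job, prior_start=True):
    # Other nodes: IsNotUserAnalyJob = ( TARGET.IsAnalyJob =!= True )
    # (=!= is meta-not-equal), ANDed with the node's existing START.
    return (not meta_equal(is_analy_job, True)) and prior_start

# A job with no IsAnalyJob attribute (None) runs only on regular nodes.
print(dedicated_node_start(None), regular_node_start(None))
```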

  • Note that, at this time, we do not make use of the defined queue name "Analysis", which is determined by the passed "special_par" value and carried in the localQue setting. We may do so in the future, though, and its existence does not cause any harm.
  • My thanks to Xin Zhao, Mark Sosebee and Alex Withers for their help in setting this up.

Simplified Condor setup (Charles Waldman, UC/MWT2)

  • Note: the MWT2 analysis queue has been relocated to a different cluster running PBS, so the following is obsolete.
  • I basically followed Bob's instructions above, with some slight modifications and simplifications.
  • The part about editing siteinfo.py is unchanged
  • I have renamed some variables and simplified some logic for clarity
  • The file $OSG_INSTALL_DIR/globus/lib/perl/Globus/GRAM/JobManager/condor.pm must be modified as follows. At around line 300, find the section that reads:

    print SCRIPT_FILE "#Extra attributes specified by client\n";
    print SCRIPT_FILE "$submit_attrs_string\n";

    for (my $i = 0; $i < $description->count(); $i++) {
        if ($multi_output) {
  • Prior to the "for" loop, insert the following statements.
# Addition to support analysis queue
    my $queue = $description->queue;
    my $isAnalyJob;
    my $queName;

    if ($queue eq 'analy')   {
        $isAnalyJob = 'True';
        $queName = 'Analysis';
    } else                   {
        $isAnalyJob = 'False';
        $queName = 'Default';
    }

    print SCRIPT_FILE "+IsAnalyJob = $isAnalyJob\n";
    print SCRIPT_FILE "+localQue = \"$queName\"\n";

# End change

  • We have also chosen to let analysis jobs run on the rest of the compute nodes, along with production jobs. Therefore, on the non-dedicated hosts, the condor_config.local file is unmodified.
  • On the dedicated analysis hosts, I have added these two lines to condor_config.local. This is somewhat simpler than the version above, but logically equivalent:
IsUserAnalyJob = ( TARGET.IsAnalyJob =?= True )
START = ( $(IsUserAnalyJob) )
  • My thanks to Bob Ball for giving me a good example to follow.

One option for setting up an analysis queue in PBS (Sosebee / Mcguigan, UTA)

1) Add a new entry, named "ANALY_site-name", to the siteinfo.py file on atlas002.uta.edu (submit host), by requesting an update to the CVS version (Marco). Examples of this can be found in the current version. (See below for detailed information regarding the contents of siteinfo.py from Marco.)

The entry for ANALY_UTA-DPCC is given by:

"ANALY_UTA-DPCC":['atlas.dpcc.uta.edu','pbs','/data73/grid3-1.1.11/apps/','/data73/grid3-1.1.11/data', '/data73/grid3-1.1.11/tmp','/scratch','http://osg-itb2.dpcc.uta.edu:8000/dq2/', {'queue':"analy_atlas_q",'maxWallTime':"2500"},' /data73/grid3-1.1.11/apps/atlas_app/python/python-2.4.1/Python-2.4.1/python',2,'NOTOK', None, [], [], [['8.0.1'], ['11.0.2'], ['10.0.1'], ['10.0.4'], ['8.0.4'], ['8.0.5'], ['11.0.4'], ['12.0.3', '1', '2'], ['12.0.31', '1', '2', '3', '4', '5', '6', '7', '8'], ['11.0.3'], ['11.0.42'], ['9.0.1'], ['9.0.2'], ['9.0.3'], ['11.3.0'], ['11.0.1'], ['11.0.5'], ['11.5.0'], ['12.0.0', '3'], ['12.0.1', '1', '2', '3'], ['12.0.2', '1', '2', '3'], ['12.3.0', '1'], ['12.0.4', '1', '2'], ['12.0.5', '1', '2', '3'], ['12.0.6', '1', '2', '3', '4', '5']]]

2) Request this new site be added in the panda server (Tadashi).

3) Create a second PBS queue, distinct from the existing one used for production jobs. The new queue name is one of the parameters in the site specification in 1).

4) Modify $OSG_LOCATION/globus/share/globus_gram_job_manager/pbs.rvf so that globus knows about the new queue.

5) Assuming N total CPUs (cores) are available, cap the production queue at N-n total jobs, where n is the number of CPUs to be dedicated to analysis jobs.

6) With this configuration, n job slots are always available for incoming analysis jobs.
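A worked example of the slot arithmetic above, with illustrative numbers:

```python
# Illustrative numbers only; substitute your cluster's actual counts.
N = 64   # total cores available on the cluster
n = 8    # cores dedicated to analysis jobs

production_cap = N - n   # cap placed on the production PBS queue

# 56 production slots; 8 slots always free for incoming analysis jobs.
print(production_cap, n)
```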

7) The new analysis site can be tested by submitting a job via pathena with the option "--site ANALY_site-name".

Note: One possible modification to this scheme would be to allow analysis jobs to run on idle CPUs that become available when the number of production jobs is less than N-n.

siteinfo.py currently holds most of the information about the Computing Elements useful for ATLAS jobs

It is a dictionary, and each item contains an array with information about a Computing Element. There are 2 special items:

  • SiteName: the name of the 'rows' (the information provided)
  • default: with default values

The file is available in the Panda packages and CVS: http://atlas-sw.cern.ch/cgi-bin/viewcvs-atlas.cgi/offline/Production/panda/jobscheduler/siteinfo.py?rev=HEAD&content-type=text/vnd.viewcvs-markup

Some interesting recent additions:

  • atlasrel: ATLAS releases installed at the CE
  • copytools: same as the info in 'storage_access_info.py' (which could be generated)
  • sitepar: possibility to have CE-specific job parameters

A full list of the information currently provided is: "SiteName":['gatekeeper', 'queue', 'osg_app', 'osg_data', 'tmpdir', 'osg_wntmp', 'DQ2URL', 'special_par', 'python_path', 'nodes', 'NOTOK', 'osg_grid', 'hostnames', 'copytools', 'atlasrel', 'sitepar'],
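Since each entry is a positional list, pairing it with the field names above gives readable access. A sketch with invented values (the hostnames, paths and URL below are placeholders, not a real site):

```python
# Field names taken from the list above; entry values are invented.
FIELDS = ["gatekeeper", "queue", "osg_app", "osg_data", "tmpdir",
          "osg_wntmp", "DQ2URL", "special_par", "python_path", "nodes",
          "NOTOK", "osg_grid", "hostnames", "copytools", "atlasrel",
          "sitepar"]

entry = ["gate.example.edu", "pbs", "/apps", "/data", "/tmp", "/scratch",
         "http://dq2.example.edu:8000/dq2/", {"queue": "analy"},
         "/usr/bin/python", 4, "OK", None, [], [], [], {}]

# Zip the positional list against the field names for named lookup.
site = dict(zip(FIELDS, entry))
print(site["special_par"]["queue"])
```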

Below is the docstring of siteinfo.py that includes a description of all fields:

SiteName - OSG CE name; AAA_N (N an integer) is used to have multiple entries for the site AAA; AAA_ANALYSIS is a CE reserved for analysis

gatekeeper - fqdn of the gatekeeper host

queue - batch queue

osg_app, osg_data, tmpdir, osg_wntmp - OSG SiteStorages

DQ2URL - URL of the DQ2 http interface for the DQ2 server used for (bound to) that CE

special_par - additional Globus parameter, as dictionary (e.g. queue, maxWallTime)

python_path - if present, used instead of the system python

nodes - number of CPUs (used for WRR scheduling), unreliable

NOTOK - flag OK/anything_else (e.g. NOTOK) used by the production pusher to choose CEs

osg_grid - OSG SiteStorage (where Grid software is available)

hostnames - 3 components array 'SE', array of beginnings, array of endings (e.g. ['SE', ['tier2', 'compute-'], ['.uchicago.edu', 'local']])

copytools - array ['copytool', 'setup file if any'], e.g. ['dccp','']

atlasrel - array containing installed ATLAS releases formatted as arrays: ATLAS release first, all the trf versions after (e.g. [['11.0.5', '1', '2'], ['12.0.31', '1'], ['12.0.6', '1', '2', '6']])

sitepar - site parameters per job; these override default job parameter {'pilotname':{'p1name':p1val, 'p2name':p2val, ...}, 'pilotname2':{'pXname':pXval}}
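A sketch of how sitepar overrides might be applied on top of the default job parameters. The merge logic is my assumption based on the description above; the pilot and parameter names are taken from its example structure:

```python
# Hypothetical values matching the sitepar structure described above:
# {'pilotname': {'p1name': p1val, 'p2name': p2val, ...}, ...}
defaults = {"p1name": 1, "p2name": 2}
sitepar = {"pilotname": {"p2name": 20}}

def job_params(pilot, defaults, sitepar):
    """Start from the default job parameters, then apply any
    site-specific overrides registered for this pilot name."""
    params = dict(defaults)
    params.update(sitepar.get(pilot, {}))
    return params

print(job_params("pilotname", defaults, sitepar))
```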

-- RobertGardner - 28 Aug 2007
