AnalysisQueueP1
Overview
The goal of this task is to configure sites so that analysis pilots are scheduled with higher priority than production pilots.
Panda can submit two types of pilots to a local batch system. The first is the usual production pilot; the second is a user job pilot, typically running an analysis task. Both types run the same pilot.py code, but an analysis pilot is started with an additional command-line argument (-u) that identifies it as an analysis pilot. Panda's submission mechanism controls the rate at which each type of pilot is submitted to a computing site.
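The following Python sketch illustrates the distinction described above: both pilot types are launched from the same pilot.py, and only the analysis pilot carries the extra -u argument. It is not Panda's submitter code. The function name pilot_command and the common_args placeholder are hypothetical, and -u is treated as a bare flag, as the text describes it; if the real option expects a value, that value would need to be appended as well.

```python
from typing import List, Sequence


def pilot_command(common_args: Sequence[str], analysis: bool) -> List[str]:
    """Build the argument list for one pilot (hypothetical helper).

    Both pilot types run the same pilot.py; an analysis pilot differs only
    by the extra -u argument. common_args is a hypothetical placeholder for
    whatever else the submitter passes to every pilot.
    """
    argv = ["python", "pilot.py", *common_args]
    if analysis:
        argv.append("-u")  # marks an analysis (user job) pilot
    return argv


if __name__ == "__main__":
    common = ["<other pilot arguments>"]  # hypothetical placeholder, not real options
    print(pilot_command(common, analysis=False))
    print(pilot_command(common, analysis=True))
```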
Panda's submission mechanism, based on the siteinfo.py file, maintains separate site entries for production and analysis. Analysis site names begin with "ANALY_", and an analysis site can, if needed, specify different settings from its related production site. The first step in supporting analysis is therefore to define an appropriate entry in siteinfo.py and ask that the site be enabled.
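A minimal sketch of the naming convention, assuming site names are plain strings. Only the ANALY_ prefix comes from the text; the helper names and the pairing rule in related_production_site (ANALY_<X> pairs with production site <X>) are hypothetical illustrations, not the actual structure of siteinfo.py.

```python
from typing import Dict, List

ANALYSIS_PREFIX = "ANALY_"


def is_analysis_site(name: str) -> bool:
    """Analysis sites are identified by the ANALY_ name prefix."""
    return name.startswith(ANALYSIS_PREFIX)


def split_sites(names: List[str]) -> Dict[str, List[str]]:
    """Partition site names into production and analysis lists."""
    result: Dict[str, List[str]] = {"production": [], "analysis": []}
    for name in names:
        key = "analysis" if is_analysis_site(name) else "production"
        result[key].append(name)
    return result


def related_production_site(name: str) -> str:
    """Hypothetical convention: ANALY_<X> is paired with production site <X>."""
    if not is_analysis_site(name):
        raise ValueError(f"{name} is not an analysis site")
    return name[len(ANALYSIS_PREFIX):]


if __name__ == "__main__":
    sites = ["EXAMPLE_SITE", "ANALY_EXAMPLE_SITE"]  # hypothetical site names
    print(split_sites(sites))
    print(related_production_site("ANALY_EXAMPLE_SITE"))
```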
The second step is to prioritize incoming analysis jobs within the local batch system so that they run before production jobs. The exact method will vary by batch system and scheduling practice, but the common problem is that the batch system must be able to tell whether an incoming pilot is an analysis job or a production job. Both types arrive through the same Globus jobmanager interface, which does not distinguish between them. The definition of an analysis site in siteinfo.py provides a way to mark analysis jobs so that the batch system can tell them apart.
The eighth parameter of a site definition in siteinfo.py allows GRAM RSL parameters to be included when the pilot is submitted through Globus. Consider a computing site that uses PBS with two execution queues, default_q and analy_q. Suppose further that scheduling is strictly by queue, so that any job in analy_q runs before any job in default_q. The analysis site definition in siteinfo.py can then use the eighth parameter to direct analysis pilots to analy_q. An alternative approach is to use the GRAM parameter to request different walltimes for the two job types and let the batch system prioritize jobs by shortest job first.
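The sketch below shows the two approaches under stated assumptions. rsl_fragment builds GRAM RSL text using the standard queue and maxWallTime (in minutes) attributes; the exact form in which the eighth siteinfo.py parameter expects that text is an assumption here. The two ordering functions are a toy model of the batch system, not PBS itself: strict_queue_order dispatches every analy_q job before any default_q job, and shortest_job_first orders by requested walltime. The queue names come from the example above; the job names and walltimes are made up.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


def rsl_fragment(queue: Optional[str] = None,
                 max_wall_minutes: Optional[int] = None) -> str:
    """Return a GRAM RSL fragment, e.g. '(queue=analy_q)(maxWallTime=60)'."""
    parts: List[str] = []
    if queue is not None:
        parts.append(f"(queue={queue})")
    if max_wall_minutes is not None:
        # GRAM's maxWallTime attribute is expressed in minutes.
        parts.append(f"(maxWallTime={max_wall_minutes})")
    return "".join(parts)


@dataclass
class Job:
    name: str
    queue: str
    walltime: int  # requested walltime in minutes


# Strict queue priority: a lower rank is always dispatched first.
QUEUE_RANK: Dict[str, int] = {"analy_q": 0, "default_q": 1}


def strict_queue_order(jobs: List[Job]) -> List[str]:
    """Every analy_q job before any default_q job; sorted() is stable,
    so submission order is preserved within each queue."""
    return [j.name for j in sorted(jobs, key=lambda j: QUEUE_RANK[j.queue])]


def shortest_job_first(jobs: List[Job]) -> List[str]:
    """Dispatch in order of requested walltime (ties keep submission order)."""
    return [j.name for j in sorted(jobs, key=lambda j: j.walltime)]


if __name__ == "__main__":
    print(rsl_fragment(queue="analy_q"))        # queue-based analysis entry
    print(rsl_fragment(max_wall_minutes=60))    # walltime-based alternative
    queued = [
        Job("prod1", "default_q", 2880),
        Job("analy1", "analy_q", 60),
        Job("prod2", "default_q", 2880),
        Job("analy2", "analy_q", 120),
    ]
    print(strict_queue_order(queued))
    print(shortest_job_first(queued))
```

In this example both policies place the analysis pilots ahead of the production pilots. Under shortest job first, however, a production job that requests less walltime than an analysis pilot would still run ahead of it, so the walltime approach only works if analysis pilots consistently request shorter walltimes.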
The third step that sites need to address is controlling the number of jobs running in each job class. Ideally, some fraction of the CPUs should always be available for immediate analysis work. This reduces the wait time experienced by users and so contributes to the success of Panda-based analysis. It implies that the number of running production jobs should be capped below the number of processors. How to do this will vary by batch system and by the scheduling policies in force.
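One way to express such a cap is as simple arithmetic on the CPU count. In this hypothetical sketch, a site holds back reserve_percent of its CPUs (rounded up) for analysis, and a production job may start only while the number of running production jobs is below the remainder. The 10% figure in the usage lines is an arbitrary example; in practice the limit would be enforced through the batch system's own per-queue or per-group limits rather than by code like this.

```python
def production_cap(total_cpus: int, reserve_percent: int) -> int:
    """Largest number of production jobs allowed to run at once.

    reserve_percent of the CPUs (rounded up) is held back for analysis.
    """
    if total_cpus < 0 or not 0 <= reserve_percent <= 100:
        raise ValueError("need total_cpus >= 0 and 0 <= reserve_percent <= 100")
    reserved = -(-(total_cpus * reserve_percent) // 100)  # ceiling division
    return total_cpus - reserved


def may_start_production(running_production: int, total_cpus: int,
                         reserve_percent: int) -> bool:
    """True if one more production job still fits under the cap."""
    return running_production < production_cap(total_cpus, reserve_percent)


if __name__ == "__main__":
    print(production_cap(200, 10))             # 180: 20 CPUs kept for analysis
    print(may_start_production(179, 200, 10))  # True
    print(may_start_production(180, 200, 10))  # False
```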
Requirements for Panda integration
Information needed from sites for Panda integration.
Tips and experiences from sites with PBS job schedulers
Tips and experiences from sites with Condor job schedulers
Local tests at AGLT2 show that adding a priority argument to the Condor submission, as shown below, makes the higher-priority submission start running sooner than submissions without it, even when it is submitted later. However, because the job load was light until recently and both Mark Sosebee and I have been occupied with urgent work, we have not yet performed this test with a Panda submission.
condor_submit <usual stuff> -append "priority = 5"
The default priority is 0, and the allowed range is -20 to +20.
Bob Ball - 1 August, 2007
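The Python sketch below restates the AGLT2 observation in code. It builds the argument list for the condor_submit command shown above, checking the -20 to +20 range, and models how a priority of 5 moves a later submission ahead of earlier jobs at the default priority of 0. start_order is a simplified model, not Condor's scheduler, and assumes that equal priorities fall back to submission order. Condor job priority only reorders jobs belonging to the same submitting user, so this technique relies on analysis and production pilots being submitted under the same account. The submit file name example.sub is hypothetical and stands in for the "<usual stuff>" above; the returned list could be passed to subprocess.run on a host with Condor installed.

```python
from dataclasses import dataclass
from typing import List, Sequence

MIN_PRIORITY, MAX_PRIORITY = -20, 20  # range quoted above; default is 0


def condor_submit_argv(usual_args: Sequence[str], priority: int) -> List[str]:
    """Equivalent of: condor_submit <usual stuff> -append "priority = N"."""
    if not MIN_PRIORITY <= priority <= MAX_PRIORITY:
        raise ValueError(
            f"priority {priority} outside [{MIN_PRIORITY}, {MAX_PRIORITY}]")
    return ["condor_submit", *usual_args, "-append", f"priority = {priority}"]


@dataclass
class IdleJob:
    name: str
    submit_order: int  # smaller = submitted earlier
    priority: int = 0  # Condor's default job priority


def start_order(jobs: List[IdleJob]) -> List[str]:
    """Higher priority first; equal priorities fall back to submission order."""
    ranked = sorted(jobs, key=lambda j: (-j.priority, j.submit_order))
    return [j.name for j in ranked]


if __name__ == "__main__":
    print(condor_submit_argv(["example.sub"], 5))  # hypothetical submit file
    idle = [
        IdleJob("production_pilot_1", 1),
        IdleJob("production_pilot_2", 2),
        IdleJob("analysis_pilot", 3, priority=5),  # submitted last
    ]
    print(start_order(idle))
```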
--
RobertGardner - 22 Jun 2007