Panda project plans
- High level objectives
- Panda tasks, priorities, assignments
- Hot list: top priority tasks
- Panda server (task buffer, brokerage, dispatcher, data service)
- Analysis support (pathena, data access for analysis)
- Production pilot (analysis and managed production)
- Production job scheduler
- TestPilot pilot
- TestPilot job scheduler
- DDM (pilot data handling, site data handling, data transport, data access, file-level validation)
- Databases (database servers, services, schema)
- Facility services (servers, data storage and other facility issues at Tier 1 and Tier 2s)
- Data validation
- Prodsys interface and integration
- Monitor - operations, performance metrics, site support
- Monitor - user analysis, accounting
- Monitor - dataset browser and DDM
- Monitor - MonaLisa and Dashboard integration, migration
- Monitor - Infrastructure
- Monitor - WN interaction
- Logger
- Performance and scalability (measurement, optimization, design changes)
- Site information system
- Security
- Tools for operations and shift support
- Site support, packaging, deployment, installation, maintenance
- LCG deployment and support
- Organization and communication (email lists, code repository, ...)
- Documentation
- Completed tasks
- Open issues and things to investigate
High level objectives
Panda tasks, priorities, assignments
Task items below appear as (you can cut/paste this as a template):
| Short description (may be wiki link to a page about the item) |
Priority, status |
Assigned to |
Complete description. Be verbose, include links to relevant email and info, etc. If the task warrants a wiki page
of its own, make one and link it here
Hot list: top priority tasks
Panda server (task buffer, brokerage, dispatcher, data service)
Analysis support (pathena, data access for analysis)
| User datasets holding files processed so far |
MEDIUM |
Tadashi |
Users typically won't succeed in processing a complete dataset on the first (or second) attempt,
and anyway the dataset may not be complete when first processed. Need a user-level specification of the
subset of files that have been processed so far. Support via a user dataset containing files
processed so far. Use this dataset to implement 'process any newly available files' functionality
in pathena.
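A minimal sketch of the 'process any newly available files' selection, assuming both the input dataset and the user's 'processed so far' dataset can be listed as plain sequences of file names. The function name and arguments are hypothetical and no DQ2 or pathena calls are shown.

```python
def newly_available_files(dataset_files, processed_files):
    """Return the files of the input dataset that are not yet in the
    user's 'processed so far' dataset, keeping the input order."""
    processed = set(processed_files)
    return [f for f in dataset_files if f not in processed]
```

A pathena run would then submit jobs only for the returned files and add them to the user dataset once they have been processed successfully.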
| pathena sub-file job partitioning |
MEDIUM |
Sergey |
Support partitioning of pathena jobs with a granularity smaller than the file level. Once AOD
merging is in production the AOD files will be larger than the appropriate granularity for
individual jobs. Need to partition jobs within files into event sets processed by different jobs.
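As an illustration of sub-file partitioning, the sketch below splits the events of one file into contiguous event sets, one per job. The function name and the (first event, event count) representation are assumptions made for the example, not the pathena design.

```python
def partition_events(n_events, events_per_job):
    """Split the events [0, n_events) of one file into (first_event, count)
    ranges, one range per job. The last range may be shorter."""
    if events_per_job <= 0:
        raise ValueError("events_per_job must be positive")
    return [
        (first, min(events_per_job, n_events - first))
        for first in range(0, n_events, events_per_job)
    ]
```

Each job would then be told to skip to its first event and process only its count of events from the shared input file.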
| Posix PFN extraction and cataloging for DB-based tag processing |
LOW |
Glasgow? |
pathena supports file-based tag processing via a ROOT macro that scans the files for POOL refs
so they can be resolved via the LRC, translated to posix-appropriate PFNs, and recorded in an
XML catalog for Athena use. The same must be done for DB-based tags, presumably for the tag DB
as a whole, populating a large posix-access catalog (probably not an XML file) for Athena use.
| Data file marshalling for selections |
LOW |
A collection or a selection out of a collection can define (through the refs therein) a file set
of interest for marshalling and replication to an external site for processing. Need a tool to
do this marshalling: build the list of files from the refs, create a new dataset for the needed
files, and register its incomplete locations based on locality of constituent files (not necessarily
trivial) such that it can be replicated by subscription.
Production pilot (analysis and managed production)
| Pilot recovery and cleanup system |
HIGH |
Paul |
Subsequent pilots on a WN with previous job failures recover the output and clean up after the failed jobs.
Production job scheduler
| Switch to Tadashi's webdav area for staging scripts on web |
HIGH |
Torre |
If you upload a file to the dev server, the trf will have:
wget https://gridui02.usatlas.bnl.gov:26443/cache/***
For the production server:
wget https://gridui01.usatlas.bnl.gov:25443/cache/***
Add generic DDM plugins with OSG production instance based on DQ2ProdClient2 and LCG instance based on dq2_*
DDM (pilot data handling, site data handling, data transport, data access, file-level validation)
| Timeout on dCache in pilot |
URGENT |
Paul |
dccp hangs regularly. Right now the pilot hangs with it, ultimately dying with lost heartbeat. It shouldn't.
Protect file movement in the pilot with timeouts on a separate thread doing the transfer.
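One way to sketch this protection: start the copy command as a child process, wait for it in a worker thread, and kill it if the thread has not finished within the timeout. The function name and the timeout handling are assumptions for illustration, not the pilot's actual code.

```python
import subprocess
import threading


def run_with_timeout(cmd, timeout_s):
    """Run cmd as a child process, waited on by a worker thread.

    Returns (exit_code, timed_out). If the command has not finished
    after timeout_s seconds it is killed, so the caller never hangs
    together with the transfer.
    """
    proc = subprocess.Popen(cmd)
    worker = threading.Thread(target=proc.wait, daemon=True)
    worker.start()
    worker.join(timeout_s)
    if worker.is_alive():
        proc.kill()    # the transfer is stuck: stop it
        worker.join()  # wait() returns once the child is gone
        return proc.returncode, True
    return proc.returncode, False


# Hypothetical usage in the pilot (the 600 s limit is an assumed value):
#   code, timed_out = run_with_timeout(["dccp", source, destination], 600)
```

On a timeout the pilot can report a transfer error to the server instead of dying later with a lost heartbeat.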
| OSG installation instructions for DDM |
URGENT |
Wensheng |
We need OSG-specific installation instructions, and corresponding installation scripts, for DQ2@OSG and for OSG-specific components such as the site http service, MySQL, etc.
| All DDM scripts in CVS |
URGENT |
Wensheng |
All scripts used by Panda DDM, sites etc have to be maintained in CVS. Packaging procedures like pacman cache building
should then pick up the scripts from CVS, eg. in scripts/ area under Production/panda in ATLAS CVS.
Implement xrootd as supported remote file access mechanism and evaluate usage with pathena
| SRM/xrootd interface |
MEDIUM |
SLAC |
| xrootd support for file stage-out to WN disk |
MEDIUM |
SLAC |
| dcap vs. dccp to WN |
MEDIUM |
Statements at FNAL dCache workshop:
"dcap is a simple thin posix interface. Much more efficient than file transfers. Very small overhead."
Is posix I/O via dcap an improvement on dccp to WN for fully-processed files? Evaluate.
And another statement:
"srmcp cannot do more than 3-5 files at once or XML gets too big and the protocol (SOAP) too slow."
Databases (database servers, services, schema)
| Validate and deploy grid-authentication to MySQL |
URGENT |
Wensheng, Yuri |
| Second tier archival tables |
HIGH |
Yuri, Tadashi |
After a few months, do a second migration out of the archive table and into
dedicated long-term archival tables, each spanning a month (or N months) and
named for the month. This should scale; any scripts producing history plots
and such will know where to find the data, and the monitoring will benefit
from working with a smaller archive table. Do the same for the job table.
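A small sketch of how scripts could locate the right long-term table from a date. The `base_YYYYMM` naming pattern and the base name used in the example are assumptions; the plan only says that tables are named for the month.

```python
import datetime


def archive_table_name(base, when, months_per_table=1):
    """Name of the long-term archival table covering the date `when`.

    Tables are named for the first month they cover, e.g. with
    months_per_table=3 the months July to September share one table.
    """
    first_month = ((when.month - 1) // months_per_table) * months_per_table + 1
    return "%s_%04d%02d" % (base, when.year, first_month)
```

For example, `archive_table_name("jobs_archive", datetime.date(2006, 9, 20))` gives `jobs_archive_200609`, so a plotting script can derive the table name from the period it wants to display.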
| Database schema revision 5 |
HIGH |
Yuri, Sudhamsh at CERN |
Schema changes for revision 5:
Job table:
- Add taskID mediumint(9) after jobDefinitionID, for convenience
- Add encrypt varchar(250) after transformation, to store an encryption of the transformation (and other?) fields for use in RSA key pair authentication of job record content (250 suggested because transformation is 250. Should they both be shortened?)
- Add sourceGroup varchar(32) after prodSourceLabel to record group (eg physics working group) for jobs that are run under the auspices of a group (and are accounted accordingly)
- Add VO varchar(32) after sourceGroup to record virtual organization of the job, such that non-ATLAS usage can be supported and accounted.
- Add app varchar(64) after homepackage to designate the application being run by the job, to allow brokerage across applications in a VO
- Add pilotVersion varchar(32) after pilotID to designate pilot version/type, to allow matchmaking of jobs requiring a particular pilot type/version to compatible pilots. If a pilot reports a pilotVersion to the dispatcher when requesting a job, the dispatcher should only allocate it a job with a matching pilotVersion value.
- Change prodSourceLabel to varchar(30) rather than enum
- Remove transExitCode (and do not add transExitDiag)
- Remove xxErrorCode, xxErrorDiag, xx = sup, brokerage, jobDispatcher, taskBuffer
- Add serverErrorCode, serverErrorDiag, replacing the brokerage etc. ones
- New proposal: Change Diag error descriptions to varchar(128): long enough for useful description, and saves space
- New proposal: Change prodUserID to varchar(128): long enough, saves space
Feb 2007 addition:
- Add GridCert column for the DN of the grid certificate of the job submitter. Needed because we now allow the submitter to specify the DN used in prodUserID for accounting purposes.
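The pilotVersion matchmaking rule in the revision 5 list can be sketched as follows. Jobs are represented as plain dictionaries and the function names are hypothetical. Treating a pilot that reports no version as unrestricted is an assumption, since the plan only covers the case where a version is reported.

```python
def job_matches_pilot(job_pilot_version, reported_pilot_version):
    """True if a job may be given to a pilot under the pilotVersion rule."""
    if reported_pilot_version is None:
        # Assumption: a pilot that reports no version is not restricted.
        return True
    return job_pilot_version == reported_pilot_version


def pick_job(jobs, reported_pilot_version=None):
    """Return the first job (a dict with an optional 'pilotVersion' key)
    that the requesting pilot is allowed to run, or None."""
    for job in jobs:
        if job_matches_pilot(job.get("pilotVersion"), reported_pilot_version):
            return job
    return None
```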
Facility services (servers, data storage and other facility issues at Tier 1 and Tier 2s)
| Migrate BNL LRC to new dedicated server |
URGENT |
Yuri, Wensheng |
| Data flow monitoring between sites, throughput plots |
HIGH |
Tier1 |
| xrootd testbed including SRM/xrootd to run xrootd tests |
MEDIUM |
Tier1? SLAC? |
| dCache deployed in production at all US Tier 2s |
MEDIUM |
Tier2s |
Agreed at the FNAL dCache workshop to do this by ~end 2006
Data validation
Prodsys interface and integration
Monitor - operations, performance metrics, site support
| Expand service health check to include LRCs |
HIGH |
| dCache space reporting via http service |
HIGH |
Torre |
From Dan:
The two sites so configured are IU_BC/IU_BANDICOOT and UC_VOB:
curl http://bandicoot.uits.indiana.edu:8000/dq2/space/default
curl http://tier2-05.uchicago.edu:8000/dq2/space/default
| Operations history |
HIGH |
Prem |
Operations statistics currently are only kept in a current snapshot. Implement a history table to
which snapshots are migrated to record an operations history, and implement plotting tools to
display the history.
| Database plotting system |
HIGH |
Sudhamsh |
Many Panda DBs record info we would like to have in plot form. Job tables are just one example.
Provide a generic plotting system which given a DB, SQL describing the values to be plotted, and
other plot parameters (user controllable via URL API or form), do the plot.
| Scheduler status page |
HIGH |
List and status of (non) operating schedulers
| Job duration monitoring |
HIGH |
- Plot distributions for job duration, latencies for activated and running
- Separately for production and analysis
- Alarms for anomalies
- too-long jobs
- too-long waits in assigned, waiting for activation
For assigned/waiting jobs, analyze why they're waiting; show the info on the job pages and send alarms to the alarm system
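A rough sketch of the anomaly check behind such alarms, assuming each job is available as a dictionary with its current status and the time it entered that status. The field names `id`, `status` and `stateSince` and the threshold arguments are hypothetical.

```python
def duration_alarms(jobs, now, max_running_s, max_assigned_s):
    """Return (job id, reason) pairs for jobs that have been running, or
    waiting in 'assigned', longer than the given limits in seconds."""
    alarms = []
    for job in jobs:
        elapsed = now - job["stateSince"]
        if job["status"] == "running" and elapsed > max_running_s:
            alarms.append((job["id"], "too-long job"))
        elif job["status"] == "assigned" and elapsed > max_assigned_s:
            alarms.append((job["id"], "too-long wait in assigned"))
    return alarms
```

Separate limits for production and analysis jobs could be handled by calling the function once per job type with different thresholds.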
| Job pages - subscription info |
HIGH |
Links to associated dispatch, destination subscriptions
| WN info from pilots |
MEDIUM |
Memory usage, CPU power, load from heartbeats
| Efficiency, specint resources, throughput |
MEDIUM |
Use CPU power formula for efficiency, resource, throughput metrics
Monitor - user analysis, accounting
| Provide global jobID to pathena |
HIGH |
Torre |
pathena's user job IDs are currently recorded in a local disk file. As a result, pathena runs
in different environments (BNL vs CERN) can result in duplicated jobIDs. Use the monitor's
user DB to provide http API to retrieve latest jobID -- and other info -- for pathena and
other client use.
| User stats table, job history |
MEDIUM |
Jobs run that day, success rate, where ran, datasets used,...
| User level datasets library |
MEDIUM |
| Monitor info, functions via the command line |
MEDIUM |
| Extend quota system to groups (eg PWGs) |
LOW |
Sudhamsh |
Once group level usage of Panda is in use, populate the group-level quota information and
extend the API and monitor to support group quotas.
| Email notification |
MEDIUM |
Optional email notification of job start, finish, error; selective user subscription info
Monitor - dataset browser and DDM
| Access to recent file lists for SEs |
HIGH |
eg.
http://gridui02.usatlas.bnl.gov:25880/server/pandamon/query?overview=recentfiles&site=AGLT2
| Exclude empty replicas |
HIGH |
On dataset pages, do not show as replicas (or show with a warning) sites that hold effectively none (under 25%) of the files.
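The 25% cut can be sketched as below, assuming the monitor already has the dataset's file list and, per site, the list of files found there. The function name and the input shapes are assumptions.

```python
def replica_sites_to_show(dataset_files, files_at_site, min_fraction=0.25):
    """Map each site worth listing as a replica to the fraction of the
    dataset's files it holds. Sites under min_fraction are left out."""
    wanted = set(dataset_files)
    shown = {}
    for site, files in files_at_site.items():
        fraction = len(wanted & set(files)) / len(wanted) if wanted else 0.0
        if fraction >= min_fraction:
            shown[site] = fraction
    return shown
```

The returned fraction could also drive the warning variant: list every site, but flag those below the threshold.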
| replica catalog extractions |
HIGH |
Torre, Alexei |
Replica catalog scanners in cron on Tier 1s to extract info in usable form for fast comprehensive replica info in monitor (dataset completeness per site)
| 'View originating dataset for this file' link in file view |
HIGH |
Torre |
| Extract complete metadata to the pickle DB for dataset browser |
HIGH |
Torre |
Dataset metadata accessible efficiently to the browser is woefully incomplete and very expensive
to access dynamically. Need to extract more to the pickle DB. Problematic because DQ2 metadata
retrieval is so slow that it is hard to pull it out even with a long-running cron job.
| Extract dataset completeness at sites to the pickle DB for dataset browser |
HIGH |
Torre |
Monitor needs to provide info on which files of a dataset are actually at which site, fast and
dynamically via cached info. Percent complete and so on. Fast and trivial for MySQL LRCs. We'll see for LFC LRCs.
| Move dataset listings to web cache |
MEDIUM |
Monitor - MonaLisa and Dashboard integration, migration
Monitor - Infrastructure
Alarm panel in monitor, alarm status on every page, API to register alarm, email mechanism
| Generic table spec class |
LOW |
Generalize *Spec class to accept connection specs and DB name, and dynamically build list of
attributes. Accept dictionary (tolerant against inconsistencies) mapping fields to longer descriptions.
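A minimal sketch of such a generic spec class. It takes an already open DB-API connection rather than connection specs, and the class name `TableSpec` is hypothetical. The usage note below uses sqlite3 as a stand-in for MySQL so that it runs without a server.

```python
import sqlite3


class TableSpec:
    """Table description whose attribute list is built from the DB itself."""

    def __init__(self, connection, table, descriptions=None):
        cursor = connection.cursor()
        # An empty result set still carries the column names.
        # The table name is interpolated, so it must come from trusted code.
        cursor.execute("SELECT * FROM %s LIMIT 0" % table)
        self.table = table
        self.attributes = [column[0] for column in cursor.description]
        # Tolerate inconsistencies: unknown keys in the dictionary are
        # ignored, and fields without an entry fall back to their name.
        descriptions = descriptions or {}
        self.descriptions = {
            name: descriptions.get(name, name) for name in self.attributes
        }
```

With an in-memory sqlite3 database holding a table `jobs(PandaID, jobStatus)`, `TableSpec(conn, "jobs", {"PandaID": "Panda job ID"})` yields the attribute list `["PandaID", "jobStatus"]` and falls back to `"jobStatus"` as the description of the undescribed field.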
Monitor - WN interaction
| Pilot listener, interface for 'real time' interaction |
MEDIUM |
Mechanism (high-rate polling as supported for the multi-tasking pilot? jabber?) for pseudo-real-time pilot interaction, eg. a simple 'command line' (tail filename, ps, ls, debugging)
Logger
Performance and scalability (measurement, optimization, design changes)
Site information system
| Pick up pickled scheduler config, site OK list from logger and record to site info DB |
MEDIUM |
Torre |
| Include LCG sites, at least those with Panda capability |
HIGH |
Security
| RSA key pair authentication scheme for validating job record content based on encrypted transformation field |
LATER |
Tools for operations and shift support
| Automated scanning, identification and management of duplicate bugs by signature |
HIGH |
Site support, packaging, deployment, installation, maintenance
LCG deployment and support
Organization and communication (email lists, code repository, ...)
Documentation
| Web pages to update |
URGENT |
| Needed documentation |
URGENT |
Completed tasks
| Direct posix access to remote files in pathena, supporting refs in collections and back navigation |
DONE 200608 |
Tadashi |
| User quota system |
DONE 200609 |
Sudhamsh |
Extract user-level usage information from job tables and load to the user table. Provide API
reporting user usage relative to quota for use in submit authorization once quotas are
activated. Provide monitoring interface to usage/quotas.
Open issues and things to investigate
- Use of subversion, at least for components requiring broad deployment but central management (pilot, scheduler, end user tools)
- Tool/technology laundry list: xrootd? PROOF?
- Evolution of monitoring - mix of in-house vs MonaLisa/Dashboard
- Zipping files within datasets to aggregate to larger files (supported by POOL/ROOT; files can be accessed within the zipfile container by their original guid)
Major updates:
-- TorreWenaus - 20 Sep 2006