r3 - 08 Apr 2009 - 14:28:23 - RobertGardner



Minutes of the Facilities Integration Program meeting, April 8, 2009
  • Previous meetings and background : IntegrationProgram
  • Coordinates: Wednesdays, 1:00pm Eastern
    • (309) 946-5300, Access code: 735188; Dial *6 to mute/un-mute.


  • Meeting attendees: Pedro, Douglas, Michael, Charles, Rob, Saul, Fred, John, Sarah, Horst, Karthik, Patrick, Wei, Rich, Tom, Kaushik, Nurcan, Mark, Armen
  • Apologies: Xin

Integration program update (Rob, Michael)

  • IntegrationPhase8
  • Special meetings
    • Tuesday bi-weekly (9:30am CDT): Facility working group on analysis queue performance: FacilityWGAP
    • Tuesday (12 noon CDT) : Data management
    • Tuesday (2pm CDT): Throughput meetings
    • Friday (1pm CDT): Frontier/Squid

Certifying sites for analysis stress tests

  • We need to certify sites to handle each type of analysis job, in advance of the May 28 stress test. (See also Nurcan's Analysis Queues below.)
  • Developing a checklist for site readiness - see: AnalyQueueCertificationPrep
  • Will relocate the table below to the FacilityWGAP pages.
  • Need someone to liaise with the HammerCloud folks (gather job type details, schedule testing, diagnose failures).

Notation: led-green completed; led-blue work is in progress; led-gray defer to next phase; led-red table to be updated

| Job type | Athena release | Input | T1 | AGLT2 | MWT2 | NET2 | SWT2 | WT2 |
| SusyValidation | 14.5.0 | | led-blue | led-blue | led-blue | led-blue | led-blue | led-blue |
| D3PD making with TopPhysTools | 14.5.0 | | | | | | | |
| TAG selection | 14.5.0 | | | | | | | |
| AthenaRootAccess | 14.5.0 | | | | | | | |
| Data reprocessing | | | | | | | | |
| ANLASC1 | 14.5.1 | AOD | | | led-blue | | | |
| ANLASC2 | 14.2.23 | ESD | | | led-blue | | | |
| ANLASC2 Jet sampling | 14.5.1 | ESD | | | led-blue | | | |
| More job types to come ... | | | | | | | | |

Other remarks

  • We need to make the analysis stress test readiness schedule deterministic
  • Quarterly reports due.

Operations overview: Production (Kaushik)

  • Reference:
  • last meeting(s):
    • Probably 90% of the jobs now are pile-up jobs: I/O intensive (20 input files, ~1-hour jobs). HITS files are often on tape, as they were generated months ago, so this hits HPSS hard. That's why we're not running any jobs at the moment.
    • Some simulation channels, but priority is for pileup.
    • Have asked for regional simulation tasks for backfill - but that's hung up in physics coordination and ATLAS-wide.
    • We have no control for regional production tasks.
    • Next task will be large-scale reprocessing using a new release; files must come from tape, as an exercise. Expect this to begin tomorrow. Expect only Tier 1 resources to be sufficient, though will add SLAC as a Tier 2 to augment.
  • this week:
    • Reprocessing jobs (200K jobs, only getting a few K per day) & pileup - both need files from tape
    • Lots of requests from users for files from tape as well
    • Options? Merging into larger files. Processing data quickly while it's on disk. Increasing disk or tape capacity?
    • Michael: Note - we don't have enough jobs in a state that would allow optimizing the I/O - can't do "near future" scheduling.
    • Some concern about ATLAS policy regarding requests for raw data (10% on disk).
    • Idle CPUs at Tier 2's. We have regional requests from US users, but we can't get them approved through physics coordination. There are also problems getting tasks scheduled for the US cloud by Panda - jobs getting blocked. The Panda team is working on this by classifying jobs and allowing all types to flow. However, the US now has no assignments for evgen or simul. There has to be some care in the priority assignments - so that reprocessing maintains its priority over simulation, e.g.
    • Reprocessing issues:
    • Increase input queue to 6000. PNFS load shouldn't be a problem - Pedro.
    • Fraction of jobs with transformation errors - the US cloud is getting more than its fair share. Cosmic stream reprocessing tasks - we got 80% for the US. 10K job failures, but only 2000 real jobs - skimming jobs are getting flagged as errors. Pavel is allowing the jobs to fail 5 times before re-defining the transformation.
    • 62K reprocessing jobs to do.

Shifters report (Mark)

Analysis queues (Nurcan)

  • Reference:
  • last meeting:
    • See AnalysisQueueJobTests, from FacilityWGAP
    • Ran 19K analysis jobs on various sites two weeks ago. Will collect some statistics. Alden will create some plots.
    • There have been some problems with lfc-mkdir failures.
    • Will start sending 1000 jobs daily. Expect Hammer Cloud jobs to begin soon in the US. Discussions with Dan to use timings from pilot logs - they're working on this.
  • this meeting:
    • Meeting this week FacilityWGAPMinutesApr7
    • Started daily stress-tests on Monday.
    • Two key issues so far. First - LFC registration failures ("file exists"): the same library input dataset names being used, a pathena bug, now fixed. Second - AGLT2 SRM authentication failure (GUMS). Shawn investigating. Resending to AGLT2.
    • No proper monitoring yet - checking manually. Torre, Aaron being consulted for a stress-test monitor.
    • Will run some queries against Panda DB for failures - will summarize into metrics.
    • Need to make sure input datasets are present at Tier 2s.
    • Will solicit users for job types and plans. Will measure success/failure rates - triage most critical failures. Need job timing information - to help sites optimize performance. Some of these figures have already been measured - need to collect and plot.
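The success/failure metrics described above could be summarized along these lines - a minimal sketch, assuming a list of job records with illustrative field names (`site`, `status`, `error`), not the actual Panda DB schema:

```python
from collections import Counter, defaultdict

def summarize_jobs(jobs):
    """Aggregate per-site success/failure rates and rank the most common errors."""
    per_site = defaultdict(Counter)
    errors = Counter()
    for job in jobs:
        per_site[job["site"]][job["status"]] += 1
        if job["status"] == "failed":
            errors[job["error"]] += 1
    metrics = {}
    for site, counts in per_site.items():
        total = sum(counts.values())
        metrics[site] = {
            "total": total,
            "failed": counts["failed"],
            "failure_rate": counts["failed"] / total,
        }
    # most frequent failures first, for triage
    return metrics, errors.most_common()

# toy records standing in for rows queried from the job database
jobs = [
    {"site": "AGLT2", "status": "failed", "error": "srm auth"},
    {"site": "AGLT2", "status": "finished", "error": None},
    {"site": "MWT2", "status": "finished", "error": None},
    {"site": "MWT2", "status": "failed", "error": "lfc-mkdir"},
    {"site": "MWT2", "status": "failed", "error": "lfc-mkdir"},
]
metrics, top_errors = summarize_jobs(jobs)
print(metrics["MWT2"]["failure_rate"])  # 2 of 3 MWT2 jobs failed
print(top_errors[0])                    # ('lfc-mkdir', 2)
```

Ranking failures by frequency is what lets the most critical error classes surface first when triaging.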

DDM Operations (Hiro)

Data Management & Storage Validation (Kaushik)

  • Reference
  • last week(s):
    • US decision for ATLASSCRATCHDISK needed (ref here)
    • Space clean-up at Tier 2s. What about MCDISK and DATADISK? AGLT2, SWT2 have already run into the problem. Big mess!
    • Hiro has installed an Adler-32 plugin for dq2 site services at BNL. It compares the checksum values from dCache and dq2, catching corruption during transfer. Running in passive mode - no corrupted files in a week. Active mode will fail the transfer if there's a mismatch.
    • Another big issue is when BNL migrates to storage tokens.
    • Pedro, Hiro working on services to reduce load on pnfs servers: using pnfs IDs rather than filenames, and callbacks from dCache when a file is staged rather than polling.
    • Alexei's group has developed a nice way to categorize file usage at each site. There's a webpage prototype.
    • ATLASSCRATCH deadline?
  • this week:
    • MinutesDataManageApr7
    • Wei - MCDISK - full - are there any activities to delete old data? Will ask Stephane to delete obsolete datasets.
    • Need some dataset deletions.
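The Adler-32 verification noted under last week's items can be sketched as follows - the function names and the two checksum arguments are illustrative stand-ins for the values the plugin would obtain from dCache and dq2, not the actual plugin code:

```python
import zlib

def adler32_of(data: bytes) -> str:
    """Adler-32 of a byte string, rendered as 8 zero-padded hex digits."""
    return format(zlib.adler32(data) & 0xFFFFFFFF, "08x")

def verify_transfer(dcache_checksum: str, dq2_checksum: str, active: bool) -> bool:
    """Passive mode only reports a mismatch; active mode fails the transfer."""
    if dcache_checksum == dq2_checksum:
        return True
    if active:
        raise RuntimeError("checksum mismatch: failing transfer")
    print("WARNING: checksum mismatch (passive mode, transfer kept)")
    return False

# matching checksums pass in either mode
good = adler32_of(b"example file contents")
assert verify_transfer(good, good, active=True)
```

Running passively first, as described above, lets the site measure how often mismatches actually occur before any transfer is rejected.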

Throughput Initiative (Shawn)

  • Notes from meeting this week:
               NOTES from USATLAS Throughput Meeting
April 7th, 2009
Attending:  Shawn,  Sarah, Rich, Jeff,  Joe, Jay, Horst, Karthik, Rob, Doug, Neng, Mark, Saul (via email!)
Discussion about perfSONAR.   Use in USATLAS so far has been helpful.    Problems have been identified and fixed based upon perfSONAR measurements.    Current perfSONAR has a number of small issues that are being addressed by the developers for the next release.
Discussed appropriateness of broad-scale perfSONAR deployment for Tier-3 sites.  Two views:  limited Tier-3 manpower may not be able to deploy/maintain such boxes vs. alternate view: having limited manpower increases the rationale for having a standardized network diagnostic instance at the Tier-3 site.  Everyone seemed to agree that having at least 1 perfSONAR box deployed at Tier-3 makes sense ONCE the installation is “turn-key” and extremely robust.   On-demand tests as well as regular scheduled tests with “preferred” Tier-2 and “local” Tier-3’s may be very useful.
RECOMMENDATION: Tier-3 sites should plan to deploy one of the “standard” perfSONAR boxes by Fall 2009 (after installation is well documented and software is very resilient).   This is important in that many Tier-3  sites may be seeking stimulus funding and these boxes should be part of their requests.
Much discussion about Jay's graphics.   There is a desire to see something similar as part of the "homepage" for the next perfSONAR install.  Jeff Boote mentioned there is interest in having something like this for the next release but not much people time available to pursue it.  Jay will continue to work on it and hopefully what he provides can be a good basis for such a future addition.   Discussed other improvements based upon feedback received to date.   Filtering/selection will be useful for creating "views" that users find helpful.
Rest of the notes in-line below.
From: usatlas-grid-l-bounces@lists.bnl.gov [mailto:usatlas-grid-l-bounces@lists.bnl.gov] On Behalf Of McKee, Shawn
Sent: Tuesday, April 07, 2009 10:16 AM
To: Usatlas-Grid-L; Joe Metzger; O' Connor, Michael P (ESNET); big@bnl.gov; azher@hep.caltech.edu
Subject: [Usatlas-grid-l] Throughput Meeting Today April 7th, 2009 at 3PM Eastern
Hi Everyone,
We resume our Throughput meeting today, Tuesday April 7th, 2009 at 3 PM Eastern time.
The ESnet call-in number:
 ES net phone number:
 Call: 510-665-5437
 *Dial up number does not apply to Data Only ( T-120) Conferencing
 When: April 7th, 2009, 03:00 PM America/Detroit Meeting ID: 1234
The agenda is:
1)      perfSONAR status and related issues
a.      Current use within USATLAS
b.      Network issues discovered by perfSONAR: status/resolution
c.       Recommendations for Tier-3 deployment?  General discussion about broader deployment
d.      Next version (including Jay’s front-end graphics?)
2)      Throughput testing  (Postponed  till next week when Hiro is back)
a.      Need to complete milestones – reschedule based upon network issue resolution(s)
3)      Site reports
a.      Wisconsin --- cfengine changed from 10GE to 1GE config “automatically” (mis-configured).  Fixed now.
b.      WT2/SLAC --- No report
c.       SWT2/OU/UTA --- Horst reported 1 gig perfSONAR tests improved.    Mark reported on UTA status…plans for throughput test repeats. 
d.      NET2/Harvard  ---  perfSONAR nodes up now (need to verify they are running though…not showing up in the lookup service right now).  A new peering with BNL via the ESnet dedicated circuit is scheduled for Friday startup.  In the market for replacement 10GE interfaces for  the existing Neterion NICs.
e.      MWT2/IU/UC --- Sarah reported that both perfSONAR boxes are up.   Testing with UC from IU.   Rob reported on MTU issues at UC.
f.        AGLT2  ---  Shawn  reported on network problem resolution at AGLT2.  With Azher’s help, found a strange Cisco problem which caused outbound packets (IPV4) to have to be switched via the supervisor CPU.  Resulted in a factor 1/10 the inbound rate.   Fixed by IOS upgrade and reboot of switch.   Ready for new throughput tests now.
g.      BNL ---   No updates
4)      AOB  ---  Joe Metzger reported that if Tier-2’s want to do any testing ESnet is willing to help out.  They have over 25 10GE enabled test locations available.
Plan to meet at the usual time next week.
Let me know if there are other topics we should add to the agenda.
Send along any corrections or additions via email.

  • last week:
    • There will be a meeting next week. We're a little behind on throughput milestones - some problems with the perfSONAR boxes observed.
    • Esnet working w/ BNL to resolve issues; also Ultralight.
    • Hiro will send a small number of large files to each site. Will plot throughput. Regular.
    • See also Jay's page for perfSONAR monitoring.
  • this week:
    • Perfsonar for Tier 3 sites - good idea, once we have a turn-key solution. Primary purpose would be as a test point for that site. Also - Tier 3's can test with "partner" Tier 2's.

Squids and Frontier (John DeStefano)

  • Note: this activity involves a number of people from both Tier 1 and Tier 2 facilities. I've put John down to regularly report on updates as he currently chairs the Friday Frontier-Squid meeting, though we expect contributions from Douglas, Shawn, John, and others at Tier 2's as developments are made. - rwg
  • last meeting(s):
    • Dantong reporting. There is a weekly Frontier meeting chaired by John DeStefano.
    • Friday afternoons, 1pm Eastern.
1) Get the documentation on TWiki in a week. We will finalize the BNL Frontier infrastructure (two to three weeks from now; timeline April 15, 2009, tax day).
 Two instances of Frontier services behind an F5 switch.   Attached please see Dave's suggestion on testing Frontier functionality.

2) Tier 2 centers will identify their servers for local Squid, and study the documentation we provided.
During the operation meeting, let us discuss about what hardware and software requirements for Tier 2 configuration.

3) Tier 2 will set up their infrastructure one to two weeks after we finalize ours (April/30).
    • John - recommendations for Tier 2s. AGLT2 connection to BNL via Squids working.
    • Documentation setup.
    • Sites need to start identifying hardware. Single threaded - only 1 CPU required. 2 GB RAM, at least 400 GB disk.
    • John will send regular announcements for the Friday meeting.
  • this week:
    • Sites need to identify hardware. Help available for setting up test beds.
    • Meeting this Friday.
    • Established a load balancer in front of the BNL Frontier server.

Site news and issues (all sites)

  • T1:
    • last week: Storage capacity - a petabyte of Sun storage. Needed the right 10G cards - only the Myrinet cards worked properly in the Thors. All together now. 25 units given to Pedro's group to get dCache configured there. Lots of activity in storage and data management. Working on priority of staging requests being implemented. Networking - next round of uslhcnet upgrades under way for transatlantic capacity: 20 Gbps by October, contingent on final budget. Improving T1-T2 connectivity. Now progress w/ BU and BNL circuit. Next would be AGLT2 connected.
    • this week: Have two additional 10G Esnet circuits in place. Can now establish dedicated circuits w/ all Tier 2s. Expect a second 10G link between BNL and CERN, that will need an additional Esnet circuit. Additional storage deployment - Armen getting requirements for additional space tokens. Note - there are new numbers for requested resources, 20% lower than October 2008 RRB. Deployment schedule granularity now given quarterly. Discussed at WLCG GDB meeting yesterday. Will look at numbers and schedule next week. Tier 2 numbers probably unchanged.

  • AGLT2:
    • last week: MCDISK getting full. Putting together scripts and consistency checking. Wenjing has a hot-replica script service for dCache. Some nodes at MSU off for AC work.
    • this week: GUMS issues - attempted upgrade is not working. Currently have GK issues as a result. Problem was that GUMS getting read timeouts under heavy load (causing authorization failures).

  • NET2:
    • last week(s): BU (Saul): 224 new Harpertown cores have arrived, to be installed. New storage not yet online - HW problems w/ DS3000s (IBM working on it). HU (John): gatekeeper load problems of last week related to polling old jobs; fixed by stopping the server and removing old state files. Also looking into Frontier. Frontier evaluation meeting on Fridays at 1pm EST run by Dantong (new mailing list at BNL). Fred notes BDII needs to be configured at HU. Communicating with Hiro via email for the Adler issue. HU - there was a bug in NFS from the RH5 kernel causing lots of problems (gatekeeper loads, slow pacman installs): kernel 2.6.18-53, replaced with 2.6.18-92. An order of magnitude more lookups, especially when lots of modules are in the ld library path. Expect to bring back online soon. Data corruption issue turned out to be a hardware problem. Doing a complete inventory of data; ~few K files corrupted. New BM installed. Perfsonar boxes up - one working already. New rack of GPFS storage. 224 cores to be added. Progress in networking: NOX decided for a direct connection between it and ESnet. HU (John): no pilots to the site - working with shifters.
    • this week: HU site is up and working well. 128 cores added at BU. 130 TB into production. Perfsonar machines up and working. New dedicated circuit to BNL - this Friday. Continuing cleanup operation from corrupted files (3K files).

  • MWT2:
    • last week(s): 21 new compute servers (PE1950), 52 TB of storage to be added. Looking into latency issues with xrootd. Getting some strange behavior with xrootFS. (Had a new data server on the bestman server.) In communication with Wei. Problems with network cards in the new Dells - dropped packets; need to contact Myricom. Analysis queue stress test working. At IU - a large number of jobs stuck in transferring, a problem in the pilot. There were some releases missing.
    • this week:
      • Long-standing dcache instability probably due to dropped packets in the network. Progress on network configuration at UC (many thanks to Shawn!). Found possible source of packet loss - MTU mismatch on the private VLAN. Reconfigured Dell, Cisco switches yesterday, 10G NICs. No packet loss, ethernet NIC errors, or giants reported in the switch. Studies continuing today. 21 compute nodes hopefully online this week.

  • SWT2 (UTA):
    • last week: Space problems on CPB cluster. Ran proddisk-cleanse. Working on cleaning up old data - what can be deleted. Some files on disk are unknown to LFC. SWT2_UTA was not getting pilots from BNL; cleaned up grid monitor debris, back online.
    • this week: all is well.

  • SWT2 (OU):
    • last week: All is well. 100 TB storage ordered.
    • this week: all is well.

  • WT2:
    • last week: all is well. Replaced a bad hard drive in a Thumper. Running clean-up script in proddisk. When will the central operations team begin regular cleanup? Agreement is they will delete test data; user data we manage ourselves (US decision).
    • this week: SRM died this morning, otherwise all okay. April 16 power outage.

Carryover issues (any updates?)

Release installation, validation (Xin)

The issue of validating presence, completeness of releases on sites.
  • last meeting
    • The new system is in production.
    • Discussion to add pacball creation into official release procedure; waiting for this for 15.0.0 - not ready yet. Issue is getting pacballs created quickly.
    • Trying to get the procedures standardized so it can be done by the production team. Fred will try to get Stan Thompson to do this.
    • Testing release installation publication against the development portal. Will move to the production portal next week.
    • Future: define a job that compares what's at a site with what is in the portal.
    • Tier 3 sites - this is difficult for Panda - the site needs to have a production queue. Probably need a new procedure.
    • Question: how are production caches installed in releases? Each is in its own pacball and can be installed in the directory of the release that it's patching. Should Xin be a member of the SIT? Fred will discuss next week.
    • Xin will develop a plan and present in 3 weeks.
  • this meeting:

Tier 3 coordination plans (Doug, Jim C)

  • last report:
    • Doug would like to report bi-weekly.
    • Would like to consider Tier 2 - Tier 3 affinities - especially with regard to distributing datasets.
    • Writing up a twiki for Tier 3 configuration expectations
    • Will be polling Tier 3's for their expertise.
    • Tier 3 meeting at Argonne, mid-May, for Tier 3 site admins.
    • Should Tier 3's have perfSONAR boxes? The question is the timeframe for deployment. To be discussed at the throughput call.
  • this report:

HTTP interface to LFC (Charles)

VDT Bestman, Bestman-Xrootd

  • See BestMan page for more instructions & references
  • last week
    • Have discussed adding Adler32 checksum to xrootd. Alex developing something to calculate this on the fly. Expects to release this very soon. Want to supply this to the gridftp server.
    • Need to communicate w/ CERN regarding how this will work with FTS.
  • this week
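On-the-fly checksum calculation, as discussed for the xrootd/gridftp work above, amounts to updating a running Adler-32 value as each chunk of the transfer passes through; a minimal sketch (the chunking loop is illustrative, not the actual xrootd implementation):

```python
import zlib

def adler32_streaming(chunks) -> str:
    """Update the Adler-32 value incrementally as data streams through."""
    value = zlib.adler32(b"")  # initial Adler-32 value is 1
    for chunk in chunks:
        value = zlib.adler32(chunk, value)
    return format(value & 0xFFFFFFFF, "08x")

# the incremental result is identical to checksumming the whole file at once,
# so the transfer never needs a second pass over the data
whole = b"some file contents"
parts = [whole[i:i + 4] for i in range(0, len(whole), 4)]
assert adler32_streaming(parts) == format(zlib.adler32(whole) & 0xFFFFFFFF, "08x")
```

This is why computing the checksum during the transfer costs essentially nothing: the per-chunk update replaces a separate read of the file after it lands.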

Tier3 networking (Rich)

  • last week
    • Reminder to advise campus infrastructure: Internet2 member meeting, April 27-29, in DC
    • http://events.internet2.edu/2009/spring-mm/index.html
    • Engage with the CIOs and program managers
    • Session 2:30-3:30 on Monday, April 27, to focus on Tier 3 issues
    • Another session added for Wednesday, 2-4 pm.
  • this week

Local Site Mover


  • None.

-- RobertGardner - 07 Apr 2009
