r3 - 22 Oct 2008 - 14:18:24 - RobertGardner



Minutes of the Facilities Integration Program meeting, Oct 22, 2008
  • Previous meetings and background: IntegrationProgram
  • Coordinates: Wednesdays, 1:00pm Eastern
    • New phone: (309) 946-5300, access code: 735188; dial *6 to mute/unmute.


  • Meeting attendees: Rob, Charles, Marco, Rich, Kaushik, Mark, Wei, Sarah, Torre, Saul, Armen, Karthik, Wensheng, Bob
  • Apologies: Patrick, Horst, Michael
  • Guests: none

Integration program update (Rob, Michael)

  • IntegrationPhase7 under construction
  • IntegrationPhase6 final updates, including SiteCertificationP6 and CapacitySummary (figures effective Sep 30)
  • Quarterly reports: needed for agency reporting.
  • High-level goals in Integration Phase 7 (from BNL workshop):
    • Pilot integration with space tokens
    • LFC deployed and commissioned: DDM, Panda-Mover, Panda fully integrated
    • Transition to /atlas/Role=Production proxy for production
    • Storage
      • Procurements - keep to schedule
      • Space management and replication
    • Network and Throughput
      • Monitoring infrastructure and new gridftp server deployed
      • Throughput targets reached
    • Analysis
      • New benchmarks for analysis jobs coming from Nurcan
      • Upcoming Jamborees
    • Probably will hold another US ATLAS Tier 2/Tier 3 meeting
      • Winter/Early Spring
      • Location: Duke (TBC)
  • OSG site admins meeting coming up: https://twiki.grid.iu.edu/bin/view/SiteCoordination/SiteAdminsWorkshop2008

Next procurements

  • Covered in site reports.

Operations overview: Production (Kaushik)

  • Got some validation tasks over the weekend.
  • Issue: a problem with release matching in Panda; matching disabled for now (all releases at all sites in the OSG).
  • DATADISK was used instead of PRODDISK; this is a mistake everywhere except at the Tier 1.

Shifters report

PRODDISK migration (Yuri)

  • Follow-up w/ status at each site
  • AGLT2 - done
  • MWT2_UC, UC_ATLAS_MWT2 - done
  • MWT2_IU - done; IU_OSG not yet
  • SWT2_CPB - done
    • Update since last time: how best to use xrootd's internal mover to avoid SRM for transfers between the SE and compute nodes. Preference is to use it for both reads and writes. Waiting on Paul for a pilot change; using the SRM server for now.
  • SWT2_OU - need to install Bestman
  • SLAC
    • Ready; would also like to use the internal xrootd mover. There is also a pilot problem that needs to be fixed; in email contact with Paul.
  • BU
    • Bestman (POSIX mode) with GPFS; not expected to be a problem. Update: ready. Only ToA and pilotcontroller changes are needed.
  • Yuri notes that all configuration changes will now be made through pilotcontroller.py (in SVN). Changes should go through Paul, Torre, or Tadashi, and the information there should match ToA.

Analysis queues, FDR analysis (Nurcan)

  • Background: https://twiki.cern.ch/twiki/bin/view/Atlas/PathenaAnalysisQueuesUScloud
  • Stress testing program
    • First phase - will submit 100 TAG-selection jobs to the sites. They read AODs from the site's SE.
    • Sent single jobs to each site yesterday:
      • SLAC: problem with pilots retrieving the jobs.
      • MWT2_UC: migrating to LFC.
      • NET2, OU, BNL: fine.
      • AGLT2: dCache problem yesterday.
      • SWT2: issue setting up the space token, and an issue with direct reading of AODs. The xrootd system has not been enabled to translate to root URLs; this will require a pilot code modification.
    • Second phase: repeat the 10K-job submission done at the Jamboree.
  • BNL, NE, OU, SLAC - testing okay.
  • Other sites
    • Waiting on LFC migration. MWT2: write pool configuration problem.
  • VOMS server at BNL was not up to date, now solved.
  • Subdirectory ACL can be set for the proxy. - Marco
  • How can users delete their datasets? Currently not possible.
  • Problem: user analysis jobs requesting files on tape (ESDs/RDOs). New job state, "Partial", for the case where only a subset of the jobs had their files staged from tape and could run successfully. Mainly an issue for analysis jobs at the Tier 1.
  • Interim solution: have users submit subscriptions for raw data.

Operations: DDM (Hiro)

  • Many recent ToA changes to include the actual path, allowing DQ2 to copy to multiple destinations.
  • BNLDCACHE: all okay; >10K files/hour, >1 GB/sec.
  • Will start throughput tests.

LFC migration

  • SubCommitteeLFC, see meeting notes LFCMeetOct22
  • SLAC, UTA both waiting on xrootd site mover tools for PRODDISK. Once done they'll migrate.

Long URLs in the catalog (Marco)

  • Our convention in the US is to use full URLs in the catalog.
  • This implies a few changes for the pilot; Paul is aware.
  • What about dq2-put? Is the short URL used? Need to check w/ Mario.

Site news and issues (all sites)

  • T1: network maintenance yesterday; also NFS maintenance.
  • AGLT2: dCache problems seem to have cleared; everything else seems to be working. Dell has delivered half of the equipment; installation starts next week.
  • NET2: BU site running smoothly. Harvard having problems; will follow up offline. Storage component ordered.
  • MWT2: LFC migrated at both sites. Most equipment received at IU.
  • SWT2 (UTA): Discussions with Dell; otherwise all is well. There were problems with Gratia reporting; worked with Chris Green at Fermilab, now resolved.
  • SWT2 (OU): All okay. Equipment - waiting on quotes from Dell.
  • WT2: ready to migrate to LFC. Doing some cleanup with cleanse.py.

Carryover issues (any updates?)

Release installation via Pacballs + DMM (Xin, Fred)

  • status from last week
    • Fred - tracking down why data is not being subscribed to BNL. Pacballs are being created, but the subscriptions aren't working to BNL automatically. Consulting Alexei.
    • Xin - Tadashi has converted the scripts into an install job payload and created a job creation interface. Next step: try some test sites in Panda (temporary area), then test jobs using the usatlas2 role.
  • All pacballs now being transferred automatically to BNL (will test with 14.2.24 release)
  • Pilot with the usatlas2 role: open question of how to save the results to the output SE.
  • There is a question about where to send the production installation job logs.

Throughput initiative - status (Shawn)

  • No meeting this past week.


  • There is a separate subcommittee formed to redefine the whitepaper (Oct 1). Placeholder to follow developments.


  • None.

-- RobertGardner - 21 Oct 2008

This site is a content mirror of the BNL US ATLAS TWiki.