r5 - 20 Aug 2008 - 08:46:25 - RobertGardner



Minutes of the Facilities Integration Program meeting, June 25, 2008
  • Previous meetings and background : IntegrationProgram
  • Coordinates: Wednesdays, 1:00pm Eastern
    • Phone: (605) 475-6000, Access code: 735188; Dial *6 to mute/un-mute.


  • Meeting attendees:
  • Apologies: Rob, Fred
  • Guests: none

Integration program update (Rob, Michael)

  • IntegrationProgram for Phase 5 (April 1 - June 30, 2008: FY08Q3)
  • Overarching near term goals for Phase 5:
    • Full and effective participation in FDR-2 exercises
    • Complete the benchmarks of 200 MB/s sustained disk-to-disk throughput to all Tier2s
    • SRM v2.2 functionality for all ATLAS sites
    • SAM availability reporting to WLCG (May)
  • Upcoming meetings:
  • Milestones from the Ann Arbor meeting: AnnArborNotesMay2008:
    • FDR2: data replication and analysis queues
    • 200/400 MB/s T1-T2
    • OSG 1.0 deployed
    • LFC evaluation and deployment strategy complete
    • WLCG - SAM/RSV: reliability and availability metrics for CE and SE reporting >80% for all sites.
    • Provisioning of capacities according to pledges on track for September 15 2008 deployment.
    • Network performance monitoring infrastructure deployed.
    • Revision to the Tier 3 white paper, and a reference Tier 3 facility defined.
    • Analysis benchmarks demonstrated at increasing scale (100/200/500/1000 simultaneous jobs) at all Tier 2 facilities.
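
For a sense of scale on the 200/400 MB/s throughput targets above, the following is a minimal sketch of the arithmetic, not part of the agreed benchmark definition. It assumes decimal units (1 TB = 1,000,000 MB) and a hypothetical 24-hour sustained window; the minutes specify neither, and the function name is illustrative only.

```python
# Assumption: decimal units (1 TB = 1,000,000 MB). The minutes do not state
# which unit convention the throughput benchmark uses.
MB_PER_TB = 1_000_000

# Assumption: a 24-hour window, chosen only for illustration.
SECONDS_PER_DAY = 24 * 60 * 60


def volume_tb(rate_mb_per_s: float, duration_s: float) -> float:
    """Return the data volume in TB moved at a constant rate for duration_s seconds."""
    return rate_mb_per_s * duration_s / MB_PER_TB


if __name__ == "__main__":
    # The two T1-T2 target rates named in the milestones list.
    for rate in (200, 400):
        print(f"{rate} MB/s for 24 h -> {volume_tb(rate, SECONDS_PER_DAY):.2f} TB")
```

Under these assumptions, a single Tier 2 sustaining 200 MB/s for a full day would receive roughly 17 TB, which may be useful context when reading the storage capacity items under Next procurements.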

Site certification review

Operations overview: Production (Kaushik)

Shifts (Marco)

Next procurements

  • Standing agenda item, see CapacitySummary.
  • Follow-up issues:
    • Storage capacity recommendations/guidance for the Facility (320 TB capacity, from Kaushik's model on MinutesJune11).
    • Revised WLCG pledges
  • Specifications from Internet2 for network monitoring hosts (Rich)

User LRC deletion (Charles)

  • Nurcan reports this is currently failing - Charles addressing this.

Analysis queues, FDR analysis (Mark)

  • Follow-up:
    • Regular exercising of analysis queues over data sets, especially when there are new releases. Nurcan is doing this.
    • Mark and Nurcan will meet to discuss some systematic testing at the sites and will re-consider the analysis benchmarks.
    • Metric - define a standard for time required to process a standard dataset
    • Consider a site availability monitor that checks basic functionality to indicate site readiness; this would help users distinguish "site" problems from "user-code" problems.

Operations: DDM (Hiro)

LFC migration (John)

RSV SE & CE probe update status (Fred)

  • Follow-up from last week:
    • SRM probes needed for AGLT2, SWT2, NET2
    • AGLT2 - has 2.0 probes, just not enabled. Will run configure.
    • BU - has RSV 2.0 running, but not reporting. Saul will follow up and will install OSG 1.0 by next week.
    • SW - need SRM probes. Did upgrade, but may not have enabled SRM probe.
    • BNL - why not reporting? Xin says it's reporting fine locally. Are the results going into Gratia correctly? Fred will follow up with Xin.
  • See https://twiki.grid.iu.edu/twiki/bin/view/Operations/RsvSAMGridView for links to SAM and Gridview reporting consoles.

Scheduling maintenance downtimes with the GOC (Sarah)

WLCG accounting

OSG 1.0 (Rob)

  • OSG 1.0 now released
  • Completed at UC_ATLAS_MWT2, AGLT2
  • See https://twiki.grid.iu.edu/twiki/bin/view/ReleaseDocumentation/WebHome
  • Follow-up from upgrade schedule of last week:
    • AGLT2 - complete
    • NET2 - within the next week
    • UTA_SWT2 - perhaps next week or earlier
    • SWT2_CPB - probably 2 weeks away since there will be a major shutdown
    • SLAC - first needs to do xrootd storage upgrade; earliest end of this month
    • MWT2 - within the next week

Site news and issues (all sites)

  • T1:
  • AGLT2:
  • NET2:
  • MWT2:
  • SWT2 (UTA):
  • SWT2 (OU):
  • WT2:

Carryover issues (any updates?)

Pilot upgrade for space tokens (Kaushik (Paul))

  • Some development still to do. Carried over.
  • No results yet from tests at AGLT2.

Release installation via Pacballs (Xin)

  • Follow-up
    • Progress - this morning to discuss this. Fred - hoping this week to have first set of pacballs installed in DQ2. Will test with some older releases on some test machines.
    • Need official naming scheme.
    • Get installed with a special Panda pilot job using the software role. Expect performance to improve.
    • Expect a couple of weeks of testing.
    • Goal to bring into production by end of the month (June).

Throughput initiative - status (Shawn)

Nagios monitoring subcommittee (Dantong)

  • Available space reporting at all sites.
  • Tomasz was organizing a meeting to test globus-job-run (?)

SRM v2 and Space Tokens (Kaushik)

  • Follow-up issue of atlas versus usatlas role.
  • The issue for dCache space token controlled spaces supporting multiple roles is still open.
  • For the moment, the production role in use for US production remains usatlas, but this may change to get around this problem.


  • none

-- RobertGardner - 24 Jun 2008
