r6 - 20 Aug 2008 - 08:46:26 - RobertGardner

MinutesMay21

Introduction

Minutes of the Facilities Integration Program meeting, May 21, 2008
  • Previous meetings and background: IntegrationProgram
  • Coordinates: Wednesdays, 1:00pm Eastern
    • Phone: (605) 475-6000, Access code: 735188; Dial *6 to mute/un-mute.

Attending

  • Meeting attendees: Michael, Rob, Fred, Charles, Shawn, Rich, Jeff, Sarah, John@BU, Saul, Horst, Karthik, Justin, Rob Quick, Torre, Jay, Patrick, Xin, Hiro, Tom, Tomasz
  • Apologies: Bob
  • Guests: Rich Carlson, Jeff Boote, John Vollbrecht, Rob Quick

Integration program update (Rob, Michael)

US LHC Networking (Rich Carlson, Jeff Boote, John Vollbrecht)

  • Planning for next week's meeting - perfSONAR, DCN
  • Deployment of E2E monitoring
    • Jeff is leading the deployment effort for the perfSONAR tools
    • Sites keep autonomy, but federate for LHC monitoring
    • What are the site requirements? A dedicated system will be required, as closely located to the Tier2 machines as possible.
    • Sometimes you'll want them to be as close to the border as possible.
    • A Web100-enabled kernel is needed.
    • There are currently security issues with kernel version 2.6.23.
    • Expect this to be persistent
  • Shawn will provide some recommendations for each site, and I2 can help with these as well.
  • We want to have this infrastructure in place very soon. Site admins should be thinking about the issue and come prepared.

RSV --> SAM (Fred, Rob Quick)

All ATLAS CEs are now showing in SAM. To check your site's status, click this link. Note: you must have your grid certificate loaded in your browser to see the link. Information on installing the RSV SE probes can be found here.
  • We are reporting availability figures now to SAM - a milestone achieved. (GridView reports SAM information as well.)
  • Publishing of SRM information.
  • Not all sites are using service certificates - there is information available on how to set this up.
  • There are some configuration options available, e.g. depending on whether or not GUMS is being used.
  • Scheduled downtime reporting - in progress; there will be a web page to report to. This is important for reliability reports to WLCG.
  • These can be used on OSG 0.8 sites.
  • What about robot certificates? Waiting for OSG security group approval.
  • Ping and copy are tested.
  • We need to compile a list of FQDNs, names, and types and send it to the GOC. The SEs will need to be "registered" in the OSG OIM database with contact information.
  • Michael: aim to have this in place as soon as reasonable, complete during the course of June.

RSV-SE probes testing (Wei, Xin)

  • Status of SRM-Bestman probe
    • It's working. Need to update Bestman-xrootd to 2.2.0.8c2. The package is on a separate machine. Working. DONE
  • Status of SRM-dCache probe
    • Ran on ITB at BNL, used service certificate - if this is used, RSV would need to be running on the SRM door. Working. DONE

Analysis queues, FDR analysis (Nurcan)

  • No updates (back from vacation).
  • Re-processed FDR1 data; there will be DPD making at Tier2. Replication to Tier2s is underway.
  • No problems reported so far.
  • Brokering based on data location is working okay, so it was decided to put files everywhere. Encourage users to submit lots of jobs, and watch Panda brokering.

Operations: Production (Kaushik)

  • FDR2 production complete late last week. Weekend - some critical samples redone.
  • Some digitization jobs remain to be done. 28 input file jobs (!)
  • 14.0.1.2 validation samples
  • Borut will increase statistics of validation samples, to keep sites full.
  • Re-processing. Rod has made a conditions release tarball, being distributed. FDR1, M5, M6.
  • 13.0.35 release being installed at Tier2.
  • End of next week: single particle jobs for calibration. T0-->T1-->T2, plus full MC simulation, plus re-processing with 13.0.35.2. Trying to be as realistic as possible.

Shifts (Mark)

  • Relatively quiet due to lack of jobs.
  • AGLT2 - md5sum issues.
  • MWT2 autopilot - waiting for changes from Paul on pilot.

CCRC08

  • Follow-up from last time
    • Will be managed centrally - need to communicate the four endpoints that are ready to Stephane and Alexei.
    • Make sure old CCRC08 data is deleted - mainly for free space.
    • 10 TB is the minimum
  • All T2s have upgraded to the new DQ2 1.0.
  • All sites getting files.
  • AGLT2 - space token problem.
  • BU - having call-back issues.

Operations: DDM (Hiro)

Review status of SRM v2 and Space Tokens

  • Status as of May 7 meeting:
    • AGLT2 - ATLASDATADISK - tested; 90 TB
    • SLAC - tested; 50 TB
    • BU - tested; 160 TB RAW
    • SW - setup, not tested; 60 TB
    • MW - still being worked on.
  • Status today: CCRC08 is working now on almost all sites.
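
As a rough cross-check of the May 7 figures above, the short Python sketch below totals the capacities reported per site. The dictionary and function names are illustrative assumptions, not part of any DDM tooling. BU's figure was reported as raw capacity and MW reported no figure, so the total is only indicative.

```python
# Space token capacities (TB) as listed in the May 7 status above.
# The dictionary is a hypothetical structure used only for illustration.
reported_tb = {
    "AGLT2": 90,  # ATLASDATADISK, tested
    "SLAC": 50,   # tested
    "BU": 160,    # tested; reported as RAW capacity
    "SW": 60,     # set up, not tested
}


def total_capacity_tb(capacities):
    """Sum the per-site capacities, in TB."""
    return sum(capacities.values())


total_tb = total_capacity_tb(reported_tb)
print(f"Reported capacity: {total_tb} TB across {len(reported_tb)} sites")
```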

Pilot upgrade for space tokens

  • Status of site-mover tools in pilot code: Paul is working on the issue but is swamped. Priority is to get everyone switched over to autopilot.
  • New wn-client package - installed at AGLT2; test queue needs to be set up.

Unified LHC client (Marco)

  • Dependency resolving and packaging for a unified (OSG/gLite) client
  • Adding components needed to query LFC.
  • Repacked minimum components required for dq2_user tools
  • Has provided a package that works with VDT 1.10.1.

LFC status (John)

  • Update on LFC integration and migration activities and plans
  • What about migration? Need to set up the Oracle-backed service.
  • The question for deployment is whether we want to go to a central service model. Should we stop local catalogs at sites?

WLCG accounting

  • Sites will probably need to run configure-osg in the near term to publish SI2K values through the GIP.
  • SLAC has already done this. Not complicated.
  • OSG will be maintaining a table with values that we'll have to agree to.
  • Karthik would like to automate this, in the long run.

Next procurements

  • Standing agenda item, see CapacitySummary.
  • When should we coordinate next purchases? Let's discuss this next week.

OSG 1.0

  • ITB 0.9 testbeds at OU, BNL, and UC
  • Panda test job validation progress (Rob)

Throughput initiative - status (Shawn)

Nagios monitoring subcommittee (Dantong)

  • Of 5 sites, 3 are correctly reporting space available figures.
  • Cleaning up false alarms.
  • For low job success rates, notification goes to shifters rather than the sites (policy change).
  • SE probe work in progress.

Release installation via Pacballs (Xin)

  • Follow-up on the pacball-based method next week. Update?
  • Alessandro and Stan working on description of releases. Timeframe? Fred will provide.

Site news and issues (all sites)

  • T1: processing: heavily involved in FDR-2 preparation. Working with Berkeley on mixing and filtering to produce bytestream data; this went well over the weekend. Mixing jobs take > 3 days. Currently in run 2 out of 5 runs. CCRC throughput starting this morning. Noted we were capped at 1 Gbps - primary link between CERN and BNL down, running on backup (should be 10 Gbps). Consulted with ESnet - problems with the reservation system, now corrected. Running at 400-500 MB/s. T0T1, T1T0 storage classes. 175 MB/s output. By end of June will have 5 GridFTP doors, each at 10 Gbps.
  • AGLT2: Transferring to standard SE (not space token). Still troubleshooting proxy certificate mapping to usatlas3 rather than usatlas1.
  • NET2: Added 128 Harpertown cores. Waiting on a network card from IBM on gatekeeper.
  • MWT2: finally got ATLASDATADISK set up: 14 TB
  • SWT2 (UTA): shutting down SWT2_UTA for electrical work tonight. All is well.
  • SWT2 (OU): Replacement server for tier2-02, installing today. 10G switch - going with Cisco rather than HP ProCurve.
  • WT2: no report.
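
The T1 report above mixes units: link capacity in Gbps and transfer rates in MB/s. The Python sketch below is a minimal conversion, assuming decimal units (1 Gbps = 1000 Mbit/s), 8 bits per byte, and no protocol overhead. The function name is hypothetical. Under those assumptions, 400-500 MB/s could not be sustained on a 1 Gbps path, which is consistent with the cap having been corrected.

```python
def gbps_to_mb_per_s(gbps):
    """Convert a link speed in Gbps to MB/s (decimal units, no overhead)."""
    return gbps * 1000 / 8


capped_link = gbps_to_mb_per_s(1)         # the 1 Gbps cap noted in the report
full_link = gbps_to_mb_per_s(10)          # a single 10 Gbps link
planned_doors = 5 * gbps_to_mb_per_s(10)  # 5 GridFTP doors at 10 Gbps each

print(f"1 Gbps cap:        {capped_link:.0f} MB/s")
print(f"10 Gbps link:      {full_link:.0f} MB/s")
print(f"5 x 10 Gbps doors: {planned_doors:.0f} MB/s aggregate (upper bound)")
```

The aggregate figure for the doors is a theoretical ceiling only; achievable rates depend on storage and wide-area network limits not covered in these minutes.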

RT Queues and pending issues (Tomasz)

Carryover action items

  • None

AOB

  • None.


-- RobertGardner - 20 May 2008
