r5 - 20 Aug 2008 - 08:46:26 - RobertGardner

MinutesMay7

Introduction

Minutes of the Facilities Integration Program meeting, May 7, 2008
  • Previous meetings and background: IntegrationProgram
  • Coordinates: Wednesdays, 1:00pm Eastern
    • Phone: (605) 475-6000, Access code: 735188; Dial *6 to mute/un-mute.

Attending

  • Meeting attendees: John, Saul, Shawn, Hiro, Charles, Rob, Horst, Karthik, Kaushik, Mark, Patrick, Wei, Sarah, Marco
  • Apologies: Michael

Integration program update (Rob, Michael)

Review status of SRM v2 and Space Tokens

  • See also MinutesMay2SpaceTokens
  • srmcp - have we given up entirely?
  • Next version of dCache may not require srmcp.
  • Upcoming release of dCache may not require srm - but the one that uses dccp.
  • Pilot still needs work. And other Panda developments.
  • All we can do is keep the old
  • AGLT2 - ATLASDATADISK - tested; 90 TB
  • SLAC - tested; 50 TB
  • BU - tested; 160 TB RAW
  • SW - setup, not tested; 60 TB
  • MW - working still.

RSV-SE probes

  • Xin - has tested RSV-SE probe in the new release, running on the SRM door node; works. Reports correctly to the RSV collector at the GOC. srm-ping, srm-cp
  • Would like to use existing infrastructure on OSG 0.8 gatekeepers.
  • for SRM-dCache - probe is ready.
  • for Bestman-xrootd - a minor issue needs to be resolved. Wei thinks it won't be a problem. Sites will need to upgrade their bestman-xrootd. Wei will follow up with Arvind and Alex.

CCRC08 news

  • Managed centrally - need to communicate the four endpoints that are ready to Stephane and Alexei.
  • Make sure old CCRC08 data is deleted - mainly for free space.
  • 10 TB is the minimum

Next procurements

LRC modifications for user data deletion

  • See SiteCertificationP5
  • Heads-up minimal testing. AGLT2 trying it out now - Hiro sending additional information. Send feedback to Charles for official instructions. Gridsite must be enabled.

Analysis Queue Update

  • Analysis workshop next week at Argonne
  • Auto-site selection now working well - Marco's testing.
  • FDR1 reprocessing status? "This week", but nothing started.

Operations: Production (Mark, Kaushik)

  • After the dCache upgrade at BNL, there were issues with Panda Mover. US cloud was basically off. Issues cleared up on the BNL side.
  • On-going push to get last sites to autopilot. Transitioning BU last - all looks okay.
    • Next on list will be MWT2. Waiting for Paul - one of the wrapper scripts needs changes for dCache sites. Hopefully by end of week.
    • Then Tier3s
  • There may be pile-up jobs that require many input files: they take only 30 minutes to run but hours to stage their input. Not much we can do.
  • Bit of a backlog

Operations: DDM (Hiro)

  • http://www.usatlas.bnl.gov/dq2/monitor - development stalled, will monitor space used by users.
  • DQ2 site service upgrade status/plan
    • Follow-up: 0.6.6 - no word from Miguel. Old version is not updating the dashboard very well.
  • Concerned about space at BNL - but new storage is coming online soon.

RSV --> SAM (Fred)

  • Please see this link: http://www.usatlas.bnl.gov/twiki/bin/view/Admins/MonitoringServices
  • Follow-up status:
    • All sites are in the RSV monitor.
    • On SAM side, waiting to resolve uta.edu versus swt2.org. Not an issue at the UTA site; assumed to be handled at the GOC.
  • For AGLT2-muon-calibration FTS channel - still need to report through BNL? Answer - yes.

Throughput initiative - status (Shawn)

Nagios monitoring subcommittee (Dantong)

  • No news.

Panda release installation issues (Xin)

  • Follow-up on the pacball-based method next week. No update.

OSG 1.0

  • ITB 0.9 deployment and validation in progress.
  • Sites at OU, UC, BNL all ready for VO testing.
  • Rob testing Panda submission.

Site news and issues (all sites)

  • T1: Hiro reports that storage is tight. Adding a 16 TB thumper today. Main implication is for FDR data and CCRC data transfers. Will be resolved by tomorrow.
  • AGLT2: all okay; some CPU drop at midnight last night.
  • NET2: all okay; SRM v2 up and running; still waiting on a 10G replacement from IBM.
  • MWT2: all okay; running into some scaling issues with the addition of new nodes (number of TCP sockets, etc.). Upgraded all head nodes recently.
  • SWT2 (UTA): all okay; 1 day shutdown on 21st for electrical work on distribution panels.
  • SWT2 (OU): working on 10G cards and routing tests for the external network. FTS problems reported by Yuri? (Hiro thinks it is a minor problem.)
  • WT2: running fine, no best.

RT Queues and pending issues (Tomasz)

Carryover action items

  • None

AOB

  • None.


-- RobertGardner - 06 May 2008
