r3 - 18 Dec 2007 - 13:26:28 - RobertGardner

MinutesDec12

Introduction

Minutes of the Facilities Integration Program meeting, December 12, 2007
  • Previous meetings and background : IntegrationProgram
  • Coordinates: Wednesdays, 1:00pm Eastern
    • Phone: (605) 475-6000, Access code: 735188; Dial 6 to mute/un-mute.

Attending

  • Meeting attendees: Saul, Gabriele, Charles, Rob, Fred, Wensheng, Patrick, Dantong, Wei, John, Xin, Horst, Kathik, Nurcan, Mark, Kaushik, Shawn, Hiro
  • Apologies: none

Integration program update (Rob, Michael)

  • Phase 3 plan: here
  • Phase 3 SiteCertificationP3
  • Review of action items from the Tier2 meeting at SLAC: NotesTier2Nov30. Overarching near-term goals (December 15) are:
    • Establish 200 MB/s sustained throughput to all Tier2s
    • Establish analysis queues at all Tier2s
    • Replicate Rel 12 AODs to all Tier2s for routine pathena analysis
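For scale, the 200 MB/s target can be converted into a daily transfer volume per site. The short Python sketch below does that arithmetic. It assumes decimal units (1 MB = 10^6 bytes, 1 TB = 10^12 bytes) and a perfectly uninterrupted 24-hour transfer; neither assumption is stated in the minutes, and the function name is illustrative only.

```python
# Convert a sustained transfer rate into a daily volume.
# Assumption: decimal units (1 MB = 1e6 bytes, 1 TB = 1e12 bytes)
# and an uninterrupted 24-hour transfer at a constant rate.

SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 s


def daily_volume_tb(rate_mb_per_s: float) -> float:
    """Return terabytes moved per day at a constant rate given in MB/s."""
    megabytes_per_day = rate_mb_per_s * SECONDS_PER_DAY
    return megabytes_per_day / 1e6  # 1 TB = 1e6 MB in decimal units


if __name__ == "__main__":
    # Per-Tier2 throughput target from the near-term goals.
    print(f"{daily_volume_tb(200):.2f} TB/day")  # prints "17.28 TB/day"
```

Using binary units (MiB/s, TiB) instead would give a slightly different figure.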

Operations: Production (Kaushik)

  • Production summary (Kaushik)
    • All going fine. BU solved its issue w/ the storage endpoint. AGLT2: see site reports.
    • Follow-up on known leaks in pileup jobs - stopped submitting those jobs.
  • Production shift report (Nurcan/Mark)
    • Wensheng - OU-OSCER: problem w/ copying output to storage - Paul investigating. UTD NFS problems.
    • There is a task that is generating very large files (> 2 GB). These jobs are failing at BNL, but not at UC_ATLAS_MWT2. Need to take up this issue w/ Paul.
    • eLog (December 15) status (Mark): will start using it today on a trial basis. Next week it will be made officially available.
  • Follow-up on the ADC Operations plan to be submitted to Alexei: Kaushik will send it to ATLAS management today. Note that on January 21-22 there will be a combined shift training meeting at CERN.

Operations: DDM (Alexei)

  • DQ2 0.5.0 schedule and plan (Hiro)
    • Will install at BNLDISK and BNLTAPE today or tomorrow (not at BNLPANDA, since that would impact production).
    • Patrick will take a look today.
    • Charles will also take a look.
    • Kaushik reports that France and the UK have seen improvements using DQ2 0.5. Backlogs are clearing much more quickly.

  • Follow-up on LRC upgrade project
    • John has LFC running and is doing internal testing. Will load it up and check performance. Learning the client tools, e.g. how to add additional tools.
    • John will convene an LFC working group.
    • Priority is to bring up an LFC instance.
    • Milestone of December 20 - public LFC at BNL, ready for clients to register

  • Follow-up on AOD replication for analysis at Tier2s: will resume at all sites. Status:
    • We looked at AMANDA status - generated more questions than answers.

Analysis Queues (Bob, Mark)

  • See AnalysisQueues
  • Email Bob, ball@umich.edu.
  • Follow-up: four sites are in various states of implementation:
    • SLAC - there may be an issue with the pilot copying files out - Paul looking into this.
    • MWT2 - Charles has setup the Condor config. Defining siteinfo config. Mark will send test pilots.
    • OU - there may still be an issue; will submit some test jobs.
    • BU - ready for test jobs.
    • SWT2_UTA - done
  • Can we agree that this milestone will be completed by December 15? Yes.
  • Follow-up: can we run jobs on a regular basis and collect information? Mark will automate submission of pathena test jobs.

Accounting (Shawn, Rob)

Follow-up on accounting issues (see Accounting).
  • See: http://www3.egee.cesga.es/gridsite/accounting/CESGA/tier2_view.html
  • Follow-up from last meeting:
    • SWT2_UTA still being addressed. Need VORS registration.
  • BNL accounting info was lost. There was confusion on the WLCG APEL site related to the change in the Gratia site name; they appear to use static mappings. Xin is still investigating.

Throughput initiative - overview (Shawn)

  • See notes from dedicated meeting this week: MinutesTPDec10
  • Need to document storage endpoints and to benchmark the storage w/ either bonnie++ or iozone.
  • Dantong looked into increasing memory on the gridftp doors, but this isn't possible since these are old systems.
  • Wei notes that SLAC is seeing CPU limitations.
  • Will meet again on Monday.

OSG

Panda release installation jobs (Xin)

  • Follow-up with Xin on the status of the dedicated submit host. Xin has been in contact w/ Tadashi and has made some recommendations for more features.
  • Will set up a machine to submit installation pilots using the software role.
  • Milestone - December 19
  • Saul also notes that Stan is distributing pacman balls.

RSV, Nagios, SAM (WLCG) site availability monitoring program (Tomasz)

  • Split of the Nagios server into internal and external instances: still working on this, expected by the end of the year.
  • RSV proposal to Arvind. Tomasz, Dantong, John.

Site news and issues (all sites)

  • T1: all is well; four servers for the Panda infrastructure are set up and conduits exist; the only thing left is the SVN. Tadashi will handle the migration.
  • AGLT2: major transition to Rocks 4.3 w/ SL. Everything is up, but jobs are going idle; Bob will address this.
  • NET2: all is well; GPFS storage online, LRC updated. Moved production and DDM ops to GPFS. 500/800 MB/s write/read.
  • MWT2: all well; just installed DQ2 0.5.
  • SWT2_UTA: still working on the installation of the new cluster; hope to get it online today.
  • SWT2_OU: all okay; still occasional crashes of gridftp, working w/ Dell. Paul is looking at a pilot issue.
  • WT2: set up BeStMan and used it w/ FTS; data moved around well. Requesting more grid servers.

RT Queues and pending issues (Tomasz)

Carryover action items

Syslog-ng

  • Encryption for syslog-ng: still to do, carryover.
  • Initial work starting in the OSG ITB.

Site performance jobs and metrics

  • Carryover; some benchmarking work w/ quad-core Opterons.
  • No news.

New Action Items

  • See items in carry-overs and new in bold above.

AOB

  • Proposed Computing Operations/Integration Program holiday schedule:
    • December 19 - regular meeting
    • December 26 - no meeting
    • January 2 - no meeting
    • January 9, 2008 - resume w/ Phase IV

-- RobertGardner - 11 Dec 2007

About This Site

Please note that this site is a content mirror of the BNL US ATLAS TWiki.
