r15 - 13 Jun 2007 - 14:51:04 - RobertGardner

MinutesJun13

Introduction

Minutes of the Facilities Integration Program meeting, June 13, 2007

Attending

  • Meeting attendees: Rich, Michael, Rob, Xin, Wei, Joe, Kristy, Karthik, Horst, Kaushik, Nurcan, Bob Ball, Shawn, Patrick, Mike, Jay, Tomasz, Fred, and others.
  • Apologies: none

Last week's action items

  • Tomasz will consult local experts for off-site Nagios console access: there have been some fixes: need to test.
  • Tomasz will take first steps towards creating a Nagios plugins repository: BNL will set up an SVN repository accessible by grid certificate.
  • All: sites to upgrade syslog-ng installation: defer for now
  • Develop load test requirements and agenda (for LoadTestsP1) - Michael, Rob: done
  • Develop DQ2 0.3 deployment plan for US ATLAS facilities - Michael, Rob, Alexei: done: see below
  • All sites: continue with OSG 0.6 installations as time permits: in progress, see below
  • All sites: check off actions taken in SiteCertificationP1: not updated
  • Follow-up on AOD replication question about archival bit not being set - Hiro, Michael: forgot to discuss this

This week's focus: Load tests project

  • See initial requirements document at LoadTestsP1
  • Jay is developing a load test configurator and dashboard for monitoring. See conceptual document on LoadTestsP1.
  • The second piece will be setting up a repository for load test packages and, as a community effort, developing simple load testing scripts. Jay will incorporate these into the configurator and dashboard.
  • Shawn remarks that this is consistent with many of the network monitoring and optimization tools being packaged as part of the network research projects. Many of these tests require a Web100-enabled Linux kernel (see the Internet2 CD distributed by Rich) and an installed NDT server, but some do not. Rich suggests that iperf plus a wrapper could be used as a generic load test.
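As an illustration of the "iperf + wrapper" idea, here is a minimal Python sketch; it is not the script discussed at the meeting. It assumes an iperf 2 client on the path and an iperf server already listening at the remote site. The function names, the host name in the usage note, and the assumed field order of the CSV report (`-y C`) are illustrative assumptions and should be checked against the installed iperf version.

```python
import subprocess


def build_iperf_command(server, seconds=10, streams=1):
    """Build an iperf 2 client command line.

    -c: run as client against `server`; -t: test duration in seconds;
    -P: number of parallel streams; -y C: report results as CSV.
    """
    return ["iperf", "-c", server, "-t", str(seconds),
            "-P", str(streams), "-y", "C"]


def parse_csv_line(line):
    """Parse one line of iperf 2 CSV output.

    Assumed layout (an assumption, verify locally): timestamp, source
    address, source port, destination address, destination port,
    transfer id, interval, bytes transferred, bits per second.
    """
    fields = line.strip().split(",")
    return {
        "interval": fields[6],
        "bytes": int(fields[7]),
        "bits_per_second": int(fields[8]),
    }


def run_load_test(server, seconds=10, streams=1):
    """Run iperf against `server` and return one parsed record per CSV line."""
    cmd = build_iperf_command(server, seconds, streams)
    # Allow some slack beyond the test duration before giving up.
    result = subprocess.run(cmd, capture_output=True, text=True,
                            timeout=seconds + 30, check=True)
    return [parse_csv_line(l) for l in result.stdout.splitlines() if l.strip()]
```

A site could call `run_load_test("iperf.example.org", seconds=30, streams=4)` (hypothetical host) from a cron job and pass the returned records on to whatever dashboard is eventually adopted.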

DQ2 0.3 upgrade

  • Hiro is at CERN and testing BNL services; subscriptions are being made for stress tests.
  • Replacement of DQ2 0.2.12 site services is imminent, but we need to examine the software first.
  • Hiro is in contact with UTA (Patrick) and UC (Charles). Need to get feedback to Hiro and DQ2 developers.
  • Charles reports the initial installation is apt-get based, and requires mysql5. Waiting for the next step in instructions.
  • The plan is that:
    • Charles and Patrick will work with Hiro on the current installation method.
    • Provide first feedback to DQ2.
    • Develop a twiki with instructions for other sites. UM is another early candidate.
    • Need to get collective feedback from Tier2 on instructions, checks, etc.
    • Do some integration testing work to check out required functionality, learn about the software, troubleshooting, etc.
    • Based on this, develop schedule for production deployment

OSG 0.6 deployment update

  • See OSGservicesP1 for info, and SiteCertificationP1 for site status. Please add "gotchas" and additional notes here that come up during the installation so that we can compare notes and experiences.
  • Site status:
    • AGLT2: re-installed, but now experiencing a problem with status not being reported back through Condor.
    • BU: no report.
    • BNL_1: done; BNL_2: before next week.
    • MWT2_IU: done
    • MWT2_UC: done
    • UC_ATLAS_MWT2: done
    • UC_Teraport: in progress
    • OU: upgraded OSCER; still waiting on additional nodes and a move of the cluster (scheduled for June 22).
    • UTA_SWT2 cluster: OSG 0.6 installed, but not as the default gatekeeper.
    • UTA_dppc cluster: just waiting for a lull in production.
    • SLACXRD: now running OSG 0.6

Logging: Syslog-ng upgrade

  • LoggingServicesP1 - describes VDT-based syslog-ng install.
  • This is changing with additional enhancements in VDT this week. Rob suggests holding off another week on upgrading your syslog-ng. Of course any feedback is appreciated.

AOB

  • AGLT2 - continued problems with the GRAM job manager
  • MWT2_IU - upgrading dCache
  • MWT2_UC - fixing DQ2 0.2.12 access to the dCache backend (edg_gridftp_mkdir + dCache behavioral bug - understood and fixed by Charles)
  • No other site reports.
  • Fred reports that any late registrants to the Tier3/Tier2 workshop should contact him regarding hotel accommodations in Bloomington.
  • No Wednesday Integration meeting next week - we'll all be at the Tier3/Tier2 workshop in Bloomington.

New Action Items

  • Test of off-site Nagios console access.
  • Setup of Nagios plugins repository.
  • Evaluate initial OSG site availability scripts.
  • Setup of load test scripts repository.
  • Follow-up with Shawn and folks on first set of network I/O load tests.
  • Follow-up with OSG troubleshooting on AGLT2 gatekeeper problem.
  • Develop DQ2 0.3 site services installation notes for the Facility.
  • Test deployment and functionality on a number of sites.
  • Update SiteCertificationP1 with status of each site.

-- RobertGardner - 12 Jun 2007
