r7 - 03 Oct 2007 - 14:38:31 - RobertGardner



Minutes of the Facilities Integration Program meeting, October 3, 2007
  • Previous meetings and background : IntegrationProgram
  • Coordinates: Wednesdays, 1:00pm Eastern
    • Phone: (605) 475-6000, Access code: 735188; Dial 6 to mute/un-mute.


  • Meeting attendees: Michael, Gabriele, Rob, Rich, Jay, Saul, Wei, Charles, Joe, Bob, Shawn, Xin, Marco, Fred
  • Apologies: Horst (see notes below), others attending BNL DDM workshop

Integration program update (Rob, Michael)


  • See AccountingP2 and items therein.
  • Went through the list of actions on the AccountingP2 page. The NET2 and SWT2 sites need immediate attention.
  • Deadline for these reports is Friday.
  • Q: what to do when a cluster composition changes? Will send information.

BNL LRC interface (Hiro)

  • The web interface to BNL-LRC has gone down ... investigating.

DQ2 update (Hiro)

  • Upgraded BNLDISK, which is now in production. Documentation has to be updated. Stability is not yet certain. Load has been reduced dramatically.
  • Will suggest fair-share for AOD vs Production datasets.

FTS 2.0 (Hiro)

  • Upgraded and stable, but still being tested.
  • The backend Oracle database is also very stable.
  • Looking into monitoring: publishing queries to the database via a webpage.

Load testing update, issues (Jay)

  • MB/s for gridftp_m2m, gridftp_m2d, gridftp_d2d (3 Gigabytes transferred, 1 stream, aborting after ~45 seconds): loadtest_10_3_2007.jpg
  • The above is a live plot in MonALISA.
  • Will coordinate with Shawn to make sure the sites are tuned
  • Why so different from Ganglia? Note these tests are WAN tests.
  • dcgftp host may need to be tuned.
  • Will increase the test time to 2 minutes.
  • The framework continues to run with tests scheduled every 3 hours
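
For reference, the MB/s figure of a time-limited transfer test can be sketched as bytes moved divided by elapsed time. The snippet below is a minimal illustration only and is not the load-test framework's code; the function names and the use of decimal megabytes (10^6 bytes) are assumptions.

```python
# Illustrative sketch only; names and units are assumptions, not the framework's code.
BYTES_PER_MB = 1_000_000  # assumption: decimal megabytes


def throughput_mb_per_s(bytes_transferred, elapsed_s):
    """Average rate in MB/s for the bytes actually moved in elapsed_s seconds."""
    if elapsed_s <= 0:
        raise ValueError("elapsed_s must be positive")
    return bytes_transferred / BYTES_PER_MB / elapsed_s


def min_rate_to_finish(file_bytes, time_limit_s):
    """Lowest sustained MB/s at which the whole file fits inside the time limit."""
    return throughput_mb_per_s(file_bytes, time_limit_s)


if __name__ == "__main__":
    three_gb = 3_000_000_000
    print(round(min_rate_to_finish(three_gb, 45), 1))   # 45-second limit
    print(round(min_rate_to_finish(three_gb, 120), 1))  # 2-minute limit
```

Under these assumptions, a 3 GB file completes within a 45-second limit only if the single stream sustains roughly 67 MB/s; with a 2-minute limit the required rate drops to 25 MB/s.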

Network Performance and Throughput initiative (Shawn)

Analysis Queues (Bob, Mark)

  • See AnalysisQueueP2
  • Action item: Mark will provide similar instructions for PBS. Status: Mark is still working on it.
  • Main problem is that the set of AODs is not available.
  • Action items moving forward (each site):
    • Set up analysis queues.
    • Allocate a small number of CPUs to this site.

Nagios monitoring (Tomasz)

  • A Nagios notification hierarchy of services has been implemented to reduce flooding of messages (e.g., from child services if their parent has failed).
  • tier2-osg.uchicago.edu: was disabled, now re-enabled.
  • gk01.swt2: unstable. Need status update from Patrick.
  • Working on building connections to trouble tickets.
  • Working on links between services
  • Working with Rob Q to write Nagios wrappers for RSV probes (update in two weeks)
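
The notification hierarchy mentioned above can be pictured with a small sketch: a failed service raises a notification only if none of its ancestors has also failed. This is a hypothetical illustration in plain Python, not Nagios configuration or the Nagios API; the function name and the service names in the example are made up.

```python
# Hypothetical sketch of parent/child notification suppression.
# It does not use Nagios; service names below are invented for illustration.

def services_to_notify(parents, failed):
    """Return the failed services that have no failed ancestor.

    parents maps each service to its parent service (or None for a root).
    failed is the set of services currently in a failed state.
    """
    notify = set()
    for service in failed:
        suppressed = False
        seen = {service}  # guards against accidental cycles in the mapping
        ancestor = parents.get(service)
        while ancestor is not None and ancestor not in seen:
            if ancestor in failed:
                suppressed = True  # a parent is down, so stay quiet
                break
            seen.add(ancestor)
            ancestor = parents.get(ancestor)
        if not suppressed:
            notify.add(service)
    return notify


if __name__ == "__main__":
    parents = {"gatekeeper": None, "gridftp": "gatekeeper", "srm": "gatekeeper"}
    print(services_to_notify(parents, {"gatekeeper", "gridftp", "srm"}))
```

In this example only the parent service would raise a notification; the two child failures are suppressed because their parent is already known to be down.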

Site news and issues (All Sites)

  • T1: Procured seven servers for Panda and databases. Will work with Panda group to migrate to these. Timeframe is either next week, or in the beginning of November.
  • AGLT2: Equipment has been shipped from Dell; delivery of the first shipment is expected tomorrow. Expect to ramp up the number of job slots and storage.
  • NET2: New OSG 0.6.0 installed. Cleanse issue, but expect to resume production later today.
  • MWT2: New site IU_OSG: installing 12.0.6 and 12.0.7; will run kitval. UC_ATLAS_MWT2: back in production but low on memory. Marco notes an issue with management of submit hosts.
  • SWT2_UTA: no report
  • SWT2_OU: We just received confirmation from Dell that they'll be out here again next Monday, so hopefully by the end of the week we'll finally be upgraded with the new hardware and at RHEL4/OSG-0.6.0. And then we can also set up all the NDT stuff and whatever else needs to be done. Other than that, everything is running fine.
  • WT2: Will have to update the gridftp OS to RHEL4 32-bit.

RT Queues and pending issues (Dantong, Tomasz)

Carryover action items

Panda release installation jobs (Fred, Tadashi, Xin)

  • Quite far away. Revert to Xin's method for the time being.


  • Encryption to syslog-ng: still to do, carryover.

Site performance jobs and metrics (Rob)

  • Carryover

RSV, Nagios, SAM (WLCG) site availability monitoring program (Dantong, Tomasz)

New Action Items

  • See items in carry-overs and new in bold above.


-- RobertGardner - 02 Oct 2007

Attachment: loadtest_10_3_2007.jpg (99.8K) | JayPackard, 03 Oct 2007 - 12:50 | MB/s for gridftp_m2m, gridftp_m2d, gridftp_d2d (3 Gigabytes transferred, 1 stream, aborting after ~45 seconds)