Monitoring work done in previous phases:

Phase 5 (April 1 - June 30, 2008)

  • Validate that all sites are properly reporting RSV data to SAM for basic availability of compute elements: May 1
  • Re-validate all sites are reporting accounting data correctly to Gratia and to the WLCG portal: May 1
  • Review Nagios monitoring alarms for the Facility and adjust policy as appropriate: May 1
  • Deploy RSV probes for storage elements: June 1
  • Validate reporting of RSV probes for storage elements into SAM: June 15

RSV v2 upgrade for SE probes (June 5, 2008)

This upgrade is necessary to provide monitoring and reporting of storage elements. The instructions are provided here:

For problems, send email to Sarah Williams (saewill@iupui.edu).

Site Availability Monitoring (SAM)

All US ATLAS Distributed Facility sites need to appear in the SAM plots at CERN. WLCG tracks availability on the GridView site: http://gridview.cern.ch/GRIDVIEW/same_index.php. That site is moderately difficult to navigate, so the instructions below suggest a different site for checking that your data is reaching the SAM system at CERN.

There are three steps to making your site report to SAM:

  1. Set up the RSV site availability monitoring probes.
  2. Set up a grid proxy for the probes.
  3. Check that your site is entering data in the database (note that the tests run only once every two hours).

To check that a site is reporting correctly:

  1. Browse to https://lcg-sam.cern.ch:8443/sam/sam.py
  2. Select the osgce radio button, then click ShowSensorTests
  3. Select OpenSciencegrid from the regions drop-down menu
  4. Select Ops from the VOs drop-down menu
  5. Click the ShowSensorTests button
  6. On the next page select all tests
  7. Click the ShowSensorTests button on that page.

  • I believe that if you check the "show ops critical tests" box, you will see only the results of the tests that determine whether a site is up and available.
  • NB: Accessing this page requires that your browser have a valid Grid or CERN certificate loaded.

The above procedure should produce a plot similar to SAM.pdf attached below (note that the plot shows both US ATLAS and US CMS sites).

The documentation to configure RSV is found at: http://rsv.grid.iu.edu/documentation/vdt-package.html. For help with debugging problems configuring RSV please contact goc at opensciencegrid dot org. You can also contact me: luehring at indiana dot edu.

The list of sites known to SAM is available in comma-delimited format at: http://oim.grid.iu.edu/publisher/get_osg_interop_monitoring_list.php. If your site is not on this list, please send an email to the GOC (goc at opensciencegrid dot org).
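As a minimal sketch of that check, assuming the list has already been saved to a local file: the helper name site_in_list and the file name in the usage note are hypothetical, and because the column layout of the list is not described here, the sketch simply matches the site name against any whole comma-delimited field.

```shell
# Hypothetical helper: succeed if SITE appears as a whole comma-delimited
# field on any line of FILE (a locally saved copy of the site list).
site_in_list() {
    file=$1
    site=$2
    # Put each field on its own line, trim surrounding whitespace,
    # then look for an exact, literal match of the site name.
    tr ',' '\n' < "$file" \
        | sed 's/^[[:space:]]*//; s/[[:space:]]*$//' \
        | grep -Fxq -- "$site"
}
```

For example, `site_in_list saved_list.csv MY_SITE || echo "not listed, email the GOC"` would print the reminder only when the name is missing from the saved copy.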

Running RSV requires a proxy. To avoid renewing the proxy by hand, you can use a service certificate, provided the proxy does not leave the local machine. A proxy based on a service certificate can be renewed by a cron job, whereas a proxy based on a user certificate requires the user to enter the pass phrase at each renewal. Do NOT create a proxy with a long expiration time using a user certificate.
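A cron-driven renewal might look roughly like the sketch below. It assumes the Globus tools grid-proxy-info and grid-proxy-init are on the PATH and that the service key is not protected by a pass phrase. The certificate, key, and proxy paths, the six-hour threshold, and the function names are placeholders chosen for illustration, not values taken from the RSV documentation.

```shell
# Placeholder paths and threshold: adjust to the local installation.
CERT=${CERT:-/etc/grid-security/rsvcert.pem}
KEY=${KEY:-/etc/grid-security/rsvkey.pem}
PROXY=${PROXY:-/tmp/rsvproxy}
MIN_SECONDS=${MIN_SECONDS:-21600}   # renew when less than 6 hours remain

# Succeed (exit 0) when the remaining lifetime is below the threshold.
needs_renewal() {
    timeleft=$1
    threshold=$2
    [ "$timeleft" -lt "$threshold" ]
}

# Renew the proxy from the service certificate only when it is close to expiry.
renew_service_proxy() {
    # grid-proxy-info -timeleft prints the remaining lifetime in seconds;
    # a missing or unreadable proxy is treated as already expired.
    left=$(grid-proxy-info -file "$PROXY" -timeleft 2>/dev/null) || left=0
    if needs_renewal "${left:-0}" "$MIN_SECONDS"; then
        # Short-lived proxy (12 hours), written to the local proxy file.
        grid-proxy-init -cert "$CERT" -key "$KEY" -out "$PROXY" -valid 12:00
    fi
}
```

A script that sources these definitions and calls renew_service_proxy could then be run from cron every few hours, for instance with an entry such as `0 */4 * * * /path/to/renew-rsv-proxy.sh` (the script name is hypothetical).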

To allow the use of a gridftp server that is not on the same node as the OSG Gatekeeper:

  • Download the modified probe, gridftp-simple-probe-modified
  • cp $VDT_LOCATION/osg-rsv/bin/probes/gridftp-simple-probe $VDT_LOCATION/osg-rsv/bin/probes/gridftp-simple-probe.backup
  • mv gridftp-simple-probe-modified $VDT_LOCATION/osg-rsv/bin/probes/gridftp-simple-probe
  • Edit $VDT_LOCATION/osg-rsv/submissions/probes/YOURSERVER__gridftp-simple-probe@org.osg.globus.gridftp-simple.sub
  • Append '--external-gridftpdoor' to the end of the Arguments = ... line
  • source $VDT_LOCATION/setup.sh
  • vdt-control --off osg-rsv
  • vdt-control --on osg-rsv
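The edit to the submission file can also be scripted. The sketch below is one possible way to do it, not part of the RSV distribution: the helper name add_external_door is hypothetical, it keeps a copy of the submission file with a .backup suffix, and it assumes the value on the Arguments line is not wrapped in quotes (if it is, the flag has to go inside the quotes by hand).

```shell
# Hypothetical helper: append --external-gridftpdoor to the "Arguments = ..."
# line of a probe submission file, unless the flag is already there.
add_external_door() {
    sub=$1
    flag='--external-gridftpdoor'
    if grep -q '^[[:space:]]*Arguments[[:space:]]*=.*'"$flag" "$sub"; then
        return 0    # already present; nothing to do
    fi
    # Keep a copy of the original submission file next to it.
    cp "$sub" "$sub.backup" || return 1
    # Rewrite from the backup instead of using sed -i, whose syntax varies
    # between platforms. Trailing whitespace on the line is replaced by the flag.
    sed '/^[[:space:]]*Arguments[[:space:]]*=/ s/[[:space:]]*$/ '"$flag"'/' \
        "$sub.backup" > "$sub"
}
```

It would be called with the path of the .sub file named in the steps above, before osg-rsv is switched off and on again with vdt-control. Running it twice is harmless because the second call finds the flag and leaves the file alone.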

