At Rob Gardner's request, we will meet tomorrow, Friday 11/15, at 1:30pm via ReadyTalk to discuss HS06 measurements and how they apply to our sites. We should have at least one representative from the Tier-1 and from each Tier-2 on this call. Please be prepared with:
1. Up-to-date information on your subcluster configurations
2. Information on how the HS06 measurements were made on your subclusters
3. Information on how these measurements are represented in the USATLAS Normalization Factors spreadsheet, and reflected in the OSG GIP pages.


======= HS06 factors at many sites ===================

====== USATLAS Normalization Factors ===============

======== Subcluster information in the GIP ==============


Attending: Bob, Dave, Mark, Horst, Shawn, Rob, Chris

This meeting was triggered by the slide in the first Reference above, which called into question the HS06 measurements at sites. In particular, a number of sites are in the wings of the distribution, with either unusually high or unusually low event rates given their stated HS06 values.

HS06 is run correctly at most sites, but is the weighting used to arrive at the final value computed correctly? If there is a large disparity between different hardware, is the result going to be correct? Opportunistic resources can also (potentially) throw this off if not properly accounted for.
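
To illustrate the weighting question above, here is a minimal sketch. It assumes the site value is a core-weighted average of per-subcluster HS06-per-core figures, which may not match how any given site actually computes it. The subcluster names, core counts and HS06 values are invented for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Subcluster:
    name: str             # hypothetical label, not a real subcluster
    cores: int            # job slots (logical cores) in this subcluster
    hs06_per_core: float  # measured HS06 divided by the slots used in the run


def site_hs06_per_core(subclusters):
    """Core-weighted average HS06 per core across subclusters."""
    total_cores = sum(s.cores for s in subclusters)
    if total_cores == 0:
        raise ValueError("no cores reported")
    total_hs06 = sum(s.cores * s.hs06_per_core for s in subclusters)
    return total_hs06 / total_cores


# Illustrative numbers only (assumed, not measured anywhere).
subclusters = [
    Subcluster("old_nodes", cores=400, hs06_per_core=8.0),
    Subcluster("new_nodes", cores=1200, hs06_per_core=12.0),
]

weighted = site_hs06_per_core(subclusters)
unweighted = sum(s.hs06_per_core for s in subclusters) / len(subclusters)
print(f"core-weighted: {weighted:.2f}, unweighted: {unweighted:.2f}")
```

With disparate hardware the two averages differ (11.00 versus 10.00 here), so a site that averages its subclusters without weighting by core count would misreport its value.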

Are the HS06 values we use correct? If another site's measurements are used, then the same hardware/software/BIOS configuration as at that site must be ensured.

We may need a separate queue for opportunistic resources, or else the result is thrown off. Or are we already there? The answer is mostly yes. Such queues exist at WT2, OSCER and Lucille.

  • Question over whether all stated resources at WT2 are always available, or if it is time-dependent, with disparate hardware not always available.

Other factors:

  • Machine not full.
  • Hyper-threading (HT) on, so the actual perceived HS06 is higher.
  • We are at SL6 now.
    • Did all sites recalculate their HS06 following the upgrade?
    • Increases of 2-15% can result.
  • Are BIOS settings always known and consistent?
  • Many high-I/O Analysis jobs will cause CPUs to run inefficiently.

In the US we've spent a lot of time trying to do this right, but we don't know what the rest of the world has done.

So, we need to get all sites to rerun HS06 as needed, and update all their reported numbers.
There are three locations where this update must happen:

  • Spreadsheet (v29-1 current)
  • OIM
  • sub-cluster reporting.
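
Because the same number has to be kept in step in three places, a simple cross-check can help. The following is a minimal sketch that assumes the HS06-per-core value for each site has already been copied by hand from each location into a dictionary. How the values are extracted is not shown, and the site names and numbers are hypothetical.

```python
def find_mismatches(spreadsheet, oim, gip, rel_tol=0.01):
    """Return sites whose values are missing somewhere or differ by more than rel_tol."""
    mismatched = {}
    for site in sorted(set(spreadsheet) | set(oim) | set(gip)):
        values = {
            "spreadsheet": spreadsheet.get(site),
            "oim": oim.get(site),
            "gip": gip.get(site),
        }
        known = [v for v in values.values() if v is not None]
        missing = len(known) < len(values)
        # Relative spread between the largest and smallest reported value.
        spread = (max(known) - min(known)) / max(known) if known and max(known) > 0 else 0.0
        if missing or spread > rel_tol:
            mismatched[site] = values
    return mismatched


# Hypothetical, hand-entered values (HS06 per core); not real site data.
spreadsheet = {"SITE_A": 10.5, "SITE_B": 9.8}
oim = {"SITE_A": 10.5, "SITE_B": 8.9}
gip = {"SITE_A": 10.5}

mismatches = find_mismatches(spreadsheet, oim, gip)
for site, values in mismatches.items():
    print(site, values)
```

The 1% tolerance is an arbitrary choice for the sketch. A site may prefer exact agreement.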


By Dec 11, every site needs to check its results and justify what it is reporting.

Bob will run HS06 on sites/machines where requested, if the sites cannot run it on their own.

Will discuss again at the Dec 11/12 Arizona meeting.

-- RobertBall - 19 Nov 2013

