
MinutesDec30

Introduction

Minutes of the Facilities Integration Program meeting, Dec 30, 2009
  • Previous meetings and background : IntegrationProgram
  • Coordinates: Wednesdays, 1:00pm Eastern
    • (605) 715-4900, Access code: 735188; Dial *6 to mute/un-mute.

Attending

  • Meeting attendees:
  • Apologies: Rob

Integration program update (Rob, Michael)

  • SiteCertificationP11 - FY10Q1
  • Special meetings
    • Tuesday (9am CDT): Frontier/Squid
    • Tuesday (9:30am CDT): Facility working group on analysis queue performance: FacilityWGAP suspended for now
    • Tuesday (12 noon CDT) : Data management
    • Tuesday (2pm CDT): Throughput meetings
  • Upcoming related meetings:
  • US ATLAS persistent chat room http://integrationcloud.campfirenow.com/ (requires account, email Rob), guest (open): http://integrationcloud.campfirenow.com/1391f
  • Program notes:
    • last week(s)
      • Opportunistic storage for Dzero - from the OSG production call. They want it at more OSG-US ATLAS sites than use it today, asking for 0.5 to 1 TB. It is not yet clear what configuring this entails. A request has also come from Brian Bockelman, but with few details. There are, of course, authorization and authentication issues that would need to be configured. Mark: UTA has given 10-15 slots on the old cluster with little impact. We need someone from US ATLAS leading the effort to support D0, working through the configuration issues, etc. Mark will follow up with Joel.
      • The LHC shut down a few hours ago - no more operations until February.
      • Operations call this morning - reprocessing operations begin December 22, but will not use the Tier 2s.
      • Interventions should be completed within the January timeframe.
    • this week

Tier 3 Integration Program (Doug Benjamin & Rik Yoshida)

  • last week(s):
    • Storage element subscription to Tier 3 completed. Hiro: it's working.
    • SE problem at ANL - runaway processes; will monitor.
    • Panda submission to Tier 3's - Torre and Doug were going to work on this.
    • T3-OSG meeting - security issues discussed. Have some preliminary ideas.
    • Hiro: T3 cleanup - a program is being developed to ship a dump of the T3 LFC to each T3; Charles will work on ccc.py to use this dump (an SQL database). A minimal sketch of such a consumer follows this list.
    • Justin: subscriptions working fine at SMU
  • this week:
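  • Sketch for the T3-LFC dump item above: a minimal, hypothetical example of how a consumer like ccc.py might use the shipped dump, assuming it arrives as an SQLite file containing a table lfc_replicas with an sfn column; the file name, table, column, and storage path are illustrative assumptions, not the real ccc.py interface.
      # Hypothetical sketch: compare an LFC dump (assumed to be shipped as an
      # SQLite file) against files actually present on local Tier 3 storage,
      # to flag dark and lost files. Table/column names are assumptions.
      import os
      import sqlite3

      def load_catalog(dump_path):
          """Return the set of storage file names (SFNs) recorded in the dump."""
          conn = sqlite3.connect(dump_path)
          try:
              return set(row[0] for row in conn.execute("SELECT sfn FROM lfc_replicas"))
          finally:
              conn.close()

      def scan_storage(top_dir):
          """Return the set of files actually found under the local storage area."""
          found = set()
          for root, _dirs, files in os.walk(top_dir):
              for name in files:
                  found.add(os.path.join(root, name))
          return found

      if __name__ == "__main__":
          catalog = load_catalog("t3_lfc_dump.db")   # dump shipped from the central LFC
          on_disk = scan_storage("/data/atlas")      # illustrative local storage path
          print("dark files (on disk, not in catalog): %d" % len(on_disk - catalog))
          print("lost files (in catalog, not on disk): %d" % len(catalog - on_disk))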

Operations overview: Production (Kaushik)

Data Management & Storage Validation (Kaushik)

Shifters report (Mark)

  • this meeting:
     

Analysis queues (Nurcan)

  • Reference:
  • last meeting:
    • User activity has already slowed down this week; the jump expected with real data arriving didn't materialize. The next three weeks should be clear for upgrades.
    • No major problems with data access (yet). Sometimes the release isn't installed at the site - why was the job scheduled there? Should we add release matching?
    • Problems accessing the conditions database. Rod has been responding to some of them; recent releases seem to solve the problems.
    • User support during the break - mostly one shifter on duty. Next week all shifts are in the North American time zone. The following week it will be mainly an EU-zone person, with an NA-zone person only for Thursday-Friday.
  • this meeting:

DDM Operations (Hiro)

Conditions data access from Tier 2, Tier 3 (Fred, John DeStefano)

Throughput Initiative (Shawn)

  • NetworkMonitoring
  • https://www.usatlas.bnl.gov/dq2/throughput
  • last week(s):
    • Each T2 must test against all other T2s
    • Check spreadsheet for correctness
    • Asymmetries between certain pairs - need more data
    • Will start a transaction-type test (large number of small files; checksumming needed) - see the sketch after this list.
  • this week:
     
    • Next meeting Jan 12 - meetings will then go bi-weekly.
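  • Sketch for the transaction-type test above: a hypothetical way to prepare the test dataset - many small files plus a recorded adler32 checksum for each, so transfers can be verified at the destination. The count, file size, and paths are illustrative assumptions, not an agreed test definition.
      # Hypothetical sketch: generate many small files and record an adler32
      # checksum for each in a manifest, for a transaction-type transfer test.
      import os
      import zlib

      NUM_FILES = 1000          # "large number of small files" (illustrative count)
      FILE_SIZE = 64 * 1024     # 64 kB per file (illustrative size)
      OUT_DIR = "txn_test"

      if not os.path.isdir(OUT_DIR):
          os.makedirs(OUT_DIR)

      manifest = open(os.path.join(OUT_DIR, "manifest.txt"), "w")
      for i in range(NUM_FILES):
          data = os.urandom(FILE_SIZE)                      # random payload
          path = os.path.join(OUT_DIR, "small_%05d.dat" % i)
          with open(path, "wb") as f:
              f.write(data)
          checksum = zlib.adler32(data) & 0xffffffff        # unsigned 32-bit adler32
          manifest.write("%s %08x\n" % (path, checksum))    # file name + expected checksum
      manifest.close()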

Site news and issues (all sites)

  • T1:
    • last week(s): One of the production Panda sites is being used for high-luminosity pileup, high-memory jobs (3 GB/core). Stability issues with the Thor/Thumpers - some problems with high packet rates on link-aggregated NICs. The 2 PB disk purchase is ongoing. Another Force10 switch with 60 Gb/s inter-switch links.
    • this week:

  • AGLT2:
    • last week: Running well - an issue where dccp copies seemed to hang; had to reboot the dCache head node. Would like to do some storage-node upgrades on Friday. Trying out a Rocks 5 build for updating nodes.
    • this week:

  • NET2:
    • last week(s): Working with local users so they can access POOL conditions data at HU. Separate install queue for software kits.
    • this week:

  • MWT2:
    • last week(s): Updating the Myricom driver to troubleshoot.
    • this week: Upgraded MWT2_UC to dCache 1.9.5-11. An order for ~1 PB of usable storage was placed with Dell. IBM would like to discuss the possibility of a pricing matrix with US ATLAS on January 7.

  • SWT2 (UTA):
    • last week: Major upgrade at UTA_SWT2 - replaced the storage system and added new compute nodes; all in place. Reinstalling OSG. SRM and xrootd are up. Hopefully back online in a day or two.
    • this week:

  • SWT2 (OU):
    • last week: Equipment for storage is being delivered. Will be taking a big downtime to do upgrades: OSG 1.2, SL5, etc.
    • this week:

  • WT2:
    • last week(s): All is well. Some SL5 migration is still ongoing. A number of older machines from BaBar have suddenly become available. Working on an xrootd-Solaris bug.
    • this week:

Carryover issues (any updates?)

OSG 1.2 deployment (Rob, Xin)

  • last week:
    • BNL updated
  • this week:
    • Any new updates?

OIM issue (Xin)

  • last week:
    • Registration information change for bm-xroot in OIM - Wei will follow up.
    • SRM V2 tag - Brian says nothing to do but watch for the change at the end of the month.
  • this week:

Release installation, validation (Xin)

The issue of validating the presence and completeness of releases at sites.
  • last meeting
  • this meeting:

HTTP interface to LFC (Charles)

VDT Bestman, Bestman-Xrootd

  • See BestMan page for more instructions & references
  • last week(s)
    • Have discussed adding an Adler32 checksum to xrootd. Alex is developing something to calculate this on the fly and expects to release it very soon. Want to supply this to the GridFTP server; a small on-the-fly checksum sketch follows this list.
    • Need to communicate w/ CERN regarding how this will work with FTS.
  • this week
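  • Sketch for the on-the-fly checksum item above: a minimal illustration of computing an adler32 checksum while data is being copied, i.e. without a second pass over the file. This only shows the incremental update; the actual xrootd/GridFTP work discussed above is Alex's implementation, not this code.
      # Minimal sketch: copy a file and compute its adler32 checksum on the fly.
      import zlib

      def copy_with_adler32(src_path, dst_path, blocksize=1024 * 1024):
          """Copy src to dst, updating the adler32 checksum block by block."""
          value = 1  # adler32 initial value
          with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
              while True:
                  block = src.read(blocksize)
                  if not block:
                      break
                  dst.write(block)
                  value = zlib.adler32(block, value)   # incremental update, no re-read
          return "%08x" % (value & 0xffffffff)         # zero-padded hex string

      # The returned checksum could then be reported to the GridFTP server or
      # compared with the catalogue value after the transfer.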

Local Site Mover

  • Specification: LocalSiteMover
  • code
  • this week if updates:
    • BNL has an lsm-get implemented and is just finishing the test cases [Pedro]; a rough sketch of the wrapper shape follows below.
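  • Sketch for the lsm-get item above: a rough illustration of the shape such a script takes, assuming the interface is "lsm-get <source> <destination>" with exit code 0 on success; the actual argument list, error codes, and copy command are defined by the LocalSiteMover specification and the site's storage, and this example simply uses a local copy. It is not BNL's implementation.
      # Rough, hypothetical sketch of an lsm-get wrapper.
      import os
      import shutil
      import sys

      def main(argv):
          if len(argv) != 3:
              sys.stderr.write("usage: lsm-get <source> <destination>\n")
              return 1
          source, destination = argv[1], argv[2]
          try:
              dest_dir = os.path.dirname(destination)
              if dest_dir and not os.path.isdir(dest_dir):
                  os.makedirs(dest_dir)            # make sure the target directory exists
              shutil.copy(source, destination)     # site-specific copy tool would go here
          except Exception as exc:
              sys.stderr.write("lsm-get failed: %s\n" % exc)
              return 2                             # non-zero exit signals failure to the pilot
          if not os.path.isfile(destination):
              sys.stderr.write("lsm-get: destination missing after copy\n")
              return 3
          return 0

      if __name__ == "__main__":
          sys.exit(main(sys.argv))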

Gratia transfer probes @ Tier 2 sites

Hot topic: SL5 migration

  • last weeks:
    • ACD ops action items, http://indico.cern.ch/getFile.py/access?resId=0&materialId=2&confId=66075
    • Kaushik: we have the green light from ATLAS to do this; however, there are still some validation jobs running and some problems to solve. If anyone wants to migrate, go ahead, but we are not pushing right now. Want plenty of time before data comes (meaning the next month or two at the latest). Wait until reprocessing is done - anywhere between 2-7 weeks from now - for both SL5 and OSG 1.2.
    • Consensus: start mid-September for both SL5 and OSG 1.2
    • Shawn: considering rolling part of the AGLT2 infrastructure to SL5 - should they hold off? Probably okay (Michael); it would yield some good information. Sites: use this time to sort out migration issues.
    • Milestone: by mid-October all sites should be migrated.
    • What to do about validation? Xin notes that compat libs are needed.
    • Consult UpgradeSL5
  • this week

WLCG Capacity Reporting (Karthik)

  • last discussion(s):
    • Note - if you have more than one CE, the availability will take the "OR".
    • Make sure installed capacity is no greater than the pledge.
    • Storage capacity is given to the GIP by one of two information providers (one for dCache, one for POSIX-like filesystems); this requires OSG 1.0.4 or later. Note - not important for WLCG, since it is not passed on. Karthik notes we have two ATLAS sites that are reporting zero. This is a bit tricky.
    • Have not yet seen a draft report.
    • Double-check that the accounting name doesn't get erased. There was a bug in OIM - it should be fixed, but check.
    • Reporting comes from two sources: OIM and the GIP at the sites.
    • Here is a snapshot of the most recent report for ATLAS sites (a sketch of how the %Diff column appears to be computed follows at the end of this section):
      --------------------------------------------------------------------------------------------------------
      This is a report of Installed computing and storage capacity at sites.
      For more details about installed capacity and its calculation refer to the installed capacity document at
      https://twiki.grid.iu.edu/twiki/pub/Operations/BdiiInstalledCapacityValidation/WLCG_GlueSchemaUsage-1.8.pdf
      --------------------------------------------------------------------------------------------------------
      * Report date: Tue Sep 29 14:40:07
      * ICC: Calculated installed computing capacity in KSI2K
      * OSC: Calculated online storage capacity in GB
      * UL: Upper Limit; LL: Lower Limit. Note: These values are authoritative and are derived from OIMv2 through MyOSG. That does not
      necessarily mean they are correct values. The T2 co-ordinators are responsible for updating those values in OIM and ensuring they
      are correct.
      * %Diff: % Difference between the calculated values and the UL/LL
             -ve %Diff value: Calculated value < Lower limit
             +ve %Diff value: Calculated value > Upper limit
      ~ Indicates possible issues with numbers for a particular site
      -----------------------------------------------------------------------------------------------------------------------------
      #  | SITE                 | ICC        | LL          | UL          | %Diff      | OSC         | LL      | UL      | %Diff   |
      -----------------------------------------------------------------------------------------------------------------------------
                                                            ATLAS sites
      1  | AGLT2                |      5,150 |       4,677 |       4,677 |          9 |    645,022 | 542,000 | 542,000 |      15 |
      2  | ~ AGLT2_CE_2         |        165 |         136 |         136 |         17 |     10,999 |       0 |       0 |     100 |
      3  | ~ BNL_ATLAS_1        |      6,926 |           0 |           0 |        100 |  4,771,823 |       0 |       0 |     100 |
      4  | ~ BNL_ATLAS_2        |      6,926 |           0 |         500 |         92 |  4,771,823 |       0 |       0 |     100 |
      5  | ~ BU_ATLAS_Tier2     |      1,615 |       1,910 |       1,910 |        -18 |        511 | 400,000 | 400,000 | -78,177 |
      6  | ~ MWT2_IU            |        928 |       3,276 |       3,276 |       -252 |          0 | 179,000 | 179,000 |    -100 |
      7  | ~ MWT2_UC            |          0 |       3,276 |       3,276 |       -100 |          0 | 179,000 | 179,000 |    -100 |
      8  | ~ OU_OCHEP_SWT2      |        611 |         464 |         464 |         24 |     11,128 |  16,000 | 120,000 |     -43 |
      9  | ~ SWT2_CPB           |      1,389 |       1,383 |       1,383 |          0 |      5,953 | 235,000 | 235,000 |  -3,847 |
      10 | ~ UTA_SWT2           |        493 |         493 |         493 |          0 |     13,752 |  15,000 |  15,000 |      -9 |
      11 | ~ WT2                |      1,377 |         820 |       1,202 |         12 |          0 |       0 |       0 |       0 |
      -----------------------------------------------------------------------------------------------------------------------------
      
    • Karthik will clarify some issues with Brian
    • Will work site-by-site to get the numbers reporting correctly
    • What about storage information in config ini file?
  • this meeting
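  • Sketch for the %Diff column in the snapshot above: the values appear consistent (up to rounding) with taking the difference between the calculated capacity and whichever OIM limit it falls outside of, as a percentage of the calculated value. This is an inference from the numbers in the table, not Karthik's actual report code.
      # Inferred %Diff calculation (an assumption based on the table above).
      def percent_diff(calculated, lower_limit, upper_limit):
          if calculated == 0:
              return -100 if lower_limit > 0 else 0    # zero reported against a non-zero limit
          if calculated > upper_limit:
              return 100.0 * (calculated - upper_limit) / calculated   # +ve: above upper limit
          if calculated < lower_limit:
              return 100.0 * (calculated - lower_limit) / calculated   # -ve: below lower limit
          return 0                                                     # within the OIM range

      # Checked against two ICC rows from the snapshot:
      # percent_diff(5150, 4677, 4677) -> ~9    (AGLT2, above its upper limit)
      # percent_diff(1615, 1910, 1910) -> ~-18  (BU_ATLAS_Tier2, below its lower limit)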

AOB

  • last week
  • this week


-- RobertGardner - 23 Dec 2009
