r3 - 21 Dec 2011 - 14:05:51 - RobertGardner

MinutesDec21

Introduction

Minutes of the Facilities Integration Program meeting, December 21, 2011
  • Previous meetings and background : IntegrationProgram
  • Coordinates: Wednesdays, 1:00pm Eastern
  • Our Skype-capable conference line (6 to mute; announce yourself in a quiet moment after you connect):
    • USA Toll-Free: (877)336-1839
    • USA Caller Paid/International Toll : (636)651-0008
    • ACCESS CODE: 3444755

Attending

  • Meeting attendees: Michael, Patrick, Mark, Sarah, Nate, Rob, Torre, Wei, Shawn, Bob, Horst, Hiro
  • Apologies: Saul, Jason, Alden, Dave

Integration program update (Rob, Michael)

  • Special meetings
    • Tuesday (12 noon CDT, weekly - convened by Kaushik) : Data management
    • Tuesday (2pm CDT, bi-weekly - convened by Shawn): Throughput meetings
    • Friday (1pm CDT, bi-weekly - convened by Rob): Federated Xrootd
  • Upcoming related meetings:
  • For reference:
  • Program notes:
    • last week(s)
      • Winter shutdown has begun - no pp beams again until April
      • Review status of CVMFS deployment at sites:
        • BNL DONE
        • AGLT2 DONE
        • MWT2 DONE
        • MWT2_IllinoisHEP DONE
        • NET2_BU: problem with nodes with small local disk; follow HU.
        • NET2_HU: in progress, deployed. Need to get jobs using it. Will depend a bit on his availability.
        • SWT2_UTA: in progress for prod-only cluster; could finish by end of year. For prod-analysis cluster - will defer to next year, not enough time.
        • SWT2_OU: everything is ready, but defer to next year until new head nodes arrive for squid. Wants to wait until beginning of next year to get the OSCER cluster at the same time. Worried about changes during the break.
        • WT2: it is deployed; working with a new CE as well. Validation jobs are failing, as are install jobs. dq2 client, gcc, and cctools are not being installed. Log files not visible.
      • GEANT4 production: communication with Gabriele this week - still gathering some of the details. When available, will test at MWT2 and update SupportingGEANT4.
      • OSG All Hands meeting: March 19-23, 2012 (University of Nebraska at Lincoln). Program being discussed. As last year, part of the meeting will have a co-located US ATLAS session and a joint USCMS/OSG session.
    • this week
      • There is a new group working on IO performance, see: https://indico.cern.ch/conferenceDisplay.py?confId=166930. There will at some point be results and studies from this group that should benefit performance in the facility.
      • Capacity updates for end of 2011 will be needed.
      • Michael - we do need to work on the CVMFS deployment - so

Progress on procurements

  • Interlagos machine - 128 cores - Shuwei's diverse set of tests shows poor performance. Not usable for us. There is an effort to look at an RHEL6 evaluation, which is highly recommended by AMD and Dell. Not likely to get results in time.
  • Regarding memory requirements, discussion with Borut: baseline is still 2 GB/logical core, but expect there will be high mem queues needed at some point; try
  • AGLT2: POs for equipment at UM have been put in (8 blades to a Dell chassis). S4810 F10 switch. Buying a port at OmniPoP (shared switch, in coordination with MWT2 sites). MSU - meeting to discuss details.
  • MWT2: working on R410-based compute node purchase at IU and UC. Extending CC at Illinois. OmniPoP switch ports (2 UIUC, 1 UC, 1 IU plus the shared port costs).
  • SWT2: getting orders in for the remaining funds; UPS infrastructure, and a smaller compute node purchase. Purchase of two 10G gridftp doors, but in next phase (Feb, March). OU: three new head nodes, and new storage already purchased, and everything is at 10G.
  • WT2: deployed 68 R410; will spend more on storage next year and other smaller improvements. 2.1 PB currently. Will investigate SSDs for highly performant storage. 100 Gbps - will discuss with his networking group.
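
As a rough aid for the capacity discussion, the 2 GB/logical core baseline noted above can be turned into a quick sizing check. This is only a sketch: the function names are hypothetical, and treating the 128 cores of the Interlagos test machine as logical cores is an assumption made for the example.

```python
# Baseline from the minutes: 2 GB of memory per logical core.
BASELINE_GB_PER_LOGICAL_CORE = 2


def baseline_memory_gb(logical_cores, gb_per_core=BASELINE_GB_PER_LOGICAL_CORE):
    """Memory (GB) a node needs to meet the per-core baseline."""
    return logical_cores * gb_per_core


def meets_baseline(installed_gb, logical_cores):
    """True if the installed memory covers the baseline for this core count."""
    return installed_gb >= baseline_memory_gb(logical_cores)


if __name__ == "__main__":
    # Hypothetical example: a 128-core node, counting each core as a logical core.
    print(baseline_memory_gb(128))
    print(meets_baseline(192, 128))
```

Any future high-memory queues would need a larger per-core figure, which can be passed as `gb_per_core`.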

Follow-up on CVMFS deployments & plans

  • OU - January 15
  • UTA - will focus on production cluster first. Will do a rolling upgrade. Expect completion by January 15 as well.
  • BNL - Michael notes that at BNL they have seen multiple mount points from the automounter. They seem to go away eventually. Under investigation by CVMFS experts. A ticket has been filed. In the process of adding more compute nodes, up to the full capacity.
  • WT2 - fully converted. DONE
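
For sites that want to check whether they see the duplicate mount points reported at BNL, a minimal sketch is given below. It is not the method used at BNL; it simply counts repeated mount points in text in the Linux /proc/mounts format. The function name is hypothetical, and it assumes repositories are mounted under /cvmfs/.

```python
from collections import Counter


def duplicate_mount_points(mounts_text, prefix="/cvmfs/"):
    """Return {mount_point: count} for mount points under `prefix` listed more than once.

    `mounts_text` is expected in /proc/mounts format, where the second
    whitespace-separated field of each line is the mount point.
    The default prefix is an assumption about where CVMFS is mounted.
    """
    counts = Counter()
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[1].startswith(prefix):
            counts[fields[1]] += 1
    return {mp: n for mp, n in counts.items() if n > 1}


if __name__ == "__main__":
    # Read the live mount table on a Linux worker node.
    with open("/proc/mounts") as f:
        for mount_point, n in sorted(duplicate_mount_points(f.read()).items()):
            print(f"{mount_point} mounted {n} times")
```

Running this periodically on a worker node would show whether the extra mounts really do go away over time.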

Production and data management issues - holiday operations

  • In terms of shift coverage, most are covered.
  • Weekend problems - affecting autopilot submission. Triggered discussion to move to autopyfactory by mid-January. Have discussed local installations of autopyfactory - can check out the code.
  • Brokerage issue of last week seems to have been resolved.
  • Perhaps we should clean up sites in the Panda monitor.
  • Send email to Alden

AOB

last week this week
  • Jobs using 64-bit Python versus 32-bit Python failing at SLAC and UTA - but it's a general issue. John Hover is investigating a general solution, working with Marco Mambelli. The whole issue needs a comprehensive review.
  • Next meeting January 4, 2012
  • Happy holidays!


-- RobertGardner - 20 Dec 2011
