r14 - 29 Apr 2009 - 16:20:01 - KaushikDe

MinutesBNLdcacheApr29

Introduction

Minutes of the BNL dcache/HPSS Optimization meeting, Apr 29, 2009
  • Coordinates: Building 510 (Physics), Rm 2-160 at BNL, 9:30 am EDT
    • (309) 946-5300, Access code: 280250; Dial *6 to mute/un-mute.

Attending

  • Meeting attendees: Pedro, Ofer, Rob, Armen, Pedro, Charles, Michael, Wensheng, Xin, Iris, Jane, David, John, Torre, Tadashi, Dantong, Shigeki
  • Apologies:
  • Guests: None

dCache overview & Plans - Pedro

  • current dCache issues
  • plan for the next 3 months
  • Discussion on PostgreSQL:
    • Database now ~160 GB
    • Move to 64-bit (cache 4 GB -> 48 GB) ~May 11th
    • Deploy SSDs ~July
    • Possible solution - move to Oracle with Chimera ~summer
    • How do we test:
      • Chaotic user analysis with large IO (number of files) from disk
        • Test harness 1: Jason's test
        • Test harness 2: ~1000 pilots, each reading ~100 files, no processing, read from disk (Xin, Wensheng)
      • Production reading from tape (large number of dccp -p)
        • Test harness 1: Jason's test
        • Test harness 2: real merge jobs
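
As a rough illustration of the "read from disk, no processing" harness described above (about 1000 pilots × 100 files, i.e. on the order of 100,000 file reads), the sketch below runs simulated pilots concurrently and only counts bytes and failures. It is not the harness used at BNL: the function names, the thread pool, the worker count, and the use of plain file paths (standing in for files reachable through a dCache mount) are all assumptions made for the example.

```python
import concurrent.futures
import time


def read_and_discard(path, chunk_size=1 << 20):
    """Read a file to the end without processing it; return bytes read."""
    total = 0
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
    return total


def run_pilot(paths):
    """One simulated pilot (hypothetical): read each file, count bytes and failures."""
    bytes_read, failures = 0, 0
    for path in paths:
        try:
            bytes_read += read_and_discard(path)
        except OSError:
            # A missing or unreadable file is counted, not fatal.
            failures += 1
    return bytes_read, failures


def run_harness(pilot_file_lists, max_workers=32):
    """Run all simulated pilots concurrently and summarise the outcome."""
    start = time.monotonic()
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(run_pilot, pilot_file_lists))
    elapsed = time.monotonic() - start
    return {
        "pilots": len(results),
        "bytes_read": sum(r[0] for r in results),
        "failures": sum(r[1] for r in results),
        "seconds": elapsed,
    }
```

Each inner list passed to `run_harness` represents one pilot's files, so the scale in the minutes would correspond to roughly 1000 lists of about 100 paths each.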

Scalability of pnfs server - Hiro, Pedro, Shigeki, Michael

  • Maximum connections, load...
    • HPSS:
      • David - sometimes sees duplicate requests from dCache, up to 6 (but not too harmful)
      • Current queue depth is 30k - it would be good to limit clients to this number
    • PNFS server:
      • How many dccp -p commands per minute can be supported? (Pedro)
  • pnfs load plots
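
One way to picture the two HPSS points above (duplicate requests, and keeping clients within the 30k queue depth) is a small client-side gate that ignores duplicates and defers new requests once the limit is reached. This is a minimal sketch only: the class, its method names, and the idea of tracking outstanding requests in a set are assumptions for illustration, not how dCache or HPSS handle this.

```python
class StageRequestGate:
    """Hypothetical client-side gate for stage requests sent to HPSS."""

    def __init__(self, max_outstanding=30_000):
        # 30k matches the queue depth quoted in the minutes.
        self.max_outstanding = max_outstanding
        self._outstanding = set()

    def try_submit(self, request_id):
        """Return 'accepted', 'duplicate', or 'deferred' for a request."""
        if request_id in self._outstanding:
            # The same file is already queued; do not send it again.
            return "duplicate"
        if len(self._outstanding) >= self.max_outstanding:
            # Queue is full; the caller should retry later.
            return "deferred"
        self._outstanding.add(request_id)
        return "accepted"

    def complete(self, request_id):
        """Mark a request as finished (staged or failed), freeing a slot."""
        self._outstanding.discard(request_id)

    @property
    def outstanding(self):
        return len(self._outstanding)
```

A caller would check the return value of `try_submit` and hold back deferred requests until `complete` has freed slots.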

Storing pnfsid in LFC - Hiro

  • Reasoning, overview and current status
  • Maintaining data integrity by using pnfsid stored in LFC
  • Plans/procedure for keeping cache updated in LFC

Pandamover status & plans

  • Tuning it - wait for pnfs metrics from Pedro
  • Switch to DQ2? Try for 7 days, starting May 6/7th.
  • Improved monitoring
    • New table in PandaDB?
    • Monitoring based on table (Alexei's team)
  • Handling error conditions (right now if pnfsid is missing, retry using filename) - continue for now
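
The fallback noted in the last item (use the filename when the pnfsid is missing) could be sketched as follows. The record layout and the two lookup callables are hypothetical placeholders, not Pandamover or LFC interfaces, and the sketch assumes "missing" means the pnfsid field is absent or empty.

```python
def locate_file(entry, lookup_by_pnfsid, lookup_by_filename):
    """Resolve a file by pnfsid when present, otherwise by filename.

    `entry` is a dict with a 'filename' key and an optional 'pnfsid' key.
    Both lookup arguments are caller-supplied callables (hypothetical).
    Returns (result, method) so monitoring can count how often the
    fallback path is taken.
    """
    pnfsid = entry.get("pnfsid")
    if pnfsid:
        return lookup_by_pnfsid(pnfsid), "pnfsid"
    # pnfsid missing or empty: retry using the filename.
    return lookup_by_filename(entry["filename"]), "filename"
```

Returning the method alongside the result is a design choice for the example: it makes it easy to record how many requests still rely on the filename path.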

Plans for Panda pilot

  • Sites should run local movers - best way to optimize local site performance

HPSS/dcache monitoring - Pedro, David, Shigeki

  • Available tools
  • First responders
    • storage management group responsibilities
      • [staging] ensure that there are enough stage requests on HPSS (30k) before queueing on dCache
      • [staging] warn and follow up on failures of copy operations from HPSS disk cache to read pools
      • [migration] warn and follow up on failures of copy operations from write pools into HPSS disk cache
    • hpss team responsibilities
  • Second responders - shift team procedures
  • Additional alarms, emails...

Test and development plan for next 6 months

Follow up meeting ~May 28th


-- PedroSalgado - 29 Apr 2009
  • added slides
-- PedroSalgado - 28 Apr 2009
  • added current storage management group responsibilities regarding staging
  • added links for dCache monitoring
-- KaushikDe - 27 Apr 2009

About This Site

Please note that this site is a content mirror of the BNL US ATLAS TWiki.


Attachments


pdf 20090429_dCache_Issues.pdf (164.5K) | PedroSalgado, 29 Apr 2009 - 09:40 | Overview of the current dCache issues.
pdf 20090429_Storage_Group_Plans.pdf (183.6K) | Main.psalgado, 29 Apr 2009 - 08:48 | Storage management group plans for the next 3 months.
jpg 20090429_Staging_priority.jpg (43.5K) | PedroSalgado, 29 Apr 2009 - 09:35 | Stage priority web service
pdf BNL_dCache_performance_meeting_2009_04_28.pdf (63.8K) | HironoriIto, 29 Apr 2009 - 11:17 |
pdf lfc_with_dcache.pdf (1887.6K) | HironoriIto, 29 Apr 2009 - 11:18 |
 