
MinutesMay18

Introduction

Minutes of the Facilities Integration Program meeting, May 18, 2011
  • Previous meetings and background : IntegrationProgram
  • Coordinates: Wednesdays, 1:00pm Eastern
  • Our Skype-capable conference line: (6 to mute); announce yourself in a quiet moment after you connect
    • USA Toll-Free: (877)336-1839
    • USA Caller Paid/International Toll : (636)651-0008
    • ACCESS CODE: 3444755

Attending

  • Meeting attendees: Shawn, Rob, Michael, Nate, Jason, Dave, Bob, Torre, AK, Saul, Sarah, John, Charles, John DeStefano, Wensheng, Armen, Kaushik, Mark, Wei, Horst, Tom, Karthik, Fred, Booker, Doug, Xin, Hiro
  • Apologies: Patrick

Integration program update (Rob, Michael)

  • Special meetings
    • Tuesday (12 noon CDT, weekly - convened by Kaushik) : Data management
    • Tuesday (2pm CDT, bi-weekly - convened by Shawn): Throughput meetings
  • Upcoming related meetings:
  • For reference:
  • Program notes:
    • last week(s)
      • Integration program from this quarter, FY11Q3
      • Load on the facility is spiky/light; more from Kaushik on this further down
      • The LHC machine is performing well despite a UPS problem; we may not see beams for several days, so site maintenance would be ideal in the next few days
      • ADC is preparing a technical meeting, likely next Monday after the ADC weekly meeting
    • this week
      • Recap of May 12 LHCONE summit meeting in Washington DC
        • http://www.internet2.edu/science/LHCONE.html
        • Discuss some details during the Networking section below
        • Michael - this activity is motivated by the fact that the LHC computing models are being re-thought, with implications for network traffic. Is the existing infrastructure adequate? The effort is headed by David Foster, head of LHCOPN (Tier 0-Tier 1). There are architecture designs available, and there is a lot of engagement with the user community.
      • NEW US ATLAS Facilities meeting on virtual machine provisioning and configuration management, FacilitiesMeetingConfigVM
        • First day - Includes hands-on type demonstrations, presentations, discussions
        • Digest overnight, continue discussion, look for commonalities, look for what others are doing, move the facility forward with less effort
        • Please contribute to this - mid-June, Doodle poll available
      • An OSG site administrators' meeting is being planned for August
      • Opportunistic usage - there has not been much load from HCC as of yet. Have invited a second VO - Engage (a collection of VOs).
      • LHC status - there was machine development prior to the technical stop. Ramping up now, expect collisions tonight, and more interest in physics analysis. Plan interventions carefully.

OSG Opportunistic Access (Rob)

last week(s):
this week:
  • SupportingHCC NEW - VO portfolio for HCC
  • May results
    facility_success_cumulative_smry.png
  • BU - just haven't got to it yet; not sure about
  • UTA_SWT2 - plan to enable after cluster is
  • WT2 - waiting for Dan Fraser. Need to follow-up.

Operations overview: Production and Analysis (Kaushik)

  • Production reference:
  • Analysis reference:
  • last meeting(s):
    • Sites should keep an eye on PRODDISK
    • mc10c finishing
    • mc11 coming next week - simulation of all samples - so sites should avoid downtimes
    • user analysis is starting to pick back up
    • PD2P discussion - T2s are not getting as much data as had been hoped.
    • DDM US cloud squad list membership: Wensheng, Armen, Hiro. Armen will be in charge of assigning problems. Kaushik will add some people who attend the Tuesday DDM meeting.
    • Wei asked about the number of ticket systems - there are GGUS, OSG, and RT. These will be tracked centrally at BNL.
  • this week:
    • Now have an analysis backlog - full and I/O heavy. PRODDISK areas are filling - sites should watch.
    • mc10c is still going, along with reprocessing; there are some new requests, and pileup production is running as well.
    • Awaiting mc11 G4 task.
    • New site services are coming online - interest in DATADISK. (T2s presently can pull data from BNL only, and are therefore not getting some analysis jobs.)
    • Once this restriction is removed, T2s will be able to get data from outside the US
    • Starting tomorrow, we might see more analysis in the US

Data Management and Storage Validation (Armen)

Shifters report (Mark)

  • Reference
  • last meeting: Operations summary:
    Yuri's summary from the weekly ADCoS meeting:
    http://www-hep.uta.edu/~sosebee/ADCoS/ADCoS-status-summary-5_9_2011.html
    
    1)  5/4: File transfer failures at SMU with errors like "[GENERAL_FAILURE] Error:/bin/mkdir: cannot create directory ..... Permission denied'.  Issue resolved by 
    Justin (site admin) - ggus 70279 closed, eLog 25052.
    2)  5/4: SE problem at BNL (file transfer failures with SRM timeouts) - issues resolved by rebooting a server.  ggus 70283 closed, eLog 25056/60.
    3)  5/4 - 5/5: AGLT2 - Initially file transfers failures with "PSQLException: ERROR: could not access status of transaction 0; Detail: Could not write to file 
    "pg_subtrans/1F9B" at offset 188416: No space left on device."  From Shawn: The postgresql partition hosting the dCache billingdb has filled. We are cleaning it.  
    Later in the day jobs were failing due to a local user filling the /tmp partition on some WNs.  User contacted - issue resolved.  ggus 70251 closed, eLog 25071.
    4)  5/6 p.m. - 5/9 a.m. - SLAC power outage - from Wei: This is a scheduled power outage to bring additional power to SLAC computer center. During the outage, 
    all ATLAS resource at SLAC, including those belonging to SLAC ATLAS department will be unavailable.  Work completed as of ~1:00 p.m. PST.  eLog 25125.  
    https://savannah.cern.ch/support/index.php?120808.
    5)  5/8 - 5/9: AGLT2 - file transfer failures, due to networking issue at the MSU site.  Queues set off-line for a period of time
    until the problem was resolved.  Once networking was restored test jobs submitted, completed successfully - whitelisted in DDM, queues back on-line.  
    ggus 70361 / RT 19968 closed, eLog 25210.
    6)  5/10: BNL - network maintenance (8:00 a.m. EDT => 12:00 noon) completed successfully.  eLog 25196.
    7)  5/10 a.m. - from Saul at NET2: We had about 500 jobs fail at BU_ATLAS_Tier2o last night due to a bad node.   It's now off-line.  Later in the evening, from John: 
    We're swapping around some internal disk behind atlasproddisk, and we're draining the queues so that we can make the final switchover tomorrow morning after 
    the sites have quiesced.  Panda queues set to 'brokeroff'.
    8)  5/11: WISC DDM failures.  Blacklisted in DDM: https://savannah.cern.ch/support/index.php?120901.  ggus 70467.  Issue is a cooling system problem in a data center.  
    (Also, there seem to be some older, still open Savannah tickets related to DDM errors at the site?)
    
    Follow-ups from earlier reports:
    (i)  4/8: NERSC - file transfer errors.  See ggus 69526 (in-progress), eLog 24176.
    Update 4/19: some progress has been made on understanding the issue(s) - will close this ticket once it appears everything is working correctly.
    (ii)  4/8: OU_OSCER_ATLAS - still see intermittent job failures with segfault errors.  Site was set off-line 4/11 due to a spike in the failure rate.  Discussed in: 
    https://savannah.cern.ch/support/?120307 (site exclusion), ggus 69558 / RT 19757, eLog 24133/92, https://savannah.cern.ch/bugs/index.php?79656.
    (iii)  5/2: UTD_HOTDISK file transfer errors ("failed to contact on remote SRM [httpg://fester.utdallas.edu:8446/srm/v2/server]").  From Joe: Hardware failure on our 
    system disk. Currently running with a spare having out of date certificates. Our sys-admin is working on the problem.  ggus 70196 in-progress, eLog 24971.
    Update 5/10: Site admin reported that UTD was ready for new test jobs, but they failed with "Required CMTCONFIG (i686-slc5-gcc43-opt) incompatible with that of 
    local system (local cmtconfig not set)" (evgen jobs) and missing input dataset (G4sim).  Under investigation.   https://savannah.cern.ch/support/?120588, eLog 25250.
    (iv)  5/4 early a.m.: SWT2_CPB - problem with the NIC (cooling fan) in a dataserver took the host off-line.  Problem should now be fixed.  ggus 70266 / RT 19949 will 
    be closed once we verify transfers are succeeding.  eLog 25046.
    Update 5/4 p.m.: successful transfers for several hours after the hardware fix - ggus / RT tickets closed, eLog 25046.
    (v)  5/4: OU_OCHEP_SWT2_PRODDISK - file transfer failures due to checksum errors ("[INTERNAL_ERROR] Destination file/user checksum mismatch]").  Horst & Hiro 
    are investigating.  https://savannah.cern.ch/bugs/index.php?81834, eLog 25039.
    
    • No major issues this week
    • Opportunistic OSCER site at OU - it has been offline for quite a while. Will turn back on.
    • ddm-ops mailing list
  • this meeting: Operations summary:
    Yuri's summary from the weekly ADCoS meeting:
    http://www-hep.uta.edu/~sosebee/ADCoS/ADCoS-status-summary-5_16_2011.html
    
    1)  5/13: From Saul & John at NET2: 1000-2000 failing jobs with get errors at HU_ATLAS_Tier2 due to a network problem.
    We're working on it and will reduce the number of running jobs in the mean time.  ggus 70598 was opened during this period (lsm timeout errors).  Problem 
    now resolved - ggus ticket closed on 5/16.  eLog 25367.
    2)  5/14: The queue BNL_ATLAS_2 was set on-line temporarily to enable the processing of some high priority jobs requiring > 2 GB of memory.  (This queue is 
    normally off-line.)  eLog 25387.
    3)  5/14: MWT2_UC - problem with a storage server (failed disk).  Queues set off-line while the disk was being replaced.  Problem fixed, queues back on-line 
    as of ~2:30 p.m. CST.
    4)  5/16: SLAC - failed functional test transfers ("gridftp_copy_wait: Connection timed out").  From Wei: The URL copy timeout parameter in the FTS channel 
    STAR-SERV04SRM (to SLAC) is set to 800 sec, much shorter than almost all other channels. This is the cause of all failures I checked. I will set this parameter to 
    something longer.  ggus 70641 closed, eLog 25455.
    5)  5/17: Previously announced ADCR database intervention canceled.  See eLog 25463 (and the message thread therein).
    6)  5/17: SWT2_CPB maintenance outage for cluster software updates, reposition a couple of racks, etc.  Expect to complete by late afternoon/ early evening 
    5/18.  eLog 25474, https://savannah.cern.ch/support/index.php?121013.
    7)  5/17: AGLT2_USERDISK to MAIGRID_LOCALGROUPDISK file transfer failures ("globus_ftp_client: Connection timed out").  Appears to be a network 
    routing problem between the sites.  ggus 70671 in-progress, eLog 25480.
    8)  5/18 early a.m.: ADCR database hardware problem (disk failure).  For now db admins have switched over to a back-up instance of the database.  
    See: https://atlas-logbook.cern.ch/elog/ATLAS+Computer+Operations+Logbook/25499.
    
    Follow-ups from earlier reports:
    
    (i)  4/8: NERSC - file transfer errors.  See ggus 69526 (in-progress), eLog 24176.
    Update 4/19: some progress has been made on understanding the issue(s) - will close this ticket once it appears everything is working correctly.
    (ii)  4/8: OU_OSCER_ATLAS - still see intermittent job failures with segfault errors.  Site was set off-line 4/11 due to a spike in the failure rate.  Discussed in: 
    https://savannah.cern.ch/support/?120307 (site exclusion), ggus 69558 / RT 19757, eLog 24133/92, https://savannah.cern.ch/bugs/index.php?79656.
    Update 5/16: No conclusive understanding of the seg fault job failures.  Decided to set the site back on-line (5/16) to see if the problem persists.  Awaiting 
    new results (so far no jobs have run at the site).
    (iii)  5/2: UTD_HOTDISK file transfer errors ("failed to contact on remote SRM [httpg://fester.utdallas.edu:8446/srm/v2/server]").  From Joe: Hardware failure on 
    our system disk. Currently running with a spare having out of date certificates. Our sys-admin is working on the problem.  ggus 70196 in-progress, eLog 24971.
    Update 5/10: Site admin reported that UTD was ready for new test jobs, but they failed with "Required CMTCONFIG (i686-slc5-gcc43-opt) incompatible with 
    that of local system (local cmtconfig not set)" (evgen jobs) and missing input dataset (G4sim).  Under investigation.   https://savannah.cern.ch/support/?120588, 
    eLog 25250.
    Update 5/14: Issue with some problematic atlas s/w releases has been resolved.  ggus 70196 closed, queues set back on-line.  eLog 25377.
    (iv)  5/4: OU_OCHEP_SWT2_PRODDISK - file transfer failures due to checksum errors ("[INTERNAL_ERROR] Destination file/user checksum mismatch]").  
    Horst & Hiro are investigating.  https://savannah.cern.ch/bugs/index.php?81834, eLog 25039.
    Update 5/13: No more failures after 2011-05-05 06:08:45 - Savannah ticket closed.
    (v)  5/10 a.m. - from Saul at NET2: We had about 500 jobs fail at BU_ATLAS_Tier2o last night due to a bad node.   It's now off-line.  Later in the evening, from 
    John: We're swapping around some internal disk behind atlasproddisk, and we're draining the queues so that we can make the final switchover tomorrow 
    morning after the sites have quiesced.  Panda queues set to 'brokeroff'.
    Update 5/17: queues back on-line.
    (vi)  5/11: WISC DDM failures.  Blacklisted in DDM: https://savannah.cern.ch/support/index.php?120901.  ggus 70467.  Issue is a cooling system problem in a 
    data center.  (Also, there seem to be some older, still open Savannah tickets related to DDM errors at the site?)
    Update 5/12: From Wen - The cooling problem is fixed. Now the data servers are back.  ggus 70467 closed.
    Update 5/14: Cooling problem recurred.  Issue resolved, ggus 70599 closed.
    
    • ADCR DB outage - affecting PanDA. Moved to a backup instance; recovering now.
    • Pilot update
    • OU OSCER site - working with Horst. Seg fault errors that looked like a site issue. No conclusive understanding yet (now there's a brokerage issue).
    • Many tickets closed
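    Several of the tickets above (e.g. the OU_OCHEP_SWT2_PRODDISK follow-up) involve destination checksum mismatches. As a minimal sketch of how a site admin might recompute the Adler-32 checksum of a suspect file for comparison against the catalog value (the script name and invocation are illustrative, not part of any ticket):

        # adler32_check.py - recompute a file's Adler-32 checksum (the checksum
        # type used for ATLAS file validation) for comparison with the catalog.
        import sys
        import zlib

        def adler32(path, blocksize=1 << 20):
            value = 1  # Adler-32 starting value
            with open(path, "rb") as f:
                for chunk in iter(lambda: f.read(blocksize), b""):
                    value = zlib.adler32(chunk, value)
            return "%08x" % (value & 0xffffffff)

        if __name__ == "__main__":
            print(adler32(sys.argv[1]))

    Run as "python adler32_check.py /path/to/file" and compare the output with the checksum recorded in the catalog.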

DDM Operations (Hiro)

Throughput and Networking (Shawn)

  • NetworkMonitoring
  • https://www.usatlas.bnl.gov/dq2/throughput
  • Now there is FTS logging to the DQ2 log page at: http://www.usatlas.bnl.gov/dq2log/dq2log (type in 'fts' and 'id' in the box and search).
  • last week:
    • Bi-weekly meeting yesterday. Discussed what we're going to do this quarter - see the site certification matrix.
    • Quarterly cleaning of the perfSONAR database.
    • The MWT2-IU and AGLT2 problem seems to have cleared: latency symmetric, throughput back up. Still an issue at OU - Jason is investigating in depth.
    • Site-related items under other business.
    • Work on a modular version of the perfSONAR dashboard is ongoing (Tomaz). Probably create a hierarchy of pages, then drill down to cloud level.
    • ADC development monitoring presentation - goal by next software week to get Italian cloud instrumented.
    • LHCONE meeting tomorrow in Washington hosted by I2.
  • this week:
    • No throughput group meeting this week.
    • LHCONE - Jason Z
      • notes being finalized
      • 25 in person, 30 remote. Good participation and discussion of data transfer problems.
      • Connectivity issues in US and Europe were discussed.
      • June 13 - Tier 2 meeting at LHCOPN meeting in DC
      • There will be some US ATLAS participation
    • Shawn - the concept is essentially traffic separation, traffic engineering, and cost savings
    • How best to do this? Which infrastructure is appropriate?
    • There is an LHCONE architecture document available.
    • Michael - there is an intention to keep the LHCOPN and LHCONE infrastructures separate

HTPC configuration for AthenaMP testing (Horst, Dave)

last week
  • Horst - trying to enable the OU ITB site
  • Dave - Doug sent jobs to Illinois - there seemed to be a case-sensitivity issue. Also waiting for a new release of AthenaMP.
this week
  • Horst - still working on the queue config for the ITB site, getting ready for submission (see the submit-file sketch below). Doug's jobs to OSCER keep failing with seg faults.
  • Dave - all ready at Illinois. Doug has submitted a lot of test jobs successfully. Release 16.6.5.5 had a bug producing corrupted ESD files. Doug is on vacation. Progressing.
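
As context for the queue-configuration work above, a minimal sketch of what a whole-node (HTPC-style) Condor submit file for an AthenaMP test job could look like; the wrapper script name and resource numbers are illustrative placeholders (not the actual OU or Illinois setup), and whole-node provisioning ultimately depends on how the pool's slots are configured:

    # Hypothetical whole-node submit file for an AthenaMP test job.
    # "run_athenamp.sh" and the resource requests below are placeholders.
    universe        = vanilla
    executable      = run_athenamp.sh
    request_cpus    = 8
    request_memory  = 16000
    output          = athenamp.$(Cluster).out
    error           = athenamp.$(Cluster).err
    log             = athenamp.$(Cluster).log
    queue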

CVMFS

See TestingCVMFS

last week:

  • CVMFS Tests at MWT2
  • Site status: AGLT2 - will deploy as nodes are re-installed; working at Illinois.
  • Alden - what about BDII publishing? How should the brokerage respond? Will require changes by Alden and Tadashi to work out brokerage. Torre: we can deal with this at the adc-panda level.
  • Doug had a question for John: should reverse proxies be put at the BNL site?
  • Michael: FNAL switched to CVMFS for software distribution, experienced problems, and backed out. This happened during normal production. He suggests continuing to run stress tests with large numbers of jobs starting. We'll need to be careful.
this week:
  • At MWT2 - in production; setting up monitoring of the squid. At Illinois - switched to the production server as advised by John. Both analysis and production.
  • Ordering - BNL, CERN
  • John DeStefano - looking to put together a test to put more load on CVMFS, not yet thoroughly tested.
  • cvmfs-talk will tell you which server and proxy are in use
  • There still is some traffic on the testbed instance.
  • Doug - the final system is in test mode at CERN-PH. It's fully installed and tested, in the final configuration
    • Needs a discussion with SIT and ADC migration team
    • CERN IT has box for conditions database.
    • End sites will not see the switchover.
    • Local configuration will need to change slightly - the line that says which repositories to mount: atlas.cern.ch, atlas-condb.cern.ch; a symlink within the repo points to the conditions data (see the configuration sketch at the end of this section).
    • The structure is described in Doug's talk at the last software week
    • Nearly all releases are in the "new area"
    • Transition plan - next week or two
    • Cloud squads polled
    • Site configuration change announced
    • ADC decision
    • Notification to clouds with precise config changes and dates
  • New configurations are at the stratum server at CERN - so probably should limit testing. Illinois will work.
  • Conditions data test as well
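
  As referenced above, a minimal sketch of the client-side change Doug described, assuming the standard /etc/cvmfs/default.local layout; the repository list follows the bullet above, while the squid host/port and cache quota are site-specific placeholders, and exact parameter names may vary between CVMFS client versions:

    # /etc/cvmfs/default.local (sketch) - which repositories to mount and which
    # local squid to use; the values below are placeholders, not a real site's config.
    CVMFS_REPOSITORIES=atlas.cern.ch,atlas-condb.cern.ch
    CVMFS_HTTP_PROXY="http://squid.mysite.example:3128"
    CVMFS_QUOTA_LIMIT=20000   # local cache quota in MB

  After changing the repository list, cvmfs-talk (as noted above) can be used to confirm which server and proxy each mounted repository is actually using.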

Federated Xrootd at sites: Tier 3 (Doug), Tier 2 (Charles)

last week(s):
  • See Charles' email of this morning. Need sites to update plugin. Once updated will start second round of testing.
  • Doug - working with Andy and Wei.
this week:
  • Charles is continuing to investigate performance of the xrootd client across the wide area
  • Looking at the performance tuning.
  • In xrootd it's more complicated to control the parameters; this requires a later version of ROOT.
  • Wei - the N2N conversion module is working at SLAC.
  • Doug:
    • global redirector - Hiro will bring it up within the week; new hardware in a couple weeks.
    • There is a complicated config file for the redirector - sent by Wei
    • xrootd configuration where the redirector can also act as a proxy. The proxy machine talks to the global redirector. Hiro: third-party transfer? Doug: Andy says it is not in the mainline xrootd source.
    • Andy has made some changes to fix init.d and xrootd init. There have been several bug fixes of late.
    • We're at 3.0.3.
    • New xrootdfs fixes are in 3.0.4rc1.
    • Andy also has fixes for locking proxy.
    • Asked Andy and Dirk about decision for xrootd yum repo
    • OSG will provide libraries for gridftp server; Tanya has been following all the email regarding the release caches.
    • There are two "official" repos for xrootd
  • Wei: the 3.0.4rc1 RPM is now available from Lukaz. The RPMs are on a website at CERN.
  • Hiro has provided a dq2 plugin called xrdcp - it allows copies between xrootd servers according to a global namespace (see the example after this list).
    • Testing at BNL Tier3
    • Will contribute to dq2 repo at CERN
    • Doug is asking for a sub-release; Hiro
  • Doug and Andy - working on the security bits. Gerry Ganis has put in GSI authentication, together with Brian Bockelman. Andy will write a plugin that can be tested. Servers have a server certificate with "atlas-xrootd" in the service name.
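
A minimal sketch of the kind of copy the xrdcp-based plugin performs; the redirector hostname and the global-namespace path below are illustrative placeholders, not the actual BNL endpoints:

    # Read a file through the global redirector by its global-namespace path and
    # write it to local disk; the hostname and path are placeholders.
    xrdcp root://global-redirector.example.org//atlas/somedataset/somefile.root \
          /local/scratch/somefile.root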

Tier 3 Integration Program (Doug Benjamin)

Tier 3 References:
  • The link to ATLAS T3 working groups Twikis are here
  • T3g Setup guide is here
  • Users' guide to T3g is here
  • US ATLAS Tier3 RT Tickets

last week(s):

  • Setup of CVMFS with ATLAS releases is about 70% complete. Conditions data has been put there.
  • Need a Tier 3 production site policy document - will take up with management.
  • Looking at pre-release testing of xrootdfs.

this week:

  • None

Tier 3GS site reports (Doug Benjamin, Joe, AK, Taeksu)

last week:
  • UTD: Joe: problems on gatekeeper node, required disk swaps.

this week:

  • AK - has been working with Jason Zurawski - will be providing a report.
  • Michael is developing a US ATLAS policy for Tier3GS
  • CVMFS has been installed at Bellarmine by Horst

Site news and issues (all sites)

  • T1:
    • last week:
      • GLExec autopilot factory work going on, will go into production soon
      • Xrootd Global redirector to be moved to BNL, Hiro pursuing this
      • Chimera migration for dcache at BNL
      • Hiro testing direct-access method from dcache with new panda queue
      • Federated ID management with CERN using shibboleth with trust relationships between BNL and CERN
    • this week:
      • Chimera migration tools under study
      • Networking changes to avoid multiple hops
      • Hiro's plugin for direct xrootd transfers
      • More scalable access to mass data - discussion with BlueArc on a pNFS-based solution ongoing; this has been delayed until August
      • Order for 150 Westmere-based worker nodes via UCI - unexplained delays

  • AGLT2:
    • last week(s):
      • LFC number of threads increased from 20 to 99
      • The billing DB filled up a partition on dCache; cleanup is now automated so this will no longer occur
      • Meeting with Dell next week with regard to SSDs, hope for better pricing
      • Met with MWT2 to discuss LSM and pcache
      • Revisiting testing of direct-access methods
      • Plan to deploy CVMFS with the new Rocks build, likely complete today; rolling re-builds will begin later in the week
      • Testing NexSAN, doing iozone testing; issues with rack-mounting the 60-disk unit; improvement over Dell in density and performance
    • this week:
      • Met with Dell - possible future options with SSDs. 3 TB disks on the portal in June; future systems, though the timescale is very long. New Athon systems in August (good $/compute)
      • NexSAN - tests completed; performance not as good. An older SATABeast was used. The 60-disk 4U unit was tested. Size issue - it does not fit into existing racks; density is good.
      • Rocks update - Bob - upgraded to Condor 7.6; issues with hierarchical accounting, and the negotiator process would crash under certain conditions. Settled on a build, ready to go. 5 racks at MSU rebuilt; will be put into the Condor pool.
      • Running at half capacity.
      • Tom - working on builds - had a network "disaster" caused by a single bad NIC. A low level of packet loss was related to a NIC pre-failure.

  • NET2:
    • last week(s):
      • Relatively smooth operations
      • Tier3 work: operational now
      • Focused on local IO ramp-up: joining GPFS volumes complete, rebalancing of files, good performance from GPFS
      • Harvard directly mounting GPFS with clustered NFS with local site mover
      • Getting another 10Gb link from campus folks
      • Adding GridFTP instances at BU
      • Upgrading OSG, upgrading to BestMan2, moving to CVMFS
      • Purchasing more storage, 2 racks' worth: ~300 TB per rack, ending up at 1.5 PB by July
    • this week:
      • IO progress - 1.6 GB/s between BU and HU, filling the 10G link. Setting up a second link. Seeing 750 MB/s.
      • Now ready to ramp up analysis at HU, 500 jobs (staging in presently)
      • Smooth operations in the past week, but lots on the agenda. Checksum reporting issue.

  • MWT2:
    • last week:
      • UC: Completed our move, moved to CVMFS
      • IU:
      • Illinois:
    • this week:
      • UC: dcache update to address ;
      • Sarah - MWT2 queue development
      • Illinois - CVMFS, HTPC

  • SWT2 (UTA):
    • last week:
      • Looking into CVMFS
      • A storage server NIC went away, causing some problems, but all is back fine now
      • Partial outage on Sunday due to an 8-hour power outage; the generator should come on without issues, but 2 racks of SWT2_CPB workers will be affected
      • Rolled back the BestMan2 version; the new version ran on a second node with no problems. In the meantime a newer version was released; will test it soon and then take a downtime to move to the latest BestMan2 and a newer OSG stack, and also spin up another 200 TB of disk
    • this week:
      • Outage - software upgrade on CPB, some rack re-arrangement, CE tweaks. Should be back up next week.

  • SWT2 (OU):
    • last week:
      • Glitch with a release getting corrupted; deleting and re-installing it fixed the problem
      • CVMFS testing ongoing, hope to move to CVMFS next week
      • Working on MP cluster/queue at OU
    • this week:
      • All is well. Pilots on MP queue.

  • WT2:
    • last week(s):
      • Latest BestMan upgraded, working and fixes a number of issues
      • Alex will release the latest BestMan to OSG by the end of the week, including a plugin to dynamically add/remove GridFTP servers
      • 2-3 day power outage starting Friday afternoon through Sunday or Monday to bring more power to the building
      • After the power outage, will start OS installation on the new compute nodes once power is delivered to them
    • this week:
      • All is fine.

Carryover issues (any updates?)

Python + LFC bindings, clients (Charles)

last week(s):
  • wlcg-client-lite to be deprecated
  • Still waiting on VDT for wlcg-client, wn-client
  • Question - could CVMFS be used to distribute software more broadly? This would require some serious study.
this week:

WLCG accounting (Karthik)

last week:
  • Sites reporting are within 5%.
  • Don't expect progress on the hyperthreading report
  • UTA - Gratia numbers are coming in low; tracking down a systemic reporting issue
  • NET2 - will look into this, also making a comprehensive comparison with WLCG.
  • Michael - WLCG has defined efficiency figures for T1s and T2s; the figure was set to 60% years ago. Discussion yesterday at the WLCG MB meeting - proposal to increase it to 70%.
this week:

AOB

last week:
this week:


-- RobertGardner - 17 May 2011
