MinutesFedXrootdMar6 (r1 - 06 Mar 2012 - 20:45:36 - WeiYang)



  • Attending:
  • Apologies:

Face-to-face meeting:

April 11-12, 2012, Gleacher Center, University of Chicago (downtown). Recommended hotel: http://cwp.marriott.com/chifd/uchicago/

FAX Status Dashboard


Meeting business

  • Twiki documentation locations
    • Some people have difficulty accessing certain CERN twiki pages, for unknown reasons. Suggestion: host the documentation on the BNL twiki, with a link from the CERN twiki to BNL (http, not https).
    • Not done yet.
    • RG will follow up.

Xrootd release 3.1.0 deployment

Summary of previous meetings:
  • Xrootd releases come out with some functional validation by stakeholders and large sites, but lack a formal release validation process.
  • CMS abandoned the dCap plug-in for the Xrootd OFS. They use the dCache xrootd door directly, or Xrootd overlaid on dCache.
  • Known issues:
    • RPM updates overwrite /etc/init.d/{xrootd,cmsd}, which contain the LFC environment setup. That setup should go into /etc/system/xrootd, which survives RPM updates. Patrick will test it.
  • SWT2: the N2N crashing issue is understood (a conflict in signal usage between regular xrootd and Globus). The solution is to use either a proxy or regular xrootd with "async off".
this meeting:
  • Xrootd 3.1.1 is ready for deployment at all sites.

ANALY queue

last meeting:
  • Subscribed to SLAC and UC; SWT2_CPB is next. The subscription to SLAC is jammed by 2000+ datasets from other users.

  • Working on TTreeCache settings for jobs sent to SWT2, SLAC and UC
  • Working with a few datasets (1.1M events, 120 files)
  • Results:
    job at UC, data at xrd.mwt2.org (local):
    real	10m35.387s
    user	4m39.398s
    sys	0m8.639s
    job at UC, data at atl-prod09.slac.stanford.edu RTT=52.1 ms:
    real	130m52.543s
    user	4m23.163s
    sys	0m8.343s
  • Trying new settings suggested by Jack Cranshaw.
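
The two timings above can be compared directly. The following is a minimal shell sketch, not something run at the meeting: it converts the `time` output into seconds and derives the wall-clock slowdown and the CPU efficiency (user + sys divided by real). The helper names `to_seconds`, `ratio` and `cpu_efficiency` are hypothetical.

```shell
#!/bin/sh
# Convert a `time`-style duration (e.g. 10m35.387s) into seconds.
to_seconds() {
    echo "$1" | awk -F'[ms]' '{ printf "%.3f\n", $1 * 60 + $2 }'
}

# Print a / b with one decimal place.
ratio() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", a / b }'
}

# CPU efficiency in percent: (user + sys) / real * 100.
# Arguments: real, user, sys (all in `time` format).
cpu_efficiency() {
    awk -v r="$(to_seconds "$1")" -v u="$(to_seconds "$2")" -v s="$(to_seconds "$3")" \
        'BEGIN { printf "%.1f\n", (u + s) / r * 100 }'
}

# Timings copied from the results above.
local_real=$(to_seconds 10m35.387s)
remote_real=$(to_seconds 130m52.543s)

echo "wall-clock slowdown (remote/local): $(ratio "$remote_real" "$local_real")"
echo "CPU efficiency, local data:  $(cpu_efficiency 10m35.387s 4m39.398s 0m8.639s)%"
echo "CPU efficiency, remote data: $(cpu_efficiency 130m52.543s 4m23.163s 0m8.343s)%"
```

With these numbers the remote job takes roughly 12 times longer in wall-clock time while using about the same CPU time, which suggests the job spends most of its time waiting on I/O over the 52.1 ms RTT link.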
this meeting:
  • Part of the slowness with SLAC is understood: SLAC's storage had file-system-level prefetching disabled (how much did this contribute to the degraded performance?)


  • Andy: code is ready in 3.1.0. Wei (and Doug?) will test it?
this meeting
  • The X509 module that checks VO attributes works in 3.1.1. Doug: this is probably good enough (and allows the cloud setup to avoid a grid infrastructure). Possible enhancement: VO info is not yet validated using the VO public key.

cmsd+dCache/xrootd door

last meeting:

  • Sarah: the dCache xrootd door's performance is similar to the dCap door's.
  • An "authorization" plugin for the dCache/xrootd door, which uses the cached GFN->LFN information to correctly respond to GFN requests (Hiro/Shawn/?). Hiro will work on a Java API for LFC first.
this meeting:

Sharing Configurations

last meeting:

this meeting:

Detailed monitoring from UCSD/CMS

  • Discussions with Matevz Tadel (USCMS, UCSD) at Lyon
  • Considering deploying an instance at UC; if so would ask sites to publish information to it.
  • Matevz set up a system at SLAC. It is currently customized to provide real-time info: open files, src domain, dst domain, bytes read. More customization can be done. Need to decide what to do with file-close info (a rich source of info that is currently dumped to a flat file). CMS feeds file-close info into Gratia.
this meeting
  • See http://atl-prod05.slac.stanford.edu:4242 for real time info
  • Can all sites add the following line to the configuration file (/etc/xrootd/xrootd-clustered.cfg) of their border data servers (or proxy data servers)?
xrootd.monitor all auth flush io 30s mbuff 1472 window 5s dest files io info user atl-prod05.slac.stanford.edu:9930
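
Once the line is added, a quick check can confirm that a server's configuration actually points at the collector. The following is a minimal sketch, assuming a POSIX shell and grep; the function name `has_monitor_dest` is hypothetical, and the check only looks for an uncommented `xrootd.monitor` directive that mentions the collector's host:port.

```shell
#!/bin/sh
# Succeed if the config file ($1) has an xrootd.monitor directive that
# mentions the given collector ($2, host:port); fail otherwise.
has_monitor_dest() {
    [ -r "$1" ] || return 2
    grep -E '^[[:space:]]*xrootd\.monitor[[:space:]]' "$1" | grep -qF -- "$2"
}

# Config path and collector address as given in the minutes.
cfg=/etc/xrootd/xrootd-clustered.cfg
collector=atl-prod05.slac.stanford.edu:9930

if has_monitor_dest "$cfg" "$collector"; then
    echo "monitoring destination configured in $cfg"
else
    echo "monitoring destination not found in $cfg"
fi
```

The function returns 2 when the file is unreadable, so a missing configuration file can be told apart from a missing directive if needed.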

Ganglia monitoring information

last meeting:

  • Note from Artem: Hello Robert, We've managed to make some progress since our previous talk. We built RPMs; here is the link to the repo: http://t3mon-build.cern.ch/t3mon/. It contains rebuilt versions of ganglia and gweb. The Ganglia people have released ganglia 3.2 and a new ganglia web (gweb); all our stuff was rechecked and works with this new software. It's better to install ganglia from our repo; instructions are here: https://svnweb.cern.ch/trac/t3mon/wiki/T3MONHome. About xrootd: we have created a daemonized version of the xrootd-summary-to-ganglia script. It's available at the moment at https://svnweb.cern.ch/trac/t3mon/wiki/xRootdAndGanglia; it sends xrootd summary metrics (http://xrootd.slac.stanford.edu/doc/prod/xrd_monitoring.htm#_Toc235610398) to the ganglia web interface. We also have an application which works with the xrootd summary stream, but at the moment we're not sure how best to present the fetched data. We collect there user activity and accessed files, all within the site. Last week we installed one more xrd development cluster, and we're going to test whether it is possible to get and then split information about file transfers between sites and within one site. WBR Artem
  • Deployed at BNL, works.
  • Has anyone tried this out in the past week? It would be good to try it before software week to provide feedback.

this meeting:

Performance Studying

  • Network-latency-related performance tuning. There is a US ATLAS working group looking at ATLAS code for possible improvements. Doug is in the group.
  • Analysis IO performance Developer Summary Meeting Dec/15: https://indico.cern.ch/conferenceDisplay.py?confId=166930
  • We should send a request to the ROOT IO group asking for a self-contained example to test on FAX, and should find out what metrics the FAX group wants to see from the ROOT IO group.
this meeting:

dq2-ls-global and dq2-list-files-global

last meeting:
  • Want dq2 client tools that can list the files of a dataset as GFNs (or via the local redirector) and check whether they exist in FAX or at the local site.
  • Hiro's poor man's version can be found at http://www.usatlas.bnl.gov/~hiroito/xrootd/dq2/; it works with containers.
  • RWG - I am using this for expanding tests across datasets - works great.
  • Hiro will find out who is in charge of the dq2-client
  • Will be available in the next dq2-client release.
  • Available in the latest dq2-client release
this meeting:

D3PD example

last meeting:
  • Get Shuwei's top D3PD example into HC (Doug?)
  • Doug will follow up in two weeks to see about getting this into HC and getting the workbook updated. Need to drive this with real examples, with updated D3PDs, so the examples need to be updated for Rel 17.
  • Doug: the goal is to get this into an HC test, with sites being able to replace input datasets. It will be used by sites to compare the performance of reading from local and remote storage. Will follow up.
  • Non-HC example can be seen here: https://twiki.cern.ch/twiki/bin/view/AtlasProtected/SMWZd3pdExample
The data sets are:

[dbenjamin@atlas28 ~]$ dq2-ls -r user.bdouglas.physics_Egamma.SMWZd3pdExample.NTUP_SMWZ.f406_m991_p716


[dbenjamin@atlas28 ~]$ dq2-ls -r user.bdouglas.physics_Muons.SMWZd3pdExample.NTUP_SMWZ.f406_m991_p716


  • Questions for next meeting: How will a site request to run this type of HC test? How can a site change the inputs? How can performance metrics such as total time be obtained?
  • HC D3PD examples will be used as a standard performance benchmark
this meeting:


Summary of last week(s)
  • See further https://twiki.cern.ch/twiki/bin/viewauth/Atlas/AtlasXrootdSystems
  • Decided to continue improving the current N2N and leave GUID as a future option. Chicago can keep the source of N2N in CVS for now - Send update to Rob. Wei can compile
  • Doug's use case: look up files that exist at BNL but that N2N can't find. Hiro: need to change the code slightly; will do. Probably only happens at BNL. It has to do with the way panda outputs to BNL.
  • Complaints about a possible memory leak in N2N. A standalone package was provided to Andy for debugging.
  • Hiro: update N2N for BNL special cases. Doug will test whether this improves the hit rate to near 100%.
  • The N2N crashing issue has a solution. See MinutesFedXrootdJan11#ANALY_queue
  • The Fermi-Gamma experiment at SLAC also sees the proxy memory footprint grow. The proxy releases memory when there is a period of no activity and crashes otherwise. Wei will get more info.
  • Debugged N2N, found no memory leak.
  • Changed SLAC's configuration from an xrootd native proxy (cluster) to a cluster of regular xrootd with N2N on top of xrootdfs. This allows better observation of where memory grows.
this meeting:
  • Memory still grows in regular xrootd + N2N. Was this due to caching in N2N?


last meeting:
  • Wei: with 3.1, checksum is working for Xrootd proxy even when N2N is in use. Tested at SLAC at both T2 and T3. Should be straightforward for Posix sites.
  • Not sure about dCache sites. They probably need a plugin for dCache: a callout to obtain the checksum from a dCache system. Andy and Hiro will go through this at CERN.
  • Wei: direct reading and dq2-get (-whatever) don't need checksums from remote sites.
  • On-hold
  • Rename this item to discuss general issues with checksumming instead of integrated checksumming.
  • Checksumming for native xrootd is basically solved
  • For posix: can adapt
  • For dCache: is there a plugin for checksum? It's there; need to grab it.
  • Querying the remote site for checksumming
  • Wrapper script is needed
this meeting:

FRM script standardization

last meetings:
  • Standardize FRM scripts, including authorization, GUID passing, checksum validation and retries.
  • A few flavors possible.
  • Setup a twiki page just for this.

  • Brings up the question again about checking completion of xprep commands. Failures do leave a .failed file. Are there tools to check the frm queues? Can we provide a tool for this?
  • Andy: suggests setting up a webpage to monitor the frm queues, using the frm_admin command. Hiro will be looking into this.
  • a prototype of doing this:
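# Placeholders (all_your_data_servers, xrootd_lib_path, ...) must be replaced with site values.
for host in all_your_data_servers; do
    # Single quotes: the variables are expanded on the remote data server.
    ssh "$host" '
        export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:xrootd_lib_path
        export PATH=$PATH:xrootd_bin_path
        frm_admin -c your_xrootd_config_file -n your_xrootd_instance_name query xfrq stage lfn qwt
    '
done | sort -k2 -n -r   # sort numerically on the second column (the qwt field), largest first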
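
Since failed transfers leave a marker file behind, a simple complement to the queue query is to look for those markers on each data server. The sketch below assumes the markers end in `.failed`, as stated above, and uses a hypothetical data directory and function name (`list_failed`); both the suffix and the path should be adjusted to the local setup.

```shell
#!/bin/sh
# List failure marker files under a data directory, sorted by path.
# $1: directory to scan
# $2: marker suffix (assumed to be ".failed", per the minutes)
list_failed() {
    [ -d "$1" ] || return 0
    find "$1" -type f -name "*${2:-.failed}" 2>/dev/null | sort
}

# Hypothetical data directory; override with DATA_DIR=/your/path.
data_dir=${DATA_DIR:-/xrootd/data}

list_failed "$data_dir"
```

Run on each data server (for example through the same ssh loop), this gives a list of files whose staging needs to be retried or investigated.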
this meeting:

-- WeiYang - 07 Mar 2012
