MinutesFedXrootdSep23 - 23 Sep 2011



  • Bi-weekly US ATLAS Federated Xrootd meeting
  • Friday, 2-3pm Eastern
  • USA Toll-Free: (877)336-1839
  • USA Caller Paid/International Toll : (636)651-0008
  • ACCESS CODE: 3444755

  • Attending: Hiro, Andy, Bob, Shawn, Rob, Wei, Saul, Ofer, Patrick, Horst
  • Apologies: Sarah

Following up on Open Issues from Workshop


  • See also https://twiki.cern.ch/twiki/bin/viewauth/Atlas/AtlasXrootdSystems
  • We need a way to pass opaque information, such as the GUID, along with the file name. We discussed creating GFN symlinks, but some sites consider their data volume too large for that, and maintenance is also an issue.
  • A related issue is the current N2N module: how hard would it be to make it work with N2N2? We need to extend it to support GUIDs. Many of us have read the code, but we need a support model: a community model or an ownership model? For the latter, who will own it?
  • Andy: passing opaque info just isn't going to work, because the cmsd doesn't carry it. This would be a substantial architectural change, not written into the protocol. Even if it were, would you get it consistently?
  • Embed the GUID into the GFN, so it becomes part of the name? Can it be passed along invisibly? It is only needed for the FRM. Sites would store the file name without the GUID, which would require everyone to run an N2N translation (see the sketch after this list).
  • We need to detail the scenarios.
  • Wei: make symbolic links for all the files. Shawn: managing this could be burdensome.
  • A dedicated database? The LFC? The LFC does not map the global namespace to GUIDs. How different would a dedicated database be? Hiro: it's essentially the same.
  • Hiro: the LFC can be used as a global namespace, but this is not guaranteed, e.g. for user datasets (/grid/atlas/dq2/users...) and at some Tier 2s.
  • Is the current N2N the bottleneck?
  • Chicago can keep the source in CVS for now - Wei can compile
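
  • To make the N2N2/GUID discussion concrete, below is a minimal sketch (not the production module) of a name-translation plugin written against the interface in xrootd's XrdOucName2Name.hh. The "#guid=" separator, the GuidAwareN2N class name, and the /pnfs/site.example prefix are illustrative assumptions; the real module does LFC lookups as in Charles' code.

    // Sketch only: strip an (assumed) GUID token embedded in the GFN
    // before the path lookup, keeping it available for the FRM or logging.
    #include <cerrno>
    #include <cstring>
    #include <string>

    // Interface mirrored from xrootd's XrdOucName2Name.hh
    // (methods return 0 on success or an errno value).
    class XrdOucName2Name {
    public:
        virtual int lfn2pfn(const char *lfn, char *buff, int blen) = 0;
        virtual int lfn2rfn(const char *lfn, char *buff, int blen) = 0;
        virtual int pfn2lfn(const char *pfn, char *buff, int blen) = 0;
        virtual ~XrdOucName2Name() {}
    };

    class GuidAwareN2N : public XrdOucName2Name {
    public:
        virtual int lfn2pfn(const char *lfn, char *buff, int blen) {
            std::string name(lfn);
            // Assumed convention: the GUID rides along as "#guid=<...>".
            std::string::size_type sep = name.find("#guid=");
            if (sep != std::string::npos) {
                lastGuid = name.substr(sep + 6); // for FRM stage-in, logging, ...
                name.erase(sep);                 // strip before the lookup
            }
            // Stand-in for the LFC lookup done by the current N2N module.
            std::string pfn = "/pnfs/site.example" + name;
            if ((int)pfn.size() >= blen) return ENAMETOOLONG;
            std::strcpy(buff, pfn.c_str());
            return 0;
        }
        virtual int lfn2rfn(const char *lfn, char *buff, int blen) {
            return lfn2pfn(lfn, buff, blen); // remote name == physical here
        }
        virtual int pfn2lfn(const char *pfn, char *buff, int blen) {
            if ((int)std::strlen(pfn) >= blen) return ENAMETOOLONG;
            std::strcpy(buff, pfn);
            return 0;
        }
    private:
        std::string lastGuid; // not thread-safe; a sketch, not a design
    };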

Integrated checksumming

  • It is needed as long as N2N(2) is required. There was discussion of caching N2N(2) results for various future uses.
  • A related issue is how to do checksums on an Xrootd native proxy (or a proxy consisting of an xrootd on top of xrootdfs). Andy thinks the former can be addressed by a plugin; he will get more detail.
  • Andy:
    • done from the xrootd side; works across proxy server; included in 3.1
    • query checksum will return correctly
    • Will need a wrapper around xrdcp to do the checking
    • checksums are typically stored with the file in extended attributes (this needs a configuration directive); ext4 has them always on; xfs supports them; ext3 needs to be checked; gpfs probably does. See the sketch after this list.
    • Choice: you can do it integrated, done by the server, or configure a call-out (Saul prefers this).
  • How does this work for the 'integrated' sites (e.g. the Tier 1 and Tier 2s running dCache)? The call-out does not go through N2N; we want to call out to dCache.
  • Three ways:
    • Script call-out before N2N
    • Integrated, done by the server
    • Plugin: look up the checksum from the backend filesystem
  • Hiro will try this last method. He also notes the checksum can be obtained from dq2-get with xprep; the FRM stage-in script will need a bit of work.
  • This will require xrootd 3.1 for files not in DDM.
  • Local root path: a prefix to the global namespace. With xrootd release 3.0.5 the client would need to know about this.
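
  • As an illustration of the extended-attribute option above: a minimal sketch, assuming the attribute name user.atlas.adler32 (the directive and attribute actually used by xrootd 3.1 should be taken from its documentation), that computes an adler32 with zlib and stores it alongside the file.

    // Sketch: compute a file's adler32 and record it in an extended
    // attribute, as integrated server-side checksumming would.
    #include <sys/xattr.h>   // Linux setxattr()
    #include <zlib.h>        // adler32()
    #include <cstdio>

    static int checksum_and_store(const char *path)
    {
        std::FILE *f = std::fopen(path, "rb");
        if (!f) return -1;

        uLong sum = adler32(0L, Z_NULL, 0);     // zlib's canonical seed
        unsigned char buf[1 << 16];
        size_t n;
        while ((n = std::fread(buf, 1, sizeof buf, f)) > 0)
            sum = adler32(sum, buf, (uInt)n);
        std::fclose(f);

        char hex[16];
        std::snprintf(hex, sizeof hex, "%08lx", (unsigned long)sum);
        // "user.atlas.adler32" is an assumed name for illustration.
        // Works on ext4/xfs; ext3 and GPFS support needs checking.
        return setxattr(path, "user.atlas.adler32", hex, 8, 0);
    }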

Proxy server

  • Reading from the native proxy triggers a stage-in (reading via the global redirector does not).
  • Since reading directly from a proxy server triggers stage-ins, the FRM needs the pid information so it knows the origin of a request.
  • Andy: the proxy server could be configured to return 'file not found'; this is a modification to the plugin and will go into 3.1.

3.1 release schedule

  • mid-October
  • The ROOT team has decided to decouple xrootd from the ROOT package; xrootd will move to a separate rpm, maintained by WLCG.

sss module development

  • Issues in "sss": private/public address NICs; Double free() in sss (Wei got a core dump). The multiple home/NIC issue is not limited to using the "sss" module. But as a broader issue, it may only be addressed after 3.1.0 release.
  • Allows xrootdfs or proxy to tell server the actual server; knows uid on xrootdfs host. Checks authorization.
  • Debugged. Wei will change the code. Will come in 3.1.

xprep warnings

  • Does xprep (and dq2-get with xprep) give a warning if a site's xrootd cluster is not configured for xprep? At a minimum we need to give sites enough warning that they don't miss this issue during configuration.
  • The prepare protocol is best-effort.
  • Where should the warning go? xprep is issued against the local redirector; if the local cluster isn't configured, then what? The user isn't known at that point.
  • Andy will look into this; whether staging is supported is exported in the config file.
  • Wei will track.

Logging, alerts

  • Notifying the user of the completion or failure of dq2-get with xprep: we seem to favor letting users check whether files are ready via a dq2-ls against the local file system/storage. As Doug pointed out, the global file name dq2-ls produces isn't quite identical to the GFN we expect (and dq2-get produces?). In that case, who is in the best position to push for a fix (with ADC)?
  • A nice-to-have; Andy will investigate how this might be done, likely post-3.1.
  • Hiro will implement the check in dq2-ls.

FRM script standardization

  • Standardize FRM scripts, including authorization, GUID passing, checksum validation and retries.
  • A few flavors are possible; see the sketch after this list for the control flow common to them.
  • Set up a twiki page just for this.
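
  • The scripts themselves will be shell, but the control flow to standardize (bounded retries plus checksum validation before accepting a stage-in) is small; here it is as a hedged C++ sketch, with verify_adler32() as a placeholder for the comparison logic sketched in the checksumming section.

    #include <cstdlib>
    #include <string>

    // Placeholder: compare the local file's adler32 against the expected
    // value (see the checksum sketch under "Integrated checksumming").
    static bool verify_adler32(const std::string &path,
                               const std::string &expected)
    {
        (void)path; (void)expected;
        return true; // wire in the real comparison here
    }

    // Retry the copy a bounded number of times; accept it only if the
    // checksum matches. xrdcp -f overwrites a partial earlier attempt.
    static bool stage_in(const std::string &src, const std::string &dst,
                         const std::string &expected_adler32,
                         int max_tries = 3)
    {
        for (int attempt = 1; attempt <= max_tries; ++attempt) {
            std::string cmd = "xrdcp -f " + src + " " + dst;
            if (std::system(cmd.c_str()) == 0 &&
                verify_adler32(dst, expected_adler32))
                return true;  // copied and validated
            // otherwise retry; a real script would also log and back off
        }
        return false;         // the FRM treats the stage-in as failed
    }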

cmsd + dCache/xrootd door

  • An updated CMSD that will work with the native dCache/xrootd door (Andy?)
  • A caching mechanism to allow the lookup done by the CMSD N2N[2?] plugin to be usable by the xrootd door (either the dCache or Xrootd version) (Andy/Hiro/Wei/?)
  • Redirect to the xrootd-dCache door, which will do the lookup against memcached. The cmsd will need the N2N plugin, and N2N must write its results to something the dCache sites can read (see the sketch below).
  • Hiro will look into this; not critical path.
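
  • A minimal sketch of the caching idea using the libmemcached C API: the cmsd-side N2N plugin publishes each GFN->PFN result, and the dCache/xrootd door reads it back instead of repeating the LFC lookup. The server address, the "n2n:" key prefix, the sample paths, and the one-hour TTL are illustrative assumptions.

    #include <libmemcached/memcached.h>
    #include <cstdlib>
    #include <cstring>
    #include <ctime>
    #include <string>

    int main()
    {
        memcached_st *mc = memcached_create(NULL);
        memcached_server_add(mc, "localhost", 11211);

        std::string key = "n2n:/atlas/dq2/some/gfn";            // sample GFN
        const char *pfn = "/pnfs/site.example/atlas/some/pfn";  // sample PFN

        // cmsd / N2N side: cache the lookup result for one hour.
        memcached_set(mc, key.c_str(), key.size(),
                      pfn, std::strlen(pfn),
                      (time_t)3600, (uint32_t)0);

        // door side: read it back; fall back to the LFC on a miss.
        size_t len = 0;
        uint32_t flags = 0;
        memcached_return_t rc;
        char *cached = memcached_get(mc, key.c_str(), key.size(),
                                     &len, &flags, &rc);
        if (cached && rc == MEMCACHED_SUCCESS) {
            // use 'cached' as the PFN for the open()
            std::free(cached);   // memcached_get returns malloc'd memory
        }
        memcached_free(mc);
        return 0;
    }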

Authorization plugin

  • A "authorization" plugin for the dCache/xrootd door which uses the cached GFN->LFN information to correctly respond to GFN requests (Hiro/Shawn/?)

ANALY queue

  • Set up a PanDA ANALY queue to run tests against the federation, thereby using the GFN from within the pilot/lsm, testing both direct access and stage-in with HC testing.
  • Rob will follow-up.
  • Look up the DBRelease file from the federation.

D3PD example

  • Get Shuwei's top D3PD example into HC (Doug?)

Sharing Configurations

New benchmarking results

  • XRD-LFC N2N overhead (Sarah):
    I'm trying to get a measurement of the performance impact of the N2N
    plugin (C++ code written by Charles which maps logical file names to
    physical ones).
    For the first time, I tried testing transfers which used the LFN vs
    transfers which used the PFN.  When the N2N plugin is passed an LFN, it
    does a series of lookups against LFC to find the associated PFN and
    returns that. When N2N is passed a PFN, it recognizes the prefix (pnfs)
    and returns the path as is.
    So, this test measures the performance impact of the LFC lookups, but
    not that of calling N2N itself.
    I identified 100 files in LFC which have size 0.  These were chosen so
    that variations in transfer time would not skew the results.  I then
    took the first 50 and timed transfers of them using LFN, then the next
    50 using PFN.
    Here are the results, in seconds:
    LFN: N=50 min=0.31 avg=0.35 max=0.51 std.dev=0.04
    PFN: N=50 min=0.01 avg=0.05 max=0.08 std.dev=0.01
    So, it looks like we get a fairly consistent penalty of ~0.3 seconds per
    file for the LFC lookup.
  • This should only become significant for jobs touching very many files (~1000 files at ~0.3 s each is roughly 300 s of added lookup time). A sketch of the measurement harness follows.
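
  • For reference, a harness that reproduces this kind of measurement is only a few lines; this sketch times xrdcp of zero-size files opened by LFN vs by PFN (the redirector host and file lists are placeholders; the real test used 50 files of each kind).

    #include <chrono>
    #include <cstdio>
    #include <cstdlib>
    #include <string>
    #include <vector>

    // Zero-size files make transfer time negligible, so the timing
    // isolates the N2N/LFC lookup cost.
    static void time_copies(const char *label,
                            const std::vector<std::string> &urls)
    {
        double total = 0.0;
        for (size_t i = 0; i < urls.size(); ++i) {
            std::chrono::steady_clock::time_point t0 =
                std::chrono::steady_clock::now();
            std::string cmd = "xrdcp -s " + urls[i] + " /dev/null";
            std::system(cmd.c_str());
            total += std::chrono::duration<double>(
                         std::chrono::steady_clock::now() - t0).count();
        }
        if (!urls.empty())
            std::printf("%s: N=%zu avg=%.2fs\n", label, urls.size(),
                        total / urls.size());
    }

    int main()
    {
        // Placeholders: in the real test, 50 URLs of each kind.
        std::vector<std::string> lfns, pfns;
        lfns.push_back("root://redirector.example//atlas/dq2/zerofile0");
        pfns.push_back("root://redirector.example//pnfs/site.example/zerofile0");
        time_copies("LFN", lfns);
        time_copies("PFN", pfns);
        return 0;
    }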


Ganglia monitoring information

  • Note from Artem: Hello Robert, we have made some progress since our previous talk. We have built RPMs; here is a link to the repo: http://t3mon-build.cern.ch/t3mon/, which contains rebuilt versions of ganglia and gweb. The Ganglia team has released ganglia 3.2 and the new ganglia web (gweb); all of our code has been rechecked and works with the new software. It is better to install ganglia from our repo; instructions are here: https://svnweb.cern.ch/trac/t3mon/wiki/T3MONHome. About xrootd: we have created a daemonized version of the xrootd-summary-to-ganglia script, available at https://svnweb.cern.ch/trac/t3mon/wiki/xRootdAndGanglia; it sends xrootd summary metrics (http://xrootd.slac.stanford.edu/doc/prod/xrd_monitoring.htm#_Toc235610398) to the ganglia web interface. We also have an application that works with the xrootd summary stream, but at the moment we are not sure how best to present the fetched data; we collect user activity and accessed files, all within the site. Last week we installed one more xrd development cluster and are going to test whether it is possible to get, and then split, information about file transfers between sites and within one site. WBR, Artem

-- RobertGardner - 23 Sep 2011
