r8 - 12 Jul 2012 - 13:57:43 - WeiYang


This page has moved to https://twiki.cern.ch/twiki/bin/viewauth/Atlas/AtlasXrootdSystems


This page describes a project within ATLAS to create an Xrootd-based storage federation infrastructure to improve read-only access to ATLAS data. The Federated ATLAS Xrootd (FAX) system provides a single URL for each ATLAS data file. Users can use this URL to access the file with a native Xrootd client (xrdcp) or ROOT, regardless of the file's actual geographic location(s).

The setup instructions on this page target Tier 1 and Tier 2 sites, while also providing useful reference and background information for Tier 3 users and admins. For instructions on setting up a Tier 3 site with the federation, please refer to Tier3gXrootdSetup.

This is work-in-progress, so all details on this page are subject to change.

Global Namespace, LFC

One aspect of this work is to create an "ATLAS global namespace", that is, a uniform way of referring to files. We have a de-facto global namespace, which is file GUIDs, but this is not user-friendly.

A global namespace was proposed based on LFC paths, with some small modifications. dq2-client 1.0 implements this global namespace: "dq2-list-files -p" can print file paths using global names.

For Tier 3 sites, the "global namespace" name will be the same as the storage location (also referred to as the PDP, "physical dataset path"). Tier 1 and Tier 2 sites store files with possibly differing path conventions, so the LFC must be consulted to convert a global name into a physical name. For Xrootd-, POSIX-, and dCache-based storage, the conversion is handled by the plugin module XrdOucName2NameLFC.so.

(LFC lookup results are cached, both for performance and to reduce load on the LFC. This cache is controlled via the parameters lfc_cache_ttl (default: 2 hours) and lfc_cache_maxsize (default: 500000 entries).)

Site Architectures

We established several possible site architectures for sites with POSIX, Xrootd or dCache storage systems. This document covers the solutions described in slides 1, 2, 4 and 6, which are currently deployed. The architecture in slide 5 is still under development; the architecture in slide 3 is obsolete.


The installation instructions here cover the site architectures described in slides 1, 2, 4 and 6.

Xrootd RPMs are available for both RHEL5/SLC5 and RHEL6/SLC6 x86_64 systems; we use RHEL5/SLC5 in the examples. The RPM-based installation will create a user xrootd on your system if it does not exist. You can pre-create this user with your preferred uid, gid, etc. Make sure this user has read access to your storage.

Install Xrootd packages

The easiest way to install Xrootd RPMs is via the xrootd.org yum repository:

$ cat /etc/yum.repos.d/xrootd-stable-slc5.repo
[xrootd-stable-slc5]
name=XRootD Stable repository
baseurl=http://xrootd.org/binaries/stable/slc/5/$basearch http://xrootd.cern.ch/sw/repos/stable/slc/5/$basearch
$ yum install --disablerepo=* --enablerepo=xrootd-stable-slc5 "xrootd-*"

The rpms will place the following files:

  • /etc/xrootd/xrootd-clustered.cfg: the main xrootd configuration file. You will need to modify this file.
  • /etc/sysconfig/xrootd: system-level xrootd configuration. It controls who runs xrootd (default: user xrootd) and which xrootd instances are started. You normally don't need to change this file (except to use XrdOucName2NameLFC.so for LFC access).
  • /var/log/xrootd/{xrootd,cmsd}.log: log files.
  • /etc/init.d/{xrootd,cmsd}: Unix init scripts.

Additional Instructions for Tier 1/Tier 2 sites

Your site uses an LFC, and your storage paths are site specific and recorded in the LFC, while users access the Xrootd federation using global names, so a path conversion at your site is necessary. This task is handled by the XrdOucName2NameLFC.so plugin (a.k.a. xrd-lfc). The module, including source code and a pre-compiled .so for RHEL5/SLC5 x86_64, is available at https://git.racf.bnl.gov/usatlas/cgit/federated_storage/xrootd/tree.

Note: the xrd-lfc plug-in translates files in the global namespace to physical file paths via the LFC. It can't translate directories in the global namespace, since the LFC doesn't provide a directory mapping, except for /atlas (which is intercepted by xrd-lfc and mapped to the "root=<root-dir>" parameter you provide in your xrootd configuration file).

Note: the xrd-lfc plug-in also accepts paths of the form /atlas/dq2/any_character_string!GUID=<guid> and returns the file associated with the <guid> in the LFC.
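As a rough illustration of this form, the GUID part of such a path can be peeled off with shell parameter expansion. This is only a sketch of the kind of parsing involved, not the plug-in's actual code:

```shell
# Sketch of parsing the "!GUID=" form accepted by xrd-lfc (illustration only).
path='/atlas/dq2/xyz!GUID=E64F9823-DA0F-E111-AA5D-00219B8BC633'
guid="${path##*!GUID=}"    # strip everything up to and including "!GUID="
echo "$guid"
```

The plug-in would then look the GUID up directly in the LFC instead of resolving the path component.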

Provide a grid proxy with ATLAS voms attribute

Since xrd-lfc is an LFC client, it needs an X509 proxy with appropriate VOMS attributes to access the LFC. Currently this is done manually via voms-proxy-init, with the resulting x509up_uXXXX file installed with 'XXXX' replaced by the UID of the xrootd user. Using the flag
-valid 96:00
is helpful so that the proxy does not need to be renewed daily. For production, this will need to be automated according to your site's preference and policy. One way to do this (but please consult your site's security requirements) is to create a long-lasting base proxy from a user certificate
grid-proxy-init -old -out $HOME/base_proxy -valid 999999:0
and then use a cron job to create a proxy with ATLAS VOMS attributes:
voms-proxy-init -voms atlas:/atlas -cert $HOME/base_proxy -key $HOME/base_proxy -valid 48:00
(Please pick a more suitable, protected location than $HOME/base_proxy.)
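Putting the two commands above together, the renewal could be automated with a cron fragment like the following sketch. The schedule, the uid 12345, and the /var/lib/xrootd paths are assumptions; adapt them to your site's layout and security policy:

```
# /etc/cron.d/xrootd-voms-proxy (hypothetical sketch; adjust user, uid, paths, schedule)
# Every 12 hours, derive a fresh 48-hour VOMS proxy from the long-lived base
# proxy and install it where the xrootd user (uid 12345 here) will look for it.
0 */12 * * * xrootd voms-proxy-init -voms atlas:/atlas -cert /var/lib/xrootd/base_proxy -key /var/lib/xrootd/base_proxy -valid 48:00 -out /tmp/x509up_u12345
```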

Modify startup script

The xrd-lfc plugin is loaded dynamically by the xrootd and cmsd daemons. They need access to a grid environment containing liblfc.so, for example the OSG worker-node client package, and they need the LFC_HOST environment variable set. The easiest way to make sure the environment is set up correctly is to modify /etc/sysconfig/xrootd; this file is sourced by /etc/init.d/{cmsd,xrootd}, and the modification will be preserved during RPM updates:

Edit /etc/sysconfig/xrootd and add the following line at the end in order to use liblfc.so:

. /share/wn-client/setup.sh

(The path /share/wn-client/setup.sh is an example; set it appropriately for your site.)


Once the installation is completed, it is time to configure /etc/xrootd/xrootd-clustered.cfg. All of the configurations below enable a site to join the Xrootd federation's global redirector at BNL: glrd.usatlas.org:1094.

Don't be confused by the port number 1095 in the following configuration files (glrd.usatlas.org:1095). Port 1094 is the default xrootd port users will use to access the Xrootd federation --- users can omit it. Port 1095 is the cmsd port used to glue the federation together, and is invisible to users.

Configuration for backend storage with Xrootd interface

The configurations for Xrootd backend storage and for dCache backend storage with an Xrootd door are identical (slides 1 and 4). Here we describe a simple configuration with only one gateway machine running an xrootd proxy. It is possible to set up a cluster of gateway machines to spread the load.

# glrd.usatlas.org:1095 is the US global redirector. If you are joining a regional redirector, consult your
# regional redirector's admin for host name, and additional xrootd.redirect directive.
all.manager glrd.usatlas.org:1095

# no need to change the following 5 lines
all.export /atlas r/o
all.adminpath /var/run/xrootd
all.pidpath /var/run/xrootd
all.role server
ofs.osslib /usr/lib64/libXrdPss.so

# LFC configuration. Please change for your site
pss.namelib /path/XrdOucName2NameLFC.so root=/<your_local_storage_root_path> match=<your domain> lfc_host=<your LFC host>

# your backend storage's xrootd interface
pss.origin <your local xrootd redirector/dCache xrootd door>:<xrootd port>

... section for X509 security ... see below
... section for monitoring ... see below

The configuration for xrd-lfc is contained on the pss.namelib line.

Configuration parameters for xrd-lfc:

  • lfc_host: Do not use; set via the LFC_HOST environment variable instead.
  • lfc_cache_ttl: (Optional) cache time-to-live, in seconds (default: 7200, i.e. 2 hours)
  • lfc_cache_maxsize: (Optional) maximum number of entries in the LFC cache (default: 500000)
  • root: start of the physical filesystem path in an SFN, e.g. /pnfs/domain.edu. This must be set if LFC lookups do not return a "bare" filesystem path (prefixes like srm://server:port/manager must be removed, and this parameter is used to identify the beginning of the non-prefix component). A query of /atlas by users will result in the "root" path being returned.
  • match: (Optional) string or comma-separated list of strings. If set, LFC replies will only be considered if they contain at least one of the match strings. This is used to handle an LFC shared between sites, or to restrict access to particular space tokens.
  • nomatch: (Optional) As above, but LFC replies will be rejected if any of the strings matches as a substring. This is used, e.g., to avoid accessing tape files.
  • dcache_pool[s]: (dCache bypass mode only) Path or paths to dCache physical file pools. If this is set, dCache bypass mode will be used; see the section "dCache bypass mode" below.
  • force_direct: (dCache bypass mode only) See "dCache bypass mode" below.
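To make the interplay of root, match and nomatch concrete, here is a small shell sketch of the selection logic. The SFNs, host names and parameter values are invented for illustration; the real plug-in implements this against live LFC replies:

```shell
# Illustrative sketch of how root/match/nomatch act on LFC replies (made-up data).
root='/pnfs/domain.edu'   # start of the "bare" filesystem path
match='domain.edu'        # keep only replicas whose SFN mentions our domain
nomatch='/tape/'          # reject e.g. tape replicas

for sfn in \
  'srm://srm.domain.edu:8443/srm/managerv2?SFN=/pnfs/domain.edu/atlas/file.root' \
  'srm://srm.other.org:8443/srm/managerv2?SFN=/tape/other.org/atlas/file.root'
do
  case "$sfn" in *"$nomatch"*) continue ;; esac       # drop nomatch hits
  case "$sfn" in *"$match"*) ;; *) continue ;; esac   # require a match hit
  pfn="$root${sfn#*"$root"}"                          # cut the srm:// prefix at "root"
  echo "$pfn"
done
```

Only the first (local, non-tape) replica survives, and its srm:// prefix is stripped at the "root" string, yielding a bare filesystem path.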

Configuration for POSIX storage

The POSIX backend can be anything NFS-like (including Lustre, GPFS, etc). Here is an example of a simple configuration with only one gateway machine running a regular xrootd. It is possible to set up a cluster of gateway machines to spread the load.

# glrd.usatlas.org:1095 is the US global redirector. If you are joining a regional redirector, consult your
# regional redirector's admin for host name, and additional xrootd.redirect directive.
all.manager glrd.usatlas.org:1095

# no need to change the following 5 lines
all.export /atlas r/o
all.adminpath /var/run/xrootd
all.pidpath /var/run/xrootd
all.role server
xrootd.async off 

# LFC configuration. Please change for your site
oss.namelib /path/XrdOucName2NameLFC.so root=/<your_local_storage_root_path> match=<your domain> lfc_host=<your LFC host>

... section for X509 security ... see below
... section for monitoring ... see below

This configuration file differs from the one used for backend storage with an Xrootd interface in that the ofs.osslib and pss.origin directives are removed, pss.namelib is replaced by oss.namelib, and "xrootd.async off" is added to avoid crashes due to conflicting signal usage in the Xrootd and Globus libraries.

Configuration for Xrootd overlapping dCache storage (a.k.a. dCache bypass mode)

In addition to reading dCache files through the dCache Xrootd door, a configuration is supported where xrootd runs alongside the dCache pool software (slide 6) and reads directly from the dCache pools. (This requires a pathname-to-pnfsid lookup, which is cached alongside the LFC lookup results by xrd-lfc.)

To enable this feature, use the dcache_pool or dcache_pools parameter of xrd-lfc. Patterns are glob-expanded, so for instance a pattern like /dcache/pool*/data is handled correctly. The value may also be a comma-separated list of such paths.
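The glob expansion behaves like ordinary shell globbing. A self-contained illustration, simulating the pool layout under a temporary directory instead of a real /dcache tree:

```shell
# Simulate a dCache pool layout and glob-expand it, as dcache_pools=/dcache/pool*/data would.
base=$(mktemp -d)
mkdir -p "$base/pool1/data" "$base/pool2/data"
pools=$(echo "$base"/pool*/data)   # expands to every matching pool data directory
echo "$pools"
```

Each matching directory is treated as a candidate location for physical pool files.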

If a file is found directly in the pool, it will be read from there. If ofs.osslib is set to libXrdDcap.so and the file is not found on the pool, it can still be read using dcap protocol.

However, if force_direct is specified, files will only be served from the paths specified in dcache_pool[s]. The idea is that xrootd runs on all dCache pools, and only the pool that hosts a file will respond to a location query, so files are always served directly from the pools, bypassing dcap mode entirely. This offers higher performance and incurs less traffic on the local network. We recommend always specifying force_direct, because libXrdDcap.so is deprecated.

If a site has more than one dCache pool node, each pool node runs an xrootd/cmsd instance. An additional machine running a pair of xrootd/cmsd instances is also needed to serve as the site's xrootd redirector; this machine doesn't need significant resources. Below is an example configuration file for such an xrootd cluster:

# glrd.usatlas.org:1095 is the US global redirector. If you are joining a regional redirector, consult your 
# regional redirector's admin for host name, and additional xrootd.redirect directive.
all.manager meta glrd.usatlas.org:1095

# change your_redirector.domain to your site's xrootd redirector host; the other lines need no change
all.manager your_redirector.domain:1095
all.export /atlas r/o
all.adminpath /var/run/xrootd
all.pidpath /var/run/xrootd

if your_redirector.domain
    all.role manager
else
    all.role server
fi

# LFC configuration. Please change for your site
oss.namelib /path/XrdOucName2NameLFC.so root=/pnfs match=<your domain> lfc_host=<your LFC host> dcache_pools=/dcache/pool*/data force_direct

... section for X509 security ... see below
... section for monitoring ... see below

Enabling X509

The following steps enable X509 GSI security at your site (on data servers/proxy servers, not redirectors)

  • copy your /etc/grid-security/{hostcert.pem, hostkey.pem} to /etc/grid-security/xrd/{xrdcert.pem, xrdkey.pem}
  • Make sure the above two files in /etc/grid-security/xrd are owned by the XROOTD_USER/XROOTD_GROUP listed in /etc/sysconfig/xrootd. The permissions of these two files should be 644 and 400, respectively (note: 400, not 600, for xrdkey.pem).
  • Make sure your proxy, certificates directory, etc. are in the standard locations (/tmp/x509up_u, /etc/grid-security/certificates). If not, define the environment variables X509_USER_PROXY, etc. in /etc/sysconfig/xrootd. Make sure the CRLs get updated.
  • echo "u * /atlas rl" > /etc/xrootd/auth_file (This authorization file defines access privileges. The example here allows all authenticated users to read)
  • Add the following lines to /etc/xrootd/xrootd-clustered.cfg, replacing the "section for X509 security" above.
 xrootd.seclib /usr/lib64/libXrdSec.so
 sec.protocol /usr/lib64 gsi -crl:3 -moninfo -authzfun:libXrdSecgsiAuthzVO.so -authzfunparms:valido=atlas&vo2grp=OG&vo2usr=atlas04 -gmapopt:10 -gmapto:0
 acc.authdb /etc/xrootd/auth_file
 acc.authrefresh 60
Note on "vo2grp=OG&vo2usr=atlas04": adjust vo2grp= and vo2usr= to the appropriate group and user authorized for read access by the /etc/xrootd/auth_file above.
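The certificate-staging steps above can be sketched as follows. The commands are shown against a scratch directory standing in for /etc/grid-security, so on a real server substitute the actual paths and chown the files to the xrootd user:

```shell
# Demonstrates the required layout and permissions using a scratch directory.
GS=$(mktemp -d)                        # stands in for /etc/grid-security
mkdir -p "$GS/xrd"
echo dummy-cert > "$GS/hostcert.pem"   # placeholders for the real host credentials
echo dummy-key  > "$GS/hostkey.pem"
cp "$GS/hostcert.pem" "$GS/xrd/xrdcert.pem"
cp "$GS/hostkey.pem"  "$GS/xrd/xrdkey.pem"
chmod 644 "$GS/xrd/xrdcert.pem"        # certificate: world-readable
chmod 400 "$GS/xrd/xrdkey.pem"         # key: owner-read only (not 600)
ls -l "$GS/xrd"
```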

Site level workaround of a X509 bug

The current Xrootd release potentially contains a bug that is under investigation. The issue affects a (limited) grid proxy with VOMS attributes, as typically seen in an ATLAS Panda job. We propose putting the following workaround in place at all Grid sites where ATLAS jobs will access the Xrootd federation. The workaround should go in $ATLAS_LOCAL_AREA/setup.sh.local. For most OSG sites, $ATLAS_LOCAL_AREA = $OSG_APP/atlas_app/local.

export XrdSecGSIUSERPROXY=${X509_USER_PROXY}_XrdProxy
voms-proxy-init -quiet \
     -voms atlas:/atlas \
     -vomses "path-to-a-vomses/vomses" \
     -key $X509_USER_PROXY \
     -cert $X509_USER_PROXY \
     -out $XrdSecGSIUSERPROXY

Note: one important requirement for this workaround to actually work is that the new VOMS attributes (atlas:/atlas) in the command above must differ from the VOMS attributes in the original proxy pointed to by $X509_USER_PROXY. In most Panda jobs, the proxy in $X509_USER_PROXY has attributes like atlas:/atlas/Role=production/.... With this workaround, Xrootd clients such as xrdcp or ROOT will pick up the X509 proxy from the environment variable $XrdSecGSIUSERPROXY.

Enabling Xrootd Monitoring

Add the following lines to your configuration file, replacing "section for monitoring" above.

xrootd.monitor all auth flush io 30s ident 5m mbuff 1472 window 5s dest files io info user atl-prod05.slac.stanford.edu:9930
xrd.report atl-prod05.slac.stanford.edu:9931 every 60s all -buff sync
Some sites reported that the UDP packets sent by "xrd.report" were not received correctly by the collector. This issue is under investigation; "-buff" is added above to avoid it.

Monitoring and Site Status Dashboard:

Starting the Services

Before you start your site, please contact usatlas-federated-xrootd@cern.ch and request that your site be added to BNL's global redirector. The BNL global redirector maintains an access control list of authorized sites. You will need to provide the host names of your gateway machines and your contact info.

If you changed the default user in /etc/sysconfig/xrootd, do this once:

service xrootd setup
Both xrootd and cmsd daemons must be running on all participating hosts. They are controlled via standard init.d scripts, e.g.
service xrootd start
service cmsd start
You may use chkconfig to have them started at boot time.
chkconfig --levels=345 xrootd on
chkconfig --levels=345 cmsd on
Log files are located at /var/log/xrootd and are automatically rotated.

Interactive Use of the Xrootd Federation / Testing Your Installation

Make sure you have dq2-client 1.0 or higher, which supports the global namespace, and that you have a valid grid proxy with ATLAS VOMS attributes. To have dq2-client print out a dataset's contents using global names:
$ voms-proxy-init -voms atlas:/atlas
$ export STORAGEPREFIX=root://glrd.usatlas.org/
$ dq2-list-files -p mc11_7TeV.125265.PythiaWH110_ZZ4lep.merge.AOD.e825_s1310_s1300_r2920_r2900_tid582264_00
Here $STORAGEPREFIX points to the global redirector at BNL, root://glrd.usatlas.org/. You can replace it with a regional redirector or your own local Xrootd host. You can then use xrdcp to copy a file:
$ xrdcp root://glrd.usatlas.org//atlas/dq2/mc11_7TeV/AOD/e825_s1310_s1300_r2920_r2900/mc11_7TeV.125265.PythiaWH110_ZZ4lep.merge.AOD.e825_s1310_s1300_r2920_r2900_tid582264_00/AOD.582264._000004.pool.root.1 /tmp/junk
or access it from ROOT:
$ root -b -l
root [0] TFile::Open("root://glrd.usatlas.org//atlas/dq2/mc11_7TeV/AOD/e825_s1310_s1300_r2920_r2900/mc11_7TeV.125265.PythiaWH110_ZZ4lep.merge.AOD.e825_s1310_s1300_r2920_r2900_tid582264_00/AOD.582264._000004.pool.root.1");
If you know the above file's GUID (you can get it from dq2-ls -f), you can also access it using root://glrd.usatlas.org//atlas/dq2/xyz!GUID=E64F9823-DA0F-E111-AA5D-00219B8BC633. On the xrdcp command line, you may need to escape the "!", like this: "\!".

Using a GUID to access the Xrootd federation is more efficient and deterministic, compared to the heuristic method used when a global name is provided. However, since this is a function of xrd-lfc, GUID-based access will not be able to find files stored at Tier 3s that don't use an LFC (e.g. Tier 3 sites that use the global namespace as the actual storage path).


Charles G. Waldman, original author of the AtlasXrootSystems? TWiki at CERN

-- WeiYang - 07 May 2012
