
ConsolidateLFC

Schedule

  • UTA-SWT2: Week of Aug 5th.
    • Done: Aug 21.  
  • SLAC: Week of Aug 19th
  • AGLT2: Week of Aug 26th
  • NET2: Week of Sep 2nd
  • UTA-CPB: Week of Sep 10th
  • OU: Week of Sep 17th
  • MWT2: Week of Sep 24th

Utilities and methods

  • Site-specific LFC snapshots for local consistency checking
  • Proddisk-cleanse
  • CCC - complete consistency checker

Code/script repositories

LFC Dump

  • An LFC dump can be requested by a site admin at http://www.usatlas.bnl.gov/dq2/lfcdumprequests/index
  • The program looks up requests and creates a dump.  The code can be found in the BNL USATLAS git repository: https://git.racf.bnl.gov/usatlas/git/lfc/repo
  • The dump is an sqlite3 file containing the following columns from the Cns_file_replica table in the LFC database: guid, lfn, csumtype, csumvalue, ctime, fsize, sfn
  • guid, lfn, and ctime are indexed.
  • Once the dump is created, it is uploaded to the BNL-OSG2_SCRATCHDISK area and then subscribed to the destination site via DDM (a small example of reading the dump follows below).
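As an illustration, the dump can be inspected directly with Python's sqlite3 module. This is a minimal sketch, assuming the table is named files (as in the CCC example further below); the dump file name here is hypothetical:

import sqlite3

# Open a site dump (the file name is hypothetical).
conn = sqlite3.connect("user.HironoriIto.T2Dump.EXAMPLE.db")

# The columns follow Cns_file_replica as described above.
rows = conn.execute(
    "select guid, lfn, fsize, sfn from files order by fsize desc limit 10")
for guid, lfn, fsize, sfn in rows:
    print "%s %s %12d %s" % (guid, lfn, fsize, sfn)

conn.close()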


Procedure

  1. Drain the site in Panda/DDM
  2. Shut off the site in Panda
  3. Set the site to be off in DDM
  4. Set the T2 LFC to read-only
  5. Create the T2 LFC dump
  6. Import the T2 LFC dump into the BNL MySQL database
  7. Insert the entries into the consolidated LFC
  8. Update ToA and AGIS
  9. Change the site configuration
  10. Turn the site back on in Panda/DDM
  11. Run test jobs

Cleaning up PandaMover input data

In the US, production activities involve the use of PandaMover. PandaMover delivers data to the PRODDISK token area of a given site and registers the replicas in the site's LFC. This data delivery happens outside of the DQ2 system, so it is necessary to periodically delete these files to free space.

The previous method for deleting these files used pandamover-cleanup.py (http://repo.mwt2.org/viewvc/admin-scripts/lfc/pandamover-cleanup.py), which operated on the MySQL backend of a site's LFC. The script determines a cut-off date and deletes data older than that date based on the standard LFC path /grid/atlas/panda/dis/YY/MM/DD/DSN/FILE. It deletes the replica information from the database, deletes the physical file, and then cleans up the LFC path information by deleting empty parent directories. Note that the script operates on one replica and one physical file at a time; it could afford to work that way because the MySQL access was "local".
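For illustration, the date-based cut-off test can be derived from the path convention alone. This is a sketch, not the actual pandamover-cleanup.py code; it assumes a two-digit year in the YY component:

import datetime

def older_than_cutoff(lfn, days=14):
    # lfn looks like /grid/atlas/panda/dis/YY/MM/DD/DSN/FILE
    parts = lfn.split("/")
    yy, mm, dd = int(parts[5]), int(parts[6]), int(parts[7])
    entry_date = datetime.date(2000 + yy, mm, dd)
    cutoff = datetime.date.today() - datetime.timedelta(days=days)
    return entry_date < cutoff

print older_than_cutoff("/grid/atlas/panda/dis/12/08/01/some.dataset/file.root")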

A replacement script, clean-pm.py (available in BNL's git repository: https://git.racf.bnl.gov/usatlas/git/lfc/repo), has been created that deletes LFC registrations and physical files based on a dump file.  The old script worked against the LFC's backend data store and operated "locally", where communication time was minimal and did not need to be secured.  The new script works against the LFC through its Python API and must operate over longer distances using GSI security mechanisms.  To reduce the communication overhead, it aggregates the replicas to be deleted into chunks; in the initial release the chunk size is 1,000 replicas.  The physical copy of a file is removed from storage only after confirmation that the associated replica was removed from the catalog.
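The chunked delete-then-unlink cycle can be sketched as follows. This is illustrative only, not the clean-pm.py implementation; it assumes the standard lfc Python bindings and sfn values that resolve to POSIX paths:

import os
import lfc

CHUNK = 1000  # initial release chunk size

def delete_chunk(replicas, lfc_host):
    # replicas: list of (guid, sfn) pairs taken from the dump
    os.environ["LFC_HOST"] = lfc_host
    lfc.lfc_startsess(lfc_host, "clean-pm sketch")  # one session per chunk
    try:
        for guid, sfn in replicas:
            # Remove the catalog entry first ...
            if lfc.lfc_delreplica(guid, None, sfn) == 0:
                # ... and unlink the physical copy only on confirmed success.
                if os.path.exists(sfn):
                    os.unlink(sfn)
    finally:
        lfc.lfc_endsess()

# Process the full list in chunks to limit round trips:
# for i in range(0, len(all_replicas), CHUNK):
#     delete_chunk(all_replicas[i:i+CHUNK], "ust2lfc.usatlas.bnl.gov")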

To run the program, you must have the correct environment in place, containing the following (a quick sanity check is sketched after this list):

  1. A VOMS proxy with the necessary attributes to delete replicas (e.g. atlas:/atlas/Role=production)
  2. DQ2 python libraries in python's default search path.  One place to find these is by setting up $APP/atlas_app/atlaswn.
  3. LFC python libraries in python's default search path.  One place to find these is by setting up the OSG worker node client software.
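A small illustrative check, not part of the released script, that the pieces are in place before a real run:

import os, sys

# A proxy should exist, e.g. from: voms-proxy-init -voms atlas:/atlas/Role=production
proxy = os.environ.get("X509_USER_PROXY", "/tmp/x509up_u%d" % os.getuid())
if not os.path.exists(proxy):
    sys.exit("No proxy found at %s" % proxy)

# Both library stacks must be importable from the default search path.
try:
    import lfc
    import dq2
except ImportError, exc:
    sys.exit("Missing library: %s" % exc)

print "Environment looks usable."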
The program, like its predecessor, assumes that it can delete physical replicas from its environment via python's os.unlink() function, i.e. a POSIX-mounted filesystem.  A recent addition to the program verifies that the base proddisk directory is visible before anything is deleted.  If you had previously altered the behavior of pandamover-cleanup.py by customizing the function delete_by_sfn(), you will need to make the same modifications to your copy of clean-pm.py in the deletePhysicalFile() function.
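For example, a site whose catalog sfn values are SRM URLs rather than direct mount paths might rewrite deletePhysicalFile() along these lines; the prefix translation below is entirely hypothetical and must be adapted to your storage layout:

import os

# Hypothetical site-specific mapping from the catalog sfn to the local mount.
SFN_PREFIX   = "srm://se.example.edu:8443/srm/managerv2?SFN=/xrootd/proddisk"
LOCAL_PREFIX = "/xrootd/proddisk"

def deletePhysicalFile(sfn):
    # Translate the sfn to a POSIX path, then remove the file.
    path = sfn.replace(SFN_PREFIX, LOCAL_PREFIX, 1)
    if os.path.exists(path):
        os.unlink(path)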

The script requires three input parameters: the dump file, the remote LFC, and the DQ2 site to be cleaned.  Optional parameters control the cutoff date (default is 14 days), the verbosity of the logging, and the location of the log file (default clean-pm.log); a noop parameter makes the script report what it would do without deleting any replicas or files.  Starting with version 1.1.0, the program accepts the --panda option, which tries to determine which files Panda is about to use at the given site(s) and excludes those files from deletion.  At the moment the exclusions are based only on production sites and do not account for analysis sites; I am investigating adding analysis sites to the exclusion mechanism.  This feature is based on the work and request of Sarah.  Thanks.
 
The help output and a sample invocation are presented below:
[mcguigan@gk03 v1.0.1]$ ./clean-pm.py --help
usage: clean-pm.py [-h/--help | options] --db file --site SITE --lfc HOST

options:
  -h, --help            show this help message and exit
  --db=DB               file name of LFC dump in sqlite3 format
  --site=SITE           DQ2 Production site name
  --lfc=LFC             LFC host holding the replicas
  -d DAYS, --days=DAYS  Number of days; replicas older than this limit are
                        deleted. Defaults to 14
  --panda=PANDA         Comma delimited list of panda sites whose input files
                        are excluded
  -v, --verbose         Set verbose operation
  -n, --noop            Don't perform any deletions; just list what would
                        happen
  --log=LOG             Logging file


[mcguigan@gk03]$ ./clean-pm.py --db=user.HironoriIto.T2Dump.UTA_SWT2.20121029_04_27_35.db --site=UTA_SWT2_PRODDISK --lfc=ust2lfc.usatlas.bnl.gov --panda=UTA_SWT2 -d 14 -v

Complete Consistency Checking

This discussion focuses on using the existing ccc_generic.py for checking the consistency between storage/LFC/DQ2.  A similar script (ccc_pnfs.py), which contains additional consistency checks for dCache's internal layout of files, is available for sites using dCache.  Both scripts are available from the MWT2 repository (http://repo.mwt2.org/viewvc/admin-scripts/lfc/).  In the discussion below, ccc_generic.py will be referred to as CCC, or the script.

CCC checks the contents of the LFC against storage and/or DQ2.  It was previously possible to configure CCC to interact directly with the MySQL backend of a Tier 2's LFC, but the migration to a consolidated LFC prevents using this configuration.  Instead, one should configure CCC to operate against a suitable lfc_file.  CCC refers to the lfc_file as a database dump, but I will reserve "dump" for the sqlite3 dump produced by Hiro's mechanism as discussed above.  It is possible, and in fact trivial, to create a suitable lfc_file from the sqlite3 dump that can be used as input to CCC.  An example:
$ sqlite3 -separator ' ' <T2 DUMP FILE> "select sfn,id,ctime,guid from files;" > lfc.dump
With this command we replace the Cns_file_replica.fileid used previously with the id (index) column of the sqlite3 dump.

Some inconsistencies between the storage and LFC contents should be expected when CCC operates against the T2 dump file, because there is a time delay between requesting a dump and its delivery to physical storage; in initial testing this delay can be on the order of hours.  During this delay physical files will be added or deleted.  You can ask CCC to ignore inconsistencies related to entries (files or replicas) with timestamps newer than a configured cutoff (min_age).  Depending on when you run CCC relative to the time the dump was created, you might want to set min_age to one day to avoid these inconsistencies.
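A related approach is to drop young entries when the lfc_file is produced from the dump (note this only affects the catalog side of the comparison). A sketch, assuming ctime in the dump is stored as Unix epoch seconds and using a hypothetical dump file name:

import sqlite3, time

MIN_AGE = 24 * 3600          # one day, matching the suggestion above
cutoff = time.time() - MIN_AGE

conn = sqlite3.connect("T2DUMP.db")   # hypothetical file name
out = open("lfc.dump", "w")
q = "select sfn, id, ctime, guid from files where ctime < ?"
for sfn, fid, ctime, guid in conn.execute(q, (cutoff,)):
    out.write("%s %s %s %s\n" % (sfn, fid, ctime, guid))
out.close()
conn.close()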


  
-- RobertGardner - 08 Aug 2012
