r3 - 24 Jun 2009 - 13:18:32 - CharlesWaldman

AdlerUpdate

Introduction

This page captures information used by US ATLAS sites to implement transfer validations using Adler32 checksums. See also DQ2SiteServices.

Patch for FTS.py

Here's the code modification as applied at UC. The patch hardcodes the prefix /pnfs/uchicago.edu, which must be changed for other sites. The method of getting size and checksum information is dCache-specific.

This patch deletes files that are found to have invalid checksums. A future extension would be a callback that notifies the upstream site of failures.

/usr/lib/python2.3/site-packages/dq2/transfertool/fts$ diff -u FTS.py-orig FTS.py
--- FTS.py-orig	2009-06-09 12:21:48.000000000 -0500
+++ FTS.py	2009-06-13 13:37:49.000000000 -0500
@@ -513,10 +513,44 @@
     def isValidTransfer(self, dest, destsurl, fsize, checksum):
         """
         @see L{dq2.transfertool.TransferToolInterface.isValidTransfer(self, dest, destsurl, fsize, checksum)}
-        """                
-        self.__logger.info('VALIDATED %s FOR %s SIZE %s CHECKSUM %s' % (destsurl, dest, fsize, checksum))
-        
-        return True
+        """
+	prefix='/pnfs/uchicago.edu'
+	if not checksum:
+            self.__logger.info('VALIDATED %s FOR %s SIZE %s CHECKSUM %s' % (destsurl, dest, fsize, checksum))
+        elif checksum.startswith("ad:") and prefix in destsurl:
+	    pnfspath = prefix+destsurl.split(prefix)[1]
+	    chkSumDcache = None
+            try:
+                f = open("%s/.(use)(2)(%s)" % os.path.split(pnfspath))
+                for line in f.readlines():
+                    if 'c=1:' in line:
+                        chkSumDcache=line.split("c=1:")[1].split(";")[0]
+                        break
+                f.close()
+            except:
+                chkSumDcache = ''
+	    chkAd = checksum.split("ad:")[1]
+	    if chkSumDcache==chkAd:
+		self.__logger.info(
+		    'ADLER32 CHECKSUM VALIDATED %s FOR %s SIZE %s CHECKSUM %s' %
+		    (destsurl, dest, fsize, checksum))
+		return True
+	    else:
+		self.__logger.error(
+		    'FAILED TO VALIDATE ADLER32 CHECKSUM %s FOR %s SIZE %s CHECKSUM %s =! dCache CHECKSUM %s ' %
+		    (destsurl, dest, fsize, checksum, chkSumDcache))
+                try:
+                    os.unlink(pnfspath)
+                    self.__logger.info("DELETED %s" % pnfspath)
+                except:
+                    self.__logger.error("CANNOT DELETE %s" % pnfspath)
+		return False
+	else:
+	    self.__logger.info(
+		'VALIDATED %s FOR %s SIZE %s CHECKSUM %s' %
+		(destsurl, dest, fsize, checksum))
+  	    return True
+
         
         
     def cleanAttempt(self, destsurl):
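
The lookup performed by the patch can be tried outside DQ2 with a small standalone sketch. It mirrors the steps in the diff: derive the pnfs path from the SURL, build the name of the level-2 metadata file, and extract the value after `c=1:`. The helper names, the example host, and the sample metadata line are hypothetical. Comparing the checksums as integers rather than as strings is an assumption added here to tolerate differences in case or leading zeros; the patch itself compares strings.

```python
import os


def pnfs_path_from_surl(surl, prefix):
    """Return the local pnfs path embedded in a SURL, or None if absent."""
    if prefix not in surl:
        return None
    return prefix + surl.split(prefix, 1)[1]


def level2_path(pnfspath):
    """Name of the level-2 metadata pseudo-file that the patch opens."""
    dirname, basename = os.path.split(pnfspath)
    return "%s/.(use)(2)(%s)" % (dirname, basename)


def parse_adler32(lines):
    """Return the hex string following 'c=1:' in the metadata, or None."""
    for line in lines:
        if "c=1:" in line:
            return line.split("c=1:", 1)[1].split(";", 1)[0].strip()
    return None


def checksums_match(expected, found):
    """Compare an 'ad:<hex>' checksum with the hex value read from storage.

    Assumption: values are compared numerically, so case and leading
    zeros do not matter.
    """
    if not expected or not expected.startswith("ad:") or not found:
        return False
    try:
        return int(expected[3:], 16) == int(found, 16)
    except ValueError:
        return False


if __name__ == "__main__":
    # Hypothetical SURL and metadata line, for illustration only.
    surl = "srm://host.example.org:8443/srm/managerv2?SFN=/pnfs/uchicago.edu/data/file.root"
    path = pnfs_path_from_surl(surl, "/pnfs/uchicago.edu")
    print(level2_path(path))
    print(checksums_match("ad:0a1b2c3d", parse_adler32([":c=1:0a1b2c3d;l=1024;\n"])))
```

On a host with pnfs mounted, the lines passed to `parse_adler32` would come from opening the path returned by `level2_path`, as the patch does.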

Original instructions (from Hiro)

Instructions used for GPFS @ BU

Our GPFS storage is a POSIX file system, so the stat, md5sum, and adler32 shell commands are all we need. We had already implemented this functionality as an lsm (local site mover) command; see http://www.usatlas.bnl.gov/twiki/bin/view/Admins/LocalSiteMover. Therefore, we implemented isValidTransfer() as:

def isValidTransfer(self, dest, destsurl, fsize, checksum):
	"""
	@see L{dq2.transfertool.TransferToolInterface.isValidTransfer(self, dest, destsurl, fsize, checksum)}
	"""
	
	# os.system returns the wait status; non-zero means lsm-check reported a failure
	waitstatus = os.system("source /opt/python-2.5.4/setup.sh; source /opt/lsm/setup.sh; lsm-check %s %s %s" % (destsurl, fsize, checksum))
	if waitstatus!=0:
		self.__logger.info('FAILED to VALIDATE ADLER CHECKSUM %s FOR %s SIZE %s CHECKSUM %s (%s)' % (destsurl, dest, fsize, checksum, waitstatus))
		return False
	else:
		self.__logger.info('VALIDATED %s FOR %s SIZE %s CHECKSUM %s' % (destsurl, dest, fsize, checksum))
		return True
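
On Unix, the value returned by os.system in the function above is a wait status in the encoding used by os.wait, not the bare exit code, so an exit code of 3 shows up as 768 in the log. A minimal sketch of how the exit code could be recovered, assuming a Unix host; the helper name `exit_code_from_waitstatus` is hypothetical and not part of DQ2 or lsm:

```python
import os


def exit_code_from_waitstatus(waitstatus):
    """Return the command's exit code, or None if it did not exit normally
    (for example, because it was killed by a signal)."""
    if os.WIFEXITED(waitstatus):
        return os.WEXITSTATUS(waitstatus)
    return None


if __name__ == "__main__":
    # The shell exits with code 3; os.system reports it as a wait status.
    status = os.system("exit 3")
    print(status, exit_code_from_waitstatus(status))
```

Logging the decoded exit code instead of the raw wait status would make the DQ2 log line easier to match against the exit statuses that the check script defines.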

lsm-check is a script that checks the file size and calls the appropriate checksum shell command. (It actually runs the check on the gatekeeper using ssh, since the storage is not mounted on the site services host.) Very detailed error messages are logged to the lsm log file (why the check failed: size, checksum, some infrastructure failure, etc.), and we have not yet found it necessary to propagate these back through DQ2 to its log as well. The waitstatus is enough, since each error has a different exit status.
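
For sites that want the same kind of check on a POSIX file system without the lsm scripts, the idea can be sketched in Python. This is not the lsm-check script itself: the status codes, the function names, and the choice to compute Adler32 with zlib instead of calling the adler32 shell command are all assumptions made for illustration. An optional `ad:` prefix on the expected checksum is stripped before comparison.

```python
import os
import zlib

# Hypothetical status codes: one distinct value per failure type.
OK, ERR_MISSING, ERR_SIZE, ERR_CHECKSUM = 0, 1, 2, 3


def adler32_of_file(path, blocksize=1024 * 1024):
    """Compute the Adler32 checksum of a file as 8 lowercase hex digits,
    reading it in blocks so that large files do not have to fit in memory."""
    value = 1  # Adler32 starting value
    with open(path, "rb") as f:
        while True:
            block = f.read(blocksize)
            if not block:
                break
            value = zlib.adler32(block, value)
    return "%08x" % (value & 0xFFFFFFFF)


def check_file(path, expected_size, expected_adler32):
    """Return OK, or the status code of the first check that fails."""
    if not os.path.isfile(path):
        return ERR_MISSING
    if os.stat(path).st_size != int(expected_size):
        return ERR_SIZE
    expected = expected_adler32
    if expected.startswith("ad:"):
        expected = expected[3:]
    # Compare numerically so that case and leading zeros do not matter.
    if int(adler32_of_file(path), 16) != int(expected, 16):
        return ERR_CHECKSUM
    return OK
```

A command-line wrapper could pass the result of `check_file` to sys.exit, so that the caller sees a different exit status for each kind of failure.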



