r41 - 02 Jul 2012 - 14:55:18 - WenshengDeng

Integration of PanDA with Globus Online


This page is intended for documentation of various technical aspects of Globus Online operation within the PanDA framework. The actual use cases for the integrated system are documented in a separate document.

Globus Online is a Web service and a set of Grid tools that provide management, automation and monitoring of data transfers in the Grid environment. In the context of PanDA, it has the potential to facilitate the work of smaller groups of researchers (PanDA users) who would like to manage transfers of data (including to and from their personal desktops and laptops) without reliance on ATLAS DDM.

Globus Online provides:

  • gridFTP-based infrastructure with built-in features for robust transfers
  • X.509 security layer that makes use of certificates and user proxy in the client for authentication purposes
  • both web-based and client-side tools to actuate data transfers between any two gridFTP endpoints registered and activated by the user
  • web-based monitoring tools
  • Globus Connect, a software component that can be used to create a virtual endpoint on any machine.


The proposal is to integrate Globus Online (further referred to as GO) with ATLAS PanDA on two levels:
  • pilot code (because the data must be handled outside of the job itself)
  • job submission tools (prun, pathena), since the user must control the source and destination of the data at job submission time
To use Globus Online as a prime data transfer service, we must have gridFTP servers available to create the Globus endpoints. An administrator will create a set of public or private endpoints representing gridFTP servers. Users are able to display and use the public endpoints, provided they have an account and are using their proper X.509 proxy. Globus Online supports the Grid Security Infrastructure (GSI): user authentication is done through validation of user proxy credentials, allowing servers to validate access to their resources. Since many gridFTP servers used in OSG are already configured to run as ATLAS resources, these can be used as Globus endpoints and there is no need for extensive setup.

Globus Online uses the notion of "tasks" to describe instances of data transfer. Therefore, each transfer necessitates the creation of a "transfer task" in the GO system. To do this automatically, as part of the user's job execution, ATLAS (or a more general infrastructure) needs to handle it on behalf of the user. Since users have their X.509 certificate installed in their Globus Online user account, the Pilot just needs a valid proxy of the user submitting the job. For this, we can use a MyProxy server: users deposit valid credentials onto the server and allow the Pilot to retrieve them and authenticate to the GO server as the user.
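As a sketch of how the Pilot might build the MyProxy retrieval command (the server name and the --no_passphrase flag follow the getproxy code shown later on this page; the helper name and output path are our own choices):

```python
# Sketch only, not the production code: build the myproxy-logon command the
# Pilot would run to retrieve the user's delegated proxy from the CERN
# MyProxy server. The function name and default output path are assumptions.

def build_myproxy_cmd(username, output_path="userproxy.pem",
                      server="myproxy.cern.ch"):
    """Return the argument list for retrieving a proxy without a passphrase."""
    return ["myproxy-logon", "-s", server, "--no_passphrase",
            "-l", username, "-o", output_path]

cmd = build_myproxy_cmd("wdeng")
print(" ".join(cmd))
```

The resulting list can be passed directly to subprocess without shell quoting concerns.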

Globus Connect is a tool that allows the user to create a virtual endpoint on any compatible machine, without the need to install a full-scale gridFTP server. As an example, consider a WN where the Pilot runs its payload. The WN will have no gridFTP server available, so we need to create a virtual GO endpoint on this machine. This is achieved using Globus Connect: the pilot just needs to download the Globus Connect application from a Globus Online server hosting the code, then run it to create the Endpoint for the WN.

Use case: dataPilot + Globus Online

Interaction of the pilot and Globus Online

For the needs of non-ATLAS VOs, we have created the "dataPilot", a modified version of the "trivialPilot" equipped with GO capabilities. There are also changes in the "sendJob" script used by non-ATLAS VOs and PanDA testers. This is the sequence of events during its run:

  1. Pilot is running with the usual pilot (production) identity.
  2. User submits a job, complete with the new data transfer parameters, using sendJob.
  3. Pilot receives the job from the dispatcher.
  4. Pilot retrieves the user's proxy credential from a MyProxy server (we are currently using the CERN instance).
  5. Pilot downloads Globus Connect.
  6. Pilot creates a new Endpoint for the working site/node.
  7. Pilot generates the setup code for Globus Connect.
  8. Pilot runs the Globus Connect setup with the PIN to associate Globus Connect with the created Endpoint (-setup mode).
  9. Pilot runs Globus Connect in the background (-start mode).
  10. Pilot creates a GO transfer task for stage-in.
  11. Pilot creates a GO transfer task for stage-out.
  12. Pilot creates a GO transfer task for output.
  13. Pilot sends the stage-in transfer task to GO.
  14. Pilot runs the job.
  15. Pilot sends the stage-out transfer task to GO.
  16. Pilot sends the output transfer task to GO.
  17. Once the transfers are completed, Pilot cleans up.
  18. Pilot stops Globus Connect (-stop mode).
  19. Pilot removes the Endpoint for the work node.
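The sequence above can be sketched as a simple driver (an illustration only: the step strings are ours, and the real dataPilot wires each step to the helper functions shown later on this page):

```python
# Illustrative sketch of the dataPilot's Globus Online flow. Steps are
# recorded rather than executed, so the ordering can be inspected; the
# real pilot replaces these strings with calls to its helper functions.

GO_STEPS = [
    "retrieve user proxy from MyProxy",
    "download Globus Connect",
    "create Endpoint for worker node",
    "setup Globus Connect with PIN",
    "start Globus Connect",
    "submit stage-in transfer task",
    "run the job payload",
    "submit stage-out transfer task",
    "submit output transfer task",
    "stop Globus Connect",
    "remove worker-node Endpoint",
]

def run_pilot(executor):
    """Apply `executor` to each step in order; return the executed steps."""
    done = []
    for step in GO_STEPS:
        executor(step)
        done.append(step)
    return done

log = run_pilot(lambda step: None)
```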

Parameters of Data Transfer

As mentioned above, Globus Online handles data movement only between previously created Endpoints, mapped to gridFTP servers or running instances of Globus Connect. There is currently one limitation: it is not possible to transfer data between two Globus Connect Endpoints directly. So, when using Globus Connect, the other endpoint must represent a valid gridFTP server.

In general, data movement with Globus Online involves specification of the following parameters:

  • Stage-in: The user specifies the files that will be required to run the job. The user can select different sources, specifying the Endpoint and complete path. The Pilot generates the destination endpoint and path.
  • Stage-out: The user specifies the files that will be transferred once the job is completed. The user can select different files as sources, and must specify the endpoint and path to which the files should be transferred.
  • Job output: The user specifies the files to be created by the job and where they should be transferred.
Using the Globus Online web interface, the user can monitor existing and previous transfer tasks.

Example of Running jobs at BNL Tier3 and staging output to Globus Online endpoint

See an example job. When submitting the job with prun, in addition to the parameters I usually specify, I also turned on these two:
  • --site=ANALY_BNL_T3
  • --useGOForOutput wdeng#ANLHEP_ATLAS8:/atlas/dq2/test/
Before submitting the jobs, I stored a proxy in the CERN myproxy server by running:
myproxy-init -s myproxy.cern.ch -l wdeng -x -Z "/DC=org/DC=doegrids/OU=People/CN=Nurcan Ozturk 18551" 
"wdeng" is my voms nickname which is also the second field of my private dataset name.

Example of Implementation

Extra parameters for the pilot

To integrate the Globus Online service into our Pilot, we create transfer task routines with the parameters the user specifies at job submission. Taking the trivialPilot as the basis, the following variables, internal to the pilot, were added; they are populated from the set of parameters obtained from the job (which in turn are defined by the user at job submission time):
  • StageinGOsrc
  • StageinGOdest
  • StageoutGOsrc
  • StageoutGOdest
  • OutputGO
  • WorkingendpointGO
  • Data

OutputGO, WorkingendpointGO and Data are the main parameters for creating the data movement tasks. WorkingendpointGO is the logical endpoint where the job will be running; the user can select an existing endpoint server or create an endpoint on any machine using Globus Connect. Data is the data that the job will generate after execution, known to the user. OutputGO is an existing endpoint (gridFTP server) plus the complete path to the directory where the user wants the data after the job finishes.
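Each of these endpoint parameters is an "Endpoint:/path" pair, so the pilot has to separate the endpoint name from the path before building commands. A minimal sketch (the function name is ours):

```python
# Sketch: split a Globus Online "Endpoint:/path" specification into its
# endpoint and path components. Endpoint names contain '#' but never ':',
# so splitting at the first ':' is safe.

def split_spec(spec):
    """Split 'Endpoint:/path' into (endpoint, path); path may be empty."""
    endpoint, _, path = spec.partition(":")
    return endpoint, path

ep, path = split_spec("wdeng#ANLHEP_ATLAS8:/atlas/dq2/test/")
# ep == "wdeng#ANLHEP_ATLAS8", path == "/atlas/dq2/test/"
```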

To run a transfer using Globus Online CLI, the following command is provided:  

scp -v endpointsource:/path_to_file endpointdestination:/path_to_file_or_directory

SCP is the standard command to create transfer tasks inside the Globus Online interface. It is not the same as the scp command used by ssh.

In order to create a remote transfer task, we use gsissh command to connect to Globus Online:

gsissh cli.globusonline.org scp -v endpointsource:/path_to_file endpointdestination:/path_to_file_or_directory

Using the SCP command to create transfers between gridFTP Endpoints at Globus Online allows one to auto-activate the Endpoints when using the -g option. This option uses the X.509 certificate and proxy to activate the site for 12 hours. But since we are creating transfers between gridFTP server Endpoints and Globus Connect endpoints, we cannot use the -g option. Endpoints created with Globus Connect activate themselves at the moment they are created, and re-activate automatically after 24 hours. This means that for each transfer task, the Pilot must only activate the endpoint corresponding to a gridFTP server. For this, we make the following assumptions:

  • For Stage-in: data is transferred from a gridFTP server to the Globus Connect host. The Pilot activates the source Endpoint.
  • For Stage-out: data is transferred from the Globus Connect host to a gridFTP server. The Pilot activates the destination Endpoint.
  • For Output: data is transferred from the Globus Connect host to a gridFTP server. The Pilot activates the destination Endpoint.
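The activation rule above can be written down directly (the transfer-type labels and function name are ours; the rule itself is the one stated in the list):

```python
# Sketch of the activation rule: stage-in moves data *from* a gridFTP
# server, so its source endpoint must be activated; stage-out and output
# move data *to* one, so their destination endpoint must be activated.

def endpoint_to_activate(transfer_type, source_ep, dest_ep):
    """Return the gridFTP endpoint the Pilot must activate for a transfer."""
    if transfer_type == "stage-in":
        return source_ep
    if transfer_type in ("stage-out", "output"):
        return dest_ep
    raise ValueError("unknown transfer type: %r" % transfer_type)
```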
Inside the dataPilot, the activation and transfer commands are executed as follows:
import subprocess

def send_data(destination, data, jobdirectory, endpoint):
    # Activate the destination Endpoint (a gridFTP server) using the user's proxy
    dest = destination.split(':')[0]
    arg = ['gsissh', 'cli.globusonline.org', 'endpoint-activate', '-g', dest]
    subprocess.call(arg)

    # Build the remote SCP command that creates the transfer task
    cmd = 'gsissh cli.globusonline.org scp -v %s:%s%s %s' % (endpoint, jobdirectory, data, destination)
    return cmd

where endpoint, data and destination are the variables previously mentioned, and jobdirectory is obtained by the Pilot. 

Similarly, for the Stage-in and Stage-out:

def send_data_stagein(source, destination):
    # Activate the source Endpoint (a gridFTP server) using the user's proxy
    src = source.split(':')[0]
    arg = ['gsissh', 'cli.globusonline.org', 'endpoint-activate', '-g', src]
    subprocess.call(arg)

    # Build the remote SCP command that creates the transfer task
    cmd = 'gsissh cli.globusonline.org scp -v %s %s' % (source, destination)
    return cmd

For the implementation we created a function that downloads the Globus Connect software.

def downloadGC():
    # Fetch the Globus Connect script from the Globus server
    url = 'http://confluence.globus.org/download/attachments/14516429/globusconnect'
    arg = ['curl', '--connect-timeout', '20', '--max-time', '120', '-s', '-S', url, '-o', 'globusconnect']
    subprocess.call(arg)

Now we create the specified Endpoint, corresponding to the working node where the data should be available for the job. This function will return the PIN number that will be required to set up Globus Connect.

def createEndpoint(endpoint):
    # Create a Globus Connect endpoint; GO prints the setup PIN on the third line
    arg = ['gsissh', 'cli.globusonline.org', 'endpoint-add', '--gc', endpoint]
    proc = subprocess.Popen(arg, stderr=subprocess.PIPE, stdout=subprocess.PIPE)
    proc.wait()
    pin = None
    i = 0
    for line in proc.stdout:
        print line.rstrip()
        i += 1
        if i == 3:
            pin = line
    return pin

This one will run the set up for Globus Connect, using the PIN number for the Endpoint

def setupGC(endpoint):
    # Associate this Globus Connect instance with the Endpoint via the PIN
    pin = createEndpoint(endpoint)
    pin = pin.strip()
    arg = ['sh', 'globusconnect', '-setup', pin]
    subprocess.call(arg)

To clean up, we remove the Endpoint at the end

def removeEndpoint(endpoint):
    arg = ['gsissh', 'cli.globusonline.org', 'endpoint-remove', endpoint]
    subprocess.call(arg)

And also delete Globus Connect file and settings

def removeGC():
    # '~' is not expanded when no shell is used, so expand it explicitly (requires import os)
    arg = ['rm', '-rf', os.path.expanduser('~/.globusonline/'), 'globusconnect']
    subprocess.call(arg)

Function to run Globus Connect

def startGC():
    arg = ['sh', 'globusconnect', '-start']
    subprocess.call(arg)

Function that stops Globus Connect:

def stopGC():
    arg = ['sh', 'globusconnect', '-stop']
    subprocess.call(arg)

The Pilot also needs the user's identity to create the transfer tasks, so we retrieve the user's proxy certificate from CERN's MyProxy server, using the correct DN format:

def getproxy(userID):
    # Strip the proxy '/CN=...' extension so the DN matches the MyProxy account
    dn = userID
    if dn.count('/CN=') > 1:
        first_index = dn.find('/CN=')
        second_index = dn.find('/CN=', first_index + 1)
        dn = dn[0:second_index]
    arg = ['myproxy-logon', '-s', 'myproxy.cern.ch', '--no_passphrase', '-l', dn]
    subprocess.call(arg)
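The DN truncation in getproxy can be seen on a concrete, made-up DN: when the DN carries a second /CN= component (the proxy extension), everything from that component onward is dropped:

```python
# The same truncation logic as in getproxy, isolated for illustration.
# The sample DN below is invented for the example.

def truncate_dn(dn):
    """Drop everything from the second '/CN=' onward, as getproxy does."""
    if dn.count('/CN=') > 1:
        first_index = dn.find('/CN=')
        second_index = dn.find('/CN=', first_index + 1)
        dn = dn[0:second_index]
    return dn

print(truncate_dn("/DC=org/DC=example/OU=People/CN=Some User/CN=12345"))
# -> /DC=org/DC=example/OU=People/CN=Some User
```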

Changes in the job submission tool

The job submission tool used by non-ATLAS VOs, "sendJob", has been modified with new parameters to create the transfer tasks. The new parameters are:

  • StageinGOsrc : files to be transferred before the job runs. Can be none, one, or more sources.
    Format: --StageinGOsrc=Endpoint:/path_to_file
  • StageinGOdest : file or directory destination for the stage-in sources. If any sources were selected, the user must specify one destination directory, or as many destinations as sources.
    Format: --StageinGOdest=Endpoint:/path_to_directory_or_file
  • StageoutGOsrc : files to be transferred after the job runs. Can be none, one, or many sources.
    Format: --StageoutGOsrc=Endpoint:/path_to_file_or_directory
  • StageoutGOdest : file or directory destination for the stage-out sources. If any sources were selected, the user must specify one destination directory, or as many destinations as sources.
    Format: --StageoutGOdest=Endpoint:/path_to_directory_or_file
  • OutputGO : destination directory of the data generated by the job.
    Format: --OutputGO=Endpoint:/path_to_directory
  • WorkingendpointGO : Endpoint where the job will be running.
    Format: --WorkingendpointGO=Endpoint
  • Data : Data generated by the job.
    Format: --Data=name_of_data
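A sketch of how these options might be assembled into a sendJob command line (the helper name is ours; the parameter names and values come from the test example in the next section):

```python
# Sketch: turn keyword parameters into '--Name=value' command-line options,
# skipping parameters that were not supplied (None). The sorting is only
# for deterministic output.

def build_go_options(**params):
    """Return a sorted list of '--Name=value' options, omitting Nones."""
    return ["--%s=%s" % (name, value)
            for name, value in sorted(params.items())
            if value is not None]

opts = build_go_options(StageinGOsrc="SITE1:/home/osg_data/panda/0064.file",
                        OutputGO="SITE1:/home/osg_data/panda/",
                        WorkingendpointGO="OSGDEV",
                        Data="10mb-file.bin",
                        StageoutGOsrc=None)
```

The resulting list can be appended to the rest of the sendJob arguments.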

Example of usage

To activate the Globus Online capability for a job, the Pilot analyzes the job parameters. If the job has the parameter WorkingendpointGO, it will automatically set up Globus Connect on the work node. For the latest version of trivialPilot, it is not necessary to specify the destination of the transfer during Stage-in.

For testing, we submit a job as follows:

./sendJob.py --njobs 1 --computingSite TEST1 --transformation http://www.usatlas.bnl.gov/~ccontrer/datatransfer.sh\
 --prodSourceLabel user --cloud OSG --StageinGOsrc=SITE1:/home/osg_data/panda/0064.file --OutputGO=SITE1:/home/osg_data/panda/\
 --WorkingendpointGO=OSGDEV --Data=10mb-file.bin

This will run the datatransfer.sh transformation file, which creates a dummy 10 MB file.
The Pilot will create the OSGDEV Endpoint using Globus Connect. Then, it will create the transfer task corresponding to:
SITE1:/home/osg_data/panda/0064.file to OSGDEV:/working-directory/
The Pilot creates the destination path of the Endpoint automatically, using information it already has.
After the Pilot runs the transformation file, it creates the transfer task:
OSGDEV:/working-directory/10mb-file.bin to SITE1:/home/osg_data/panda/


Appendix I: Short Guide to Globus Online

Account configuration

  1. Create user account at www.globusonline.org
  2. Once created, log into your account.
  3. At My Profile select Manage Identities and add your X.509 Certificate.


Connect to Globus Online using the gsissh command-line interface

Requirements:
  • a properly configured Globus Online account
  • an X.509 Certificate installed in the Unix user account
  • a valid, active proxy (check with voms-proxy-info or grid-proxy-info)

Once all requirements are met, issue the following command:

gsissh cli.globusonline.org
This will give you access to your Globus Online account using your X.509 Certificate as user authentication.

Basic operations and Endpoint management
  • List available Endpoints: endpoint-list
  • Show information about an Endpoint: endpoint-list -v Endpoint
  • Activate an Endpoint: endpoint-activate -g Endpoint
  • List the contents of an Endpoint: ls Endpoint:/
  • Transfer between two Endpoints: scp -g -v Endpoint1:/path_to_file Endpoint2:/path_to_file
    (-g auto-activates the sites with the user's X.509 Certificate; -v enables verbose mode; -r transfers a directory recursively when a directory is given instead of an individual file)
  • Get the status of transfer tasks: status -a -l LIMIT (LIMIT caps the number of results)
  • Create an endpoint called EndpointName: endpoint-add -p gridFTPServer EndpointName
  • Make EndpointName public: endpoint-modify --public EndpointName
  • Remove an Endpoint: endpoint-remove EndpointName

Connect to Globus Online using HTTP (REST)

Examples can be found here.
$ curl --cert ~/.globus/usercert.pem --key ~/.globus/userkey.pem --capath ~/.globus/certificates\ 
                  --header "X-Transfer-API-X509-User: USERNAME" \    # or --cookie "x509-user=USERNAME"\ 

Simple test of the Python Client Library that uses REST

~/python2.6.7/bin/python2.6 -i -m globusonline.transfer.api_client.main mxp\
 -C /direct/usatlas+u/mxp/globus-test/gd-bundle_ca.cert -c .globus/usercert.pem -k .globus/userkey.pem

>>> status_code, status_message, data = api.task_list()

API Methods and Attributes

  • 'subtask', 'subtask_event_list', 'subtask_list', 'task', 'task_cancel', 'task_event_list', 'task_list', 'task_update', 'tasksummary'


  • 'endpoint', 'endpoint_activate', 'endpoint_activation_requirements', 'endpoint_autoactivate', 'endpoint_create', 'endpoint_deactivate', 'endpoint_delete', 'endpoint_list', 'endpoint_ls', 'endpoint_mkdir', 'endpoint_rename', 'endpoint_update'

Appendix II: Status and Work Plan for January-February 2012

Team Participants

Wensheng Deng, Maxim Potekhin, Shuwei Ye

Project Status in January 2012

Carlos Contreras has done the following:
  • contributed to this document
  • investigated the functionality of MyProxy server at CERN, for purposes of storing user proxies
  • created a version of "trivial pilot" that is intended for non-Atlas users, and augmented with Globus API
  • created a version of the Atlas Pilot that is Globus-enabled
Testing done by Carlos was not complete. Globus Online functionality in the pilots was achieved by scripting the gsissh interface to the web service.

In order to separate development from production, Maxim has created additional Pilot Wrappers (and corresponding Pilot Types) which download the pilot code not from the cached CERN repository, but from the AFS area on the Atlas web server at BNL, which allows development to be done "live" and everything to be tested before committing the code to SVN. The corresponding "pilot types" can be found in the Autopilot section of the Panda monitor as the "data" and "atlasTest" types. The wrapper is easy to modify to point to the location of the pilot code in any user's AFS directory (e.g. Maxim's or Wensheng's).

Objective and Reporting

To deliver a simple but functioning version of the Atlas Pilot for Tier-3 evaluation by end of February. The use case will likely include data processing like Ntuple reduction and merging, where the data source can be either on a Grid SE, or on the user machine (via Globus end-point created), and destination likewise can be either on the Grid or on the client machine. Processing will happen at BNL and/or OU. We'll shoot for Grid-to-Grid data path first, which, time permitting, will be augmented with other options such as Grid-to-User version.

Progress of the project will be reported in Friday meetings of the PAS group at BNL.

Division of work

The current plan calls for the following:
  • Maxim (Weeks 1-3) will correct the logic in the "data pilot" and make sure that MyProxy works as advertised and can be used to store proxies suitable for Globus Online interaction, following up on concerns from Wensheng. Specifically, Maxim needs to make sure that the original proxy is not overwritten during the extraction of the payload proxy from MyProxy; this is a known defect of Carlos' code. In addition, following a suggestion and a code sample from Wensheng, Maxim will explore and implement a solution based on the Python binding to Globus, as opposed to gsissh scripting, which may have advantages.
  • Maxim (Week 4) After that, work in parallel with Wensheng (see item below) debugging the internals of the Atlas Pilot logic
  • Shuwei (Weeks 1-2) will develop the actual use case - the code, and the input data. These parameters will be communicated to Wensheng.
  • Wensheng (Weeks 1-2) will phase in the modified Atlas Pilot, temporarily bypassing the Globus Online logic, i.e. aiming to do a dry run first with the payload formulated by Shuwei, files being local to BNL filesystem at first
  • Wensheng (Week 3) after completion of the first phase, will activate elements of the Globus Site Mover: authentication first, then actual execution of gsissh commands by Globus Online
  • Everybody (Weeks 3-4): running test jobs, debugging and commissioning of the modified pilot code.

Appendix III: Adapting BNL Tier-3 Panda queue for Globus Online

Update of the queue configuration (SchedConfig) for ANALY_BNL_T3-condor

copyprefix = /data/panda
copytool = mv
copytoolin = None
ddm = local
environ = VO_ATLAS_SW_DIR=/usatlas/OSG/atlas_app/atlas_rel/
envsetup = source /afs/usatlas.bnl.gov/i386_redhat72/opt/lsm/setup.sh
envsetupin = None
se = /data/panda
changed to
copyprefix = ^srm://dcsrm.usatlas.bnl.gov
copytool = lsm
copytoolin = lsm
environ = APP=/usatlas/OSG TMP=/tmp DATA=/usatlas/prodjob/share/
envsetup = source /afs/usatlas.bnl.gov/i386_redhat72/opt/lsm/setup.sh
envsetupin = source /afs/usatlas.bnl.gov/i386_redhat72/opt/lsm/setup.sh
se = token:ATLASUSERDISK:srm://dcsrm.usatlas.bnl.gov:8443/srm/managerv2?SFN=
With the change above, this histogram merging job finished successfully. We then applied the following modification
copytool = globusonline
copytoolin = globusonline
to test the SiteMover switch in the Atlas Pilot.

Testing Globus Online transfer API on the BNL T3 queue

The test ran on a grid UI machine:
globus-job-run gridgk05.racf.bnl.gov/jobmanager-condor -q bnl-localtest -s /tmp/play_go.sh
The script play_go.sh contains
source /afs/usatlas/osg/wn-client/@sys/current/setup.sh
/bin/hostname -f
myproxy-logon -s myproxy.cern.ch --voms atlas -o foo -l wdeng -n
curl -o gd-bundle_ca.cert  https://transfer.api.globusonline.org/gd-bundle_ca.cert 2>/dev/null
curl --cert ./foo  --key ./foo  --capath ./gd-bundle_ca.cert \
--header "X-Transfer-API-X509-User: wdeng" 'https://transfer.api.globusonline.org/v0.10/tasksummary' 2>/dev/null
ls -l foo
ls -l gd-bundle_ca.cert
rm gd-bundle_ca.cert
rm foo

Provisional version of Globus software: activation of endpoints using proxy delegation

Digitally signing the client certificate, with validity period of 12 hours:
cat server.pubkey /tmp/x509up_u10030 | ./mkproxy 12 > /tmp/proxy_chain
Endpoint activation:
./delegate_proxy_activate.py mxp 'mxp#MXP_BNL_TEST' \
/tmp/proxy_chain -k /tmp/x509up_u10030 -c /tmp/x509up_u10030 \
-C ~/globus-test/gd-bundle_ca.cert

Appendix IV: Testing basic MyProxy functionality:

Depositing the X509 proxy while setting the trusted retriever identity to "Nurcan Ozturk". Retrieving X509 proxy later from a different machine.
  • ran on one box as user mxp:
    myproxy-init -s myproxy.cern.ch -x -Z "/DC=org/DC=doegrids/OU=People/CN=Nurcan Ozturk 18551"
  • ran on another:
    myproxy-logon -s myproxy.cern.ch -l mxp -n -o foo
The credential is now in file "foo". Note that this will also result in re-assignment of the variable X509_USER_PROXY.

When generating the credential during the myproxy-init call, it is critically important to ensure that the proxy being generated is RFC-compliant, because otherwise the Globus Online API client will suffer a severe crash with a potentially stalled process. This is achieved by setting the GT_PROXY_MODE environment variable to "rfc" before the call.
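A sketch of forcing an RFC-compliant proxy when invoking myproxy-init from Python (the environment-copy approach is ours; the GT_PROXY_MODE variable and "rfc" value are those named in the paragraph above):

```python
import os

# Sketch: build an environment dict suitable for the env= argument of a
# subprocess call to myproxy-init, with GT_PROXY_MODE forced to "rfc" so
# the generated proxy does not crash the Globus Online API client.

def rfc_proxy_env(base_env=None):
    """Return a copy of the environment with GT_PROXY_MODE=rfc set."""
    env = dict(base_env if base_env is not None else os.environ)
    env["GT_PROXY_MODE"] = "rfc"  # ensure an RFC-compliant proxy is generated
    return env

env = rfc_proxy_env()
```

The dict would then be passed as subprocess.Popen(cmd, env=env) around the myproxy-init call.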

Appendix V: Code sample for data transfer

Initial simple test with hardcoded paths


from globusonline.transfer import api_client
from globusonline.transfer.api_client import Transfer

api = api_client.TransferAPIClient(username="mxp", \
server_ca_file="/direct/usatlas+u/mxp/globus-test/gd-bundle_ca.cert", \
cert_file="/tmp/x509up_u10030", key_file="/tmp/x509up_u10030")

status_code, status_message, data = api.submission_id()
print status_code, status_message, data

sid =  data['value'] # get submission id, which is a handle

ep1 = 'mxp#MXP_BNL_TEST'
ep2 = 'mxp#MXP_TEST'
t = Transfer(sid, ep1, ep2)
path1 = '/usatlas/u/mxp/foo'
path2 = '/home/usatlas1/foo'
t.add_item(path1, path2)

# Submit the transfer task to Globus Online
status_code, status_message, data = api.transfer(t)

Full Globus interface for the "globusPilot"

./sendJob.py --njobs 1 --computingSite TEST3 \
         --transformation http://www.usatlas.bnl.gov/~mxp/panda/transformations/maxim_test.sh \
         --prodSourceLabel user --cloud OSG \
         --jobParameters 'globus-user=mxp \
                                   globus-endpoint-in=mxp#MXP_BNL_TEST globus-endpoint-local=mxp#MXP_BNL_TEST globus-endpoint-out=mxp#MXP_OU_TEST \
                                   in-mode=server dir-in=/direct/usatlas+u/mxp/ files-in=xs.sh \
                                   out-mode=server dir-out=/home/usatlas1 files-out=xs.sh'

Here the ultimate destination of the data is at the University of Oklahoma, and the actual address of the gridFTP server, osgitb1.nhn.ou.edu, is mapped to the mnemonic Globus Online name "mxp#MXP_OU_TEST".

Globus-related Job Parameters

In the following we assume that in general there may be up to 3 Globus Online Endpoints involved: one where the input data resides, one which is local to the job, and finally one for the destination, where the output shall be written.

  • globus-user : the name of the user as registered with Globus Online
  • globus-endpoint-in : the Globus Endpoint from which the data needs to be staged in
  • globus-endpoint-out : likewise, the Endpoint to which to write the output data
  • globus-endpoint-local : the local endpoint responsible for receiving the data for stage-in and stage-out
  • dir-in : the path to the input data, on the "in" Endpoint
  • dir-out : same for the output
  • files-in : a comma-separated list of input files, like f1,f2,f3
  • files-out : same for the output files
  • in-mode : the mode of accessing the input data; in many situations this will simply be "server". When set to "local", the pilot simply performs the copy operation on the local filesystem.
  • out-mode : same for the output data

