SupportingCMS

r3 - 01 May 2013 - 12:20:32 - RobertGardner

Site requirements

  • Worker node OS should be RHEL 5 (or equivalent, such as SL5)
  • Worker nodes require outbound internet access (nodes can be behind NAT)
  • Worker node memory: 2 GB per job slot
  • Worker node scratch: assume 20 GB per job slot, though jobs typically use less than 10 GB
  • The worker nodes require the OSG CA certs that are installed as part of the OSG worker-node client. Host certs on the worker nodes are not required.
  • Site squid (optional but strongly desired): used for access to glidein software, CVMFS, and CMS conditions data; allow ~1 GB of cache space for optimal use
  • Job preemption is okay
  • glexec is optional but preferred
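The per-slot numbers above are easy to sanity-check with a little arithmetic. A minimal sketch, using example values for total memory and slot count (on a real node, take them from `free -m` and your batch-system configuration) and assuming the usual OSG CA-certificate location of /etc/grid-security/certificates:

```shell
# Example values; on a real worker node take these from `free -m`
# and from your batch system's configured slot count.
MEM_MB=16384
SLOTS=8
SCRATCH_GB_PER_SLOT=20

# Requirement above: 2 GB (2048 MB) of memory per job slot.
mem_per_slot=$((MEM_MB / SLOTS))
if [ "$mem_per_slot" -ge 2048 ]; then
    echo "memory OK: ${mem_per_slot} MB/slot"
else
    echo "memory LOW: ${mem_per_slot} MB/slot (need 2048)"
fi

# Requirement above: assume 20 GB of scratch per slot.
scratch_needed=$((SLOTS * SCRATCH_GB_PER_SLOT))
echo "scratch needed: ${scratch_needed} GB total"

# The OSG worker-node client installs the CA certificates; the usual
# location is /etc/grid-security/certificates (site layouts may differ).
if [ -d /etc/grid-security/certificates ]; then
    echo "CA certs present"
else
    echo "CA certs directory not found"
fi
```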

Security profile


  • For every job slot, a pilot job process starts up.
  • The pilot job spawns a condor master, which spawns a condor startd, which spawns condor starters, which spawn jobs from end-users.
  • The pilot job makes TCP connections for HTTP access to the glidein factory at UCSD, IU, or CERN.
  • The pilot job makes TCP connections for HTTP access to the glidein frontend at UCSD or CERN.
  • The pilot job makes one or more GSI-authenticated TCP connections to a port on the glidein collector. This is for Condor CCB service.
  • The pilot job may send outbound UDP to a port on the glidein collector.
  • The startd and starter send outbound TCP traffic to two ports on one of the Condor submit machines at UCSD, to communicate with the condor_schedd and condor_shadow.
  • CMS jobs running at non-CMS sites are run inside of parrot, using parrot's CVMFS support to access CMS software via HTTP.
  • Parrot makes TCP connections for HTTP access to the CVMFS repository.
  • If the site provides an HTTP proxy via OSG_SQUID_LOCATION, this is used by parrot. Otherwise, central CVMFS proxies are used.
  • CMS jobs make HTTP connections to the CMS frontier server to read conditions data.
  • The site proxy is used for CMS frontier if available and compatible (2.6 <= squid_version < 3). Otherwise, central frontier proxies are used.
  • CMS jobs make outbound TCP connections to read data via xrootd from sites across the US.
  • Jobs send output files via SRM + gsiftp to sites across the world.
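The proxy-selection rule described above (prefer the site squid advertised in OSG_SQUID_LOCATION, otherwise fall back to the central proxies) can be sketched in shell. The UNAVAILABLE sentinel and the variable names here are our assumptions based on common OSG conventions, not parrot's actual implementation:

```shell
# Central CVMFS proxies from the host list below; the "|" separator
# follows the usual squid-failover convention for proxy lists.
CENTRAL_CVMFS_PROXY="http://cache01.hep.wisc.edu:80|http://cache02.hep.wisc.edu:80"

# OSG sites conventionally export OSG_SQUID_LOCATION as "host:port",
# or the string UNAVAILABLE when no site squid exists (an assumption
# here; check your site's OSG attributes).
if [ -n "$OSG_SQUID_LOCATION" ] && [ "$OSG_SQUID_LOCATION" != "UNAVAILABLE" ]; then
    proxy="http://$OSG_SQUID_LOCATION"
else
    proxy="$CENTRAL_CVMFS_PROXY"
fi
echo "CVMFS proxy in use: $proxy"
```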

Hosts and ports:

  • The frontend is gfactory-1.t2.ucsd.edu
    • ports: 80
  • The collector is glidein-collector.ucsd.edu
    • ports: 9620-9919
  • The UCSD factory is glidein-1.t2.ucsd.edu
    • port 8319
  • The GOC factory is glidein.grid.iu.edu
    • port 80
  • The Condor submit machines
    • glidein-2.ucsd.edu
    • submit-2.ucsd.edu
  • The CVMFS repository
    • cvmfs01.hep.wisc.edu:80
    • cvmfs03.hep.wisc.edu:80
  • Central CVMFS proxies
    • cache01.hep.wisc.edu:80
    • cache02.hep.wisc.edu:80
  • CMS frontier servers
    • cmsfrontier.cern.ch:8000
    • cmsfrontier1.cern.ch:8000
    • cmsfrontier2.cern.ch:8000
    • cmsfrontier3.cern.ch:8000
  • Central CMS frontier proxies
    • cmsfrontier01.hep.wisc.edu:3128
    • cmsfrontier02.hep.wisc.edu:3128
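One way to verify that worker nodes can actually reach these endpoints is a quick outbound probe. A minimal sketch that prints `nc` (netcat) probe commands for one representative host:port per service (9620 stands in for the 9620-9919 collector range); review the list, then run the printed commands from a worker node:

```shell
# One representative endpoint per service from the list above.
ENDPOINTS="
gfactory-1.t2.ucsd.edu:80
glidein-collector.ucsd.edu:9620
glidein-1.t2.ucsd.edu:8319
glidein.grid.iu.edu:80
cvmfs01.hep.wisc.edu:80
cmsfrontier.cern.ch:8000
cmsfrontier01.hep.wisc.edu:3128
"

# Print (rather than run) the probes, so the list is easy to audit first.
for ep in $ENDPOINTS; do
    host=${ep%:*}   # strip the :port suffix
    port=${ep#*:}   # strip the host prefix
    echo "nc -z -w 5 $host $port"
done
```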

Setting up access

  • In GUMS, enable the cms VO for your gatekeeper under https://your_gums_host:8443/gums/hostToGroupMappings.jsp
  • Make sure the username uscms01 exists on the gatekeeper(s) and worker nodes, and that the homedir is mounted and ownership is correct.
  • On the gatekeeper(s), run gums-host-cron; there should be no output. Then check /var/lib/osg/supported-vo-list.txt and user-vo-map.txt to make sure they list the new VO. If they do not, check the log file at /var/log/gums/gums-host-cron.log.
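A quick follow-up check for this step, assuming both map files live under /var/lib/osg (adjust the paths if your site's layout differs):

```shell
# Verify that the cms VO appears in the map files generated by
# gums-host-cron (paths assumed to be under /var/lib/osg).
status=0
for f in /var/lib/osg/supported-vo-list.txt /var/lib/osg/user-vo-map.txt; do
    if [ ! -f "$f" ]; then
        echo "missing file: $f"
        status=1
    elif ! grep -qi cms "$f"; then
        echo "cms not listed in $f"
        status=1
    fi
done
[ "$status" -eq 0 ] && echo "cms VO mapping looks OK" \
    || echo "re-run gums-host-cron and check its log"
```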

To test access:

  • Ask the VO contacts above to test access
  • Or, test by mapping yourself in GUMS to the cms VO and running a job. On the Manual account mappings page (https://your_gums_host:8443/gums/manualAccounts.jsp), set up a local mapping of yourself to cms01. Also check the Manual user groups page (https://your_gums_host:8443/gums/userGroups.jsp) and make sure you are a member of the group 'localusers'. Then, as yourself in a shell with wn-client or wlcg-client loaded, run
    globus-job-run your_gatekeeper/jobmanager-condor /usr/bin/id
    While the job is still running, check on your gatekeeper that it is queued with the correct username and any jobmanager settings specific to your site (e.g. priority, Condor accounting groups, PBS queue).

-- RobertGardner - 11 Apr 2012
