r8 - 17 Jan 2013 - 16:57:06 - JohnHover

Virtual Condor Cluster and Panda Site Recipe

Figure: Static virtual cluster (Static_Virtual_Cluster.png)

Static Condor Central Manager

Set up a simple Condor central manager (Schedd, Negotiator, Collector) to supply jobs to the VM worker nodes. The attached configuration can be dropped into /etc/condor/config.d/; customize the ALLOW_WRITE and ALLOW_READ variables for your site. A stock RHEL/SL 5 host is the best choice, as it is the easiest to set up with AutoPyFactory (see below).
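
As a rough illustration of what such a configuration fragment might contain, the shell sketch below prints a minimal central manager config. It is not the attached file: the function name cm_config, the *.example.org host pattern, and the output file name in the closing comment are assumptions made for illustration. DAEMON_LIST, CONDOR_HOST, ALLOW_WRITE and ALLOW_READ are standard Condor settings.

```shell
#!/bin/sh
# Print a minimal central manager config fragment (a sketch, not the
# attached configuration).
# $1: host pattern allowed to read from and write to the pool
#     (hypothetical example: '*.example.org')
cm_config() {
    allow="$1"
    cat <<EOF
# Run the central manager daemons plus a schedd on this host.
DAEMON_LIST = MASTER, COLLECTOR, NEGOTIATOR, SCHEDD
CONDOR_HOST = \$(FULL_HOSTNAME)
# Restrict access to hosts at your site.
ALLOW_WRITE = $allow
ALLOW_READ = $allow
EOF
}

# Example (hypothetical output file name):
#   cm_config '*.example.org' > /etc/condor/config.d/10-local-cm.config
```

Printing to stdout rather than writing the file directly makes it easy to review the result before it goes into /etc/condor/config.d/.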

Boxgrinder for VM creation

  1. Acquire a Fedora 16/17 build host with standard repositories. Install rubygem-boxgrinder-build.
  2. Check out boxgrinder definitions and resources from http://svn.usatlas.bnl.gov/svn/griddev/boxgrinder
  3. Make changes to base boxgrinder install as described in boxgrinder/bg-patches/FIXES.txt.
  4. Edit site-specific parameters in atlas worker node appliance files OR create child appliance using adjusted files.
    1. ATLAS jobs and CVMFS require significant disk space to do their work. The default boxgrinder VMs produced above expect to find, format, and mount ephemeral storage. EC2 and OpenStack both provide this automatically when instances are invoked. If you are running VMs directly with libvirt/virt-manager, it makes sense to simply make the root partition large enough for the number of slots desired: set aside 10GB for CVMFS and 10GB for each job slot.
    2. authorized_keys: replace with the keys of your site administrators.
    3. setup.sh: check whether the LFC host and Frontier server need to change for your site.
    4. cvmfs/default.local: replace with settings for a local CVMFS replica, if you have one.
    5. condor/config.d/50cloud-condor.conf: set CONDOR_HOST to your central manager (CM).
  5. Create a VM, e.g.:
    boxgrinder-build -f boxgrinder/sl5-x86_64-wn-atlas-bnlcloud.appl
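
Two of the site-specific adjustments in step 4 can be sketched in shell: the disk sizing rule (10GB for CVMFS plus 10GB per job slot) and setting CONDOR_HOST in the worker node config. The function names and the cm.example.org hostname are hypothetical, and the sizing figure does not include space for the operating system itself.

```shell
#!/bin/sh
# Sketch of two site-specific adjustments from step 4.
# Function names and example values are hypothetical.

# Disk (in GB) to reserve for a given number of job slots:
# 10 GB for CVMFS plus 10 GB per slot. Space needed by the OS
# itself is NOT included.
data_disk_gb() {
    slots="$1"
    echo $(( 10 + 10 * slots ))
}

# Rewrite the CONDOR_HOST line of a Condor config file in place.
# $1: path to the config file, $2: central manager hostname
set_condor_host() {
    conf="$1"
    cm="$2"
    tmp="$conf.tmp.$$"
    sed "s|^CONDOR_HOST[[:space:]]*=.*|CONDOR_HOST = $cm|" "$conf" > "$tmp" &&
        mv "$tmp" "$conf"
}

# Example:
#   data_disk_gb 4        # prints 50
#   set_condor_host condor/config.d/50cloud-condor.conf cm.example.org
```

The config edit goes through a temporary file instead of sed -i, since the -i option behaves differently between sed implementations.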

Set up a Panda queue for this virtual cluster.

  • You can use BNL_CLOUD as a template.
  • It is assumed that the site SRM endpoint is usable.

AutoPyFactory (APF) setup for local submission

On the same host as the Condor Central Manager, install and set up APF to submit pilots to the local virtual cluster.

  1. An administrator will need rights to /atlas/usatlas/Role=production in VOMS
  2. The same administrator will need to be added to the list of approved pilot retrievers in Panda. Email Alden. (This may no longer be necessary; try submitting pilots first and see whether they can retrieve jobs.)
  3. Install APF using instructions at http://svnweb.cern.ch/guest/panda/panda-autopyfactory/current/INSTALL-ROOT
  4. Configure APF to submit pilots to the local Condor schedd. Here are examples for easy customization:
    • factory.conf: APF factory.conf config file for Virtual Cluster
    • proxy.conf: APF proxy.conf config file for Virtual Cluster
    • queues.conf: APF queues.conf config file for Virtual Cluster
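
Before configuring APF, it can help to confirm by hand that the administrator's credentials actually carry the production role from step 1. The sketch below assumes the standard VOMS client tools (voms-proxy-init, voms-proxy-info) are installed and that the VO is named atlas; the function name is hypothetical.

```shell
#!/bin/sh
# Manual check that a proxy with the ATLAS production role can be created.
# Assumes the VOMS client tools are installed; the function name is
# hypothetical.
VOMS_FQAN="atlas:/atlas/usatlas/Role=production"

check_production_proxy() {
    if ! command -v voms-proxy-init >/dev/null 2>&1; then
        echo "voms-proxy-init not found; install the VOMS client tools first" >&2
        return 2
    fi
    # Request a proxy carrying the production role (prompts for the
    # certificate passphrase).
    voms-proxy-init -voms "$VOMS_FQAN" || return 1
    # Confirm the role appears among the proxy's attributes.
    voms-proxy-info -fqan | grep -q '/atlas/usatlas/Role=production'
}

# Example:
#   check_production_proxy && echo "production role OK"
```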

Other Steps and Notes

  • There may be other steps necessary, but those should be similar to commissioning any new Panda queue, e.g. enabling Hammercloud jobs.
  • There is currently no provision for node life cycle management. At BNL we have scripts to simply log into nodes and execute
    condor_off -peaceful
    which shuts down the startd without killing the currently running job.
  • I plan to add functionality so that a VM automatically detects how many dedicated CPUs are available to it and adjusts the number of slotX accounts and the NUM_CPUS and SLOT<N>_USER variables for Condor accordingly.
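
The two node-management notes above can be sketched in shell. This is not the BNL tooling: the function names, the root ssh login, and the slot1..slotN account names are assumptions made for illustration. The first function runs condor_off -peaceful on each listed node; the second prints the Condor settings that would match a given CPU count.

```shell
#!/bin/sh
# Hypothetical helpers; function names, the root login, and the
# slotN account names are assumptions.

# Peacefully drain each named worker node over ssh.
# Set DRY_RUN=1 to print the commands instead of running them.
drain_nodes() {
    for node in "$@"; do
        ${DRY_RUN:+echo} ssh -n "root@$node" condor_off -peaceful
    done
}

# Print Condor settings for a given CPU count: NUM_CPUS plus one
# SLOT<N>_USER line per slot, mapped to local accounts slot1..slotN.
slot_config() {
    ncpus="$1"
    echo "NUM_CPUS = $ncpus"
    i=1
    while [ "$i" -le "$ncpus" ]; do
        echo "SLOT${i}_USER = slot$i"
        i=$(( i + 1 ))
    done
}

# Example (on a worker node; getconf reports the online CPU count):
#   slot_config "$(getconf _NPROCESSORS_ONLN)"
#   DRY_RUN=1 drain_nodes wn01 wn02
```

The DRY_RUN switch lets you review exactly which nodes would be drained before any startd is touched.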

-- JohnHover - 21 Aug 2012

Attachments


config 50cloudcondordm.config (1.4K) | JohnHover, 17 Jan 2013 - 16:13 |
png Static_Virtual_Cluster.png (47.8K) | JohnHover, 22 Aug 2012 - 15:03 | Static Virtual cluster
png BNL_Extensible_VM_Strategy.png (66.8K) | JohnHover, 22 Aug 2012 - 14:57 | Virtual Cluster diagram
conf queues.conf (1.3K) | JohnHover, 22 Aug 2012 - 14:18 | APF queues.conf config file for Virtual Cluster
conf proxy.conf (0.4K) | JohnHover, 22 Aug 2012 - 14:18 | APF proxy.conf config file for Virtual Cluster
conf factory.conf (0.5K) | JohnHover, 22 Aug 2012 - 14:18 | APF factory.conf config file for Virtual Cluster
config 10cloudcm.config (1.4K) | JohnHover, 17 Jan 2013 - 16:53 | Simple Condor Central Manager configuration
config 50cloudcentralmanager.config (1.4K) | JohnHover, 17 Jan 2013 - 16:57 |
 