r1 - 15 Apr 2008 - 15:55:11 - RobertGardner

AtlasReleasesP4

Introduction

See progress on Panda-based release installations in previous phases.

Progress

  • Resolve role/permissions issues at sites: 1/15/08
    • All sites agree to change the permissions on the DQ2 area so that the usatlas2 account can write to it, by making the directory group-writable.
    • The issue with dCache is that it does not respect the umask or the sticky bit, so new subdirectories created by usatlas1 give permission errors to usatlas2.
    • The suggestion is to change the BNLdCacheSiteMover function in the pilot code so that whenever it creates a new subdirectory for storing a log file, it makes that subdirectory group-writable.
    • Status as of 01/23/2008:
      • AGLT2, SLAC, BNL, OU, UC, UTA-DPCC, and BU have been checked and are OK.
  • Deploy production submit host for Panda release pilots: 2/1/08
    • The Condor-G submit host is set up on localsub01.usatlas.bnl.gov, using pilot2.
  • Validation on all sites: 2/15/08
    • Validation using release 12.5.0 passed successfully on all T1/T2 sites
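
The suggested pilot change for the permissions issue above can be sketched as follows. This is a minimal illustration, assuming the log area is reachable as an ordinary POSIX path; the function name make_group_writable_dir is hypothetical, and the real BNLdCacheSiteMover code and its dCache access method are not shown on this page. The point is that an explicit chmod after directory creation does not depend on the umask.

```python
import os
import stat

def make_group_writable_dir(path):
    """Create path (with parents) and force group rwx on the final directory.

    Hypothetical helper, not the pilot's actual code. Only the final
    directory is adjusted; parent directories keep whatever mode they get.
    """
    os.makedirs(path, exist_ok=True)
    # Read the current permission bits, then add group read/write/execute.
    mode = stat.S_IMODE(os.stat(path).st_mode)
    os.chmod(path, mode | stat.S_IRWXG)
    # Return the resulting permission bits so the caller can verify them.
    return stat.S_IMODE(os.stat(path).st_mode)
```

Because os.chmod sets the mode directly, the result is the same regardless of the umask in effect for the creating account.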

Actions

  • Panda group (Torre) takes over responsibility for setting up an autopilot submit host for install pilots
  • Ready for production operations
  • Switch to using pacball and DQ2 as the basic deployment method

Documentation

  • How to submit installation jobs to Panda (Tadashi)
    • Install a trf on a site from the CERN pacman cache
      • python installSW.py --site BNL_ATLAS_1 -m am-CERN -s atlas_app/atlas_rel/13.0.30 -c AtlasProduction_13_0_30_1_i686_slc3_gcc323_opt
    • Install a release on many sites from the BNL cache
      • python installSW.py --site SLACXRD, BU_ATLAS_Tier2, IU_OSG, BNL_ATLAS_1, OUHEP_OSG, OU_OCHEP_SWT2, OU_OSCER_ATLAS, UTA-DPCC, UTA_SWT2, UC_ATLAS_MWT2, UC_Teraport, MWT2_UC, MWT2_IU -m http://www.usatlas.bnl.gov/BNL_ATLAS_Pacman -s atlas_app/atlas_rel/12.5.0 -p 12.5.0slc3+gcc --tag ATLAS_LOC_1250 --version 12.5.0
    • Install a release and trf, and also check the OSG info file
      • installAthasSW -m am-CERN -s atlas_app/atlas_rel/12.0.6 -p 12.0.6slc3+gcc -c AtlasProduction_12_0_6_4_i686_slc3_gcc323_opt --tag ATLAS_LOC_1206 --version 12.0.6
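
The release-install examples above follow a visible naming pattern: the -s path, the -p package name, and the --tag value are all derived from the release version. The sketch below composes such a command line from a version string. It only uses flags that appear in the examples; the helper name build_install_command is hypothetical, the "slc3+gcc" suffix and "ATLAS_LOC_" prefix are inferred from the examples rather than from installSW.py itself, and passing the sites as a single comma-separated argument is an assumption about how the script parses --site.

```python
import shlex

def build_install_command(sites, cache, version, platform="slc3+gcc"):
    """Compose an installSW.py argument list for one release on several sites.

    Hypothetical helper: the derivation rules below are inferred from the
    example commands on this page, not taken from installSW.py.
    """
    # e.g. "12.5.0" -> "ATLAS_LOC_1250" (dots removed), as in the examples.
    tag = "ATLAS_LOC_" + version.replace(".", "")
    return [
        "python", "installSW.py",
        # Assumption: sites are accepted as one comma-separated argument.
        "--site", ",".join(sites),
        "-m", cache,
        "-s", "atlas_app/atlas_rel/" + version,
        "-p", version + platform,
        "--tag", tag,
        "--version", version,
    ]

if __name__ == "__main__":
    cmd = build_install_command(
        ["BNL_ATLAS_1", "UTA_SWT2"],
        "http://www.usatlas.bnl.gov/BNL_ATLAS_Pacman",
        "12.5.0",
    )
    # Print the command for review instead of running it.
    print(shlex.join(cmd))
```

Building the command as a list and printing it first makes it easy to check the derived tag and package name before anything is submitted.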

Monitor (Torre)

Known issues, problems

  • On a cluster whose worker nodes run different OS versions, installation jobs sometimes fail because pacman finds that the existing release is inconsistent with the OS of the worker node where the new install is attempted. For now, the suggestion is to use the pretend-platform option so that the SLC3 version is always used. Later on, multiple subdirectories inside $OSG_APP may be needed to hold the different OS versions.
    • This is solved by adding the -pretend-platform option to the pacman command in the trf.
  • A request is in to be able to remove a release and trf if needed.
    • Tadashi will add this.
  • To keep releases consistent across all sites, pacball can be used for deployment. The question is: is it possible to "patch" a release with new trf packages if it was installed as a pacball? If not, how should we proceed to ensure consistency across all sites?
  • Since pilot submission is now moving to autopilot/Condor-G, should the submit host localsub01.usatlas.bnl.gov also use autopilot? (Question for Torre.)

-- RobertGardner - 15 Apr 2008
