r8 - 19 Nov 2013 - 13:53:32 - RobertBall

AdHoc Committee to Examine and Advise Upon Worker Node Configurations

Committee Charge

This Ad Hoc committee was formed to collect and disseminate information and knowledge about Worker Nodes. Among us we have a large body of knowledge concerning the issues we face, the hardware we purchase, and the problems we encounter (and resolve) in dealing with our Worker Nodes and related hardware. It is our task to assemble that information as a reference for each other as we pursue the best Grid Systems for ATLAS Computing.

Meeting Minutes

Minutes June 14, 2013

Minutes June 28, 2013

Minutes July 12, 2013

Minutes July 26, 2013

Minutes August 9, 2013

Minutes November 15, 2013

Considerations and Issues

Three considerations affect all grid sites, to varying degrees. The balance among the three that best serves a site's needs will vary from site to site.

  • Space
  • Power
    • How much can be UPS protected
  • Cooling

For example, the UM site of AGLT2 is space-constrained, while the MSU site is constrained by cooling. This has led to different purchasing choices: Dell M1000e blade chassis at UM, and Dell R410 1U compute nodes at MSU. BNL did not look into blades, as neither space nor power is an issue there.

Machine Configurations

Whatever the choice of hardware, the ATLAS standard has been a minimum of 2GB of RAM per logical core, and 20-30GB of local disk space. The latter may change as direct file access becomes more common. It may also be time to examine the RAM specification, as image sizes continue to grow; several recent purchases have had as much as 3GB of RAM per logical core.

  • RAM size
  • Local storage capacity
  • Swap Space
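As a quick sanity check against these minimums, the per-node arithmetic can be sketched as below. The 32-core node is an example value only, and this reads the 20-30GB of disk as a per-job-slot figure (one slot per logical core), which is an assumption, though it is consistent with the purchases listed later on this page.

```shell
# Illustrative per-node sizing check against the ATLAS minimums above.
# The core count is an example, not a recommendation.
cores=32                        # e.g. dual 8-core CPUs with hyper-threading
ram_min_gb=$((cores * 2))       # 2GB RAM per logical core
scratch_min_gb=$((cores * 20))  # low end of the 20-30GB per-slot disk range
echo "cores=${cores} ram_min=${ram_min_gb}GB scratch_min=${scratch_min_gb}GB"
```

So a 32-logical-core node would need at least 64GB of RAM and, at the low end, roughly 640GB of local scratch.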

Choice of Vendor

Dell has been very good to ATLAS, price-wise, though other choices have also been made. Whatever the choice, it is a good idea not to mix hardware from too many vendors: each vendor has its own monitoring scheme and maintenance issues, and with hardware from many vendors these can quickly become burdensome.

Work with the sales people, pushing them for good pricing. Below are some sample configurations purchased recently.

Recent Reference Purchases

AGLT2

  • M1000e Blade server Enclosure
    • Redundant power supplies
    • Redundant Chassis Management Controllers
    • Two M8024-k managed switches, 24x10GbE ports each
    • 5yr Warranty
  • M620 blade servers
    • Dual port 10Gb daughter card
    • iDRAC7
    • Dual Xeon E5-2665 2.40GHz
      • Total 32 logical cores
    • 96GB 1333MHz RDIMM
    • 2 x 1TB 7200rpm disks in RAID-0
    • 5 yr Warranty
    • Performance Optimized

BNL

  • 91 Dell PowerEdge R620 Servers
  • Price not available.
  • 2 Intel Xeon E5-2660 (2.20 GHz) CPUs [8 physical (16 logical) cores ea.]
  • 64 GB 1600 MHz RAM (8x8 GB RDIMMs)
  • 8 x 2.5" 500 GB 7.2K RPM SATA 3 Gbps drives
  • H310 disk/RAID controller
    • RAID-0 used

Illinois

  • C8000 chassis/C8200 blades were used in new campus cluster.
    • Initially achieved only about 1/3 of best speed, due to poor power-management and CPU-efficiency settings.
    • Simple BIOS changes brought performance back in line with equivalent machines.
  • 1TB local SATA disk
    • Most IO over Infiniband to DDN storage system (GPFS).
  • Price not available

SLAC Recent quote

  • M1000E enclosure:
    • No details on warranty, power supplies, or internal NICs
    • dual-port 10Gb, 2 x 500GB, InfiniBand
  • M620 blade: E5-2660 (2.2GHz)
    • 64GB RAM (1600MHz)

InfiniBand could be removed from the quote, but it is not clear at what savings.

BIOS Settings

As noted by the Illinois experience, BIOS settings can be very important to the system performance.

  • AGLT2 specified "performance optimized" BIOS
  • BNL recommends the following for their configurations

The R410 had a somewhat poor fan design, which made it necessary to modify the CPU performance and power-management parameters slightly to keep the fans from spinning too fast and causing vibrational interference with the hard drives. During tests, these modifications had little impact on HS06 performance.

  • Logical Processor -- ENABLED
  • Number of cores per processor -- ALL
  • C1E -- ENABLED
  • C-states -- ENABLED
  • AC Power Recovery -- OFF
  • SAS 6/iR -- NO RAID, PASS THROUGH

The R420 (and R620s we're buying) have rubber feet on the fans in the system, which minimize vibration transfer to the chassis. Therefore, for these hosts we'll be setting System Profile to "Performance", which will enable Turbo Boost, and disable all C-states including C1E.
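On iDRAC7-era Dell hosts, settings like these can be applied remotely with racadm rather than through the BIOS setup screens. The sketch below is illustrative only: the attribute group and value names vary between BIOS/iDRAC firmware revisions, so verify them on your own hardware with `racadm get BIOS` before scripting anything.

```shell
# Hedged sketch: apply BNL-style BIOS options via racadm (iDRAC7 generation).
# Attribute names are assumptions -- confirm with "racadm get BIOS" first.
racadm set BIOS.ProcSettings.LogicalProc Enabled      # hyper-threading on
racadm set BIOS.ProcSettings.ProcCores All            # all cores per processor
racadm set BIOS.SysProfileSettings.ProcC1E Enabled    # C1E enabled
racadm set BIOS.SysProfileSettings.ProcCStates Enabled
racadm set BIOS.SysSecurity.AcPwrRcvry Off            # AC power recovery off
# Queue a BIOS configuration job; settings take effect on the next reboot
racadm jobqueue create BIOS.Setup.1-1
```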

Testing IO Performance

Run sequential I/O tests on all local drives in parallel and aggregate the sequential results.
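A minimal sketch of such a test in POSIX shell, using concurrent dd writers. The sizes here are tiny for illustration; for a meaningful measurement, scale bs and count well past the node's RAM size and add matching readers for the read side.

```shell
# Sketch: launch several sequential writers in parallel, then aggregate
# their throughput. Tiny sizes for illustration only.
NPROC=4
BS=1M
COUNT=8                        # 8 MB per writer -- far larger on real hardware
start=$(date +%s)
for i in $(seq 1 "$NPROC"); do
    dd if=/dev/zero of="/tmp/iotest.$i" bs="$BS" count="$COUNT" conv=fsync 2>/dev/null &
done
wait                           # all writers finished
end=$(date +%s)
elapsed=$(( end - start )); [ "$elapsed" -eq 0 ] && elapsed=1
total_mb=$(( NPROC * COUNT ))
echo "aggregate: ${total_mb} MB in ${elapsed}s ($(( total_mb / elapsed )) MB/s)"
rm -f /tmp/iotest.*
```

Point the output files at the actual scratch filesystem under test, not /tmp, when running this for real.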

Benchmarks

Target configurations for Dell to consider

Our sales rep tells me that there is not much interest within Dell in putting up a "standard" configuration machine set for USATLAS. However, he is pursuing quotes for the configurations below and will bring them to us when they are ready. These can be used as a basis for approaching your own site's local rep for quotes.

  • R620
    • 2 x E5-2660 CPU
    • 4p x 1GbE LOM
    • 96GB Memory 1600MHz
    • 4 x 250GB HDD
    • iDRAC Express
    • NBD Support 5 x 10 – 5 years
    • Decline installation
    • No System Documentation, No OpenManage DVD Kit
    • No OS
    • Rack Rails

  • R620
    • 2 x E5-2665 CPU
    • 4p x 1GbE LOM
    • 96GB Memory 1600MHz
    • 4 x 250GB HDD
    • iDRAC Express
    • NBD Support 5 x 10 – 5 years
    • Decline installation
    • No System Documentation, No OpenManage DVD Kit
    • No OS
    • Rack Rails

  • R620
    • 2 x E5-2670 CPU
    • 4p x 1GbE LOM
    • 96GB Memory 1600MHz
    • 4 x 250GB HDD
    • iDRAC Express
    • NBD Support 5 x 10 – 5 years
    • Decline installation
    • No System Documentation, No OpenManage DVD Kit
    • No OS
    • Rack Rails

  • M620
    • 2 x E5-2660 CPU
    • 2p x 10GbE NDC
    • 96GB Memory 1600MHz
    • 2 x 1TB HDD
    • iDRAC Express
    • NBD Support 5 x 10 – 5 years
    • Decline installation
    • No System Documentation, No OpenManage DVD Kit
    • No OS

  • M620
    • 2 x E5-2665 CPU
    • 2p x 10GbE NDC
    • 96GB Memory 1600MHz
    • 2 x 1TB HDD
    • iDRAC Express
    • NBD Support 5 x 10 – 5 years
    • Decline installation
    • No System Documentation, No OpenManage DVD Kit
    • No OS

  • M620
    • 2 x E5-2670 CPU
    • 2p x 10GbE NDC
    • 96GB Memory 1600MHz
    • 2 x 1TB HDD
    • iDRAC Express
    • NBD Support 5 x 10 – 5 years
    • Decline installation
    • No System Documentation, No OpenManage DVD Kit
    • No OS

-- RobertBall - 14 Jun 2013

