r10 - 15 Apr 2011 - 18:25:53 - RobertGardner

NetworkMonitoringP16

Part of the NetworkMonitoring activity in US ATLAS, Phase 16 (FY11Q2); cf. SiteCertificationP16.


For this quarter there are two primary certification tasks:

  • Getting each site's perfSONAR instances properly configured and updated.
  • Remeasuring the BNL->Site throughput limit via a Loadtest.

perfSONAR Configuration Goal

  • All sites should be running perfSONAR v3.2 and should review and implement the recommendations in Jason Zurawski's perfSONAR maintenance guide.
  • Sites should also consider (re)installing as a disk-based install, rather than burning and booting from CD-ROM (see http://psps.perfsonar.net/toolkit/FAQs.html#Q34).
  • The Nagios server at BNL is testing each of our USATLAS Tier-1/Tier-2 perfSONAR instances and this will be used to determine when a site has complied with this goal.
  • Each site should have both the Latency and Throughput matrices completely "Green".
  • For example the current (March 6, 2011) matrices are shown here:

perfSONAR_thru_matrix_mar6.png

perfSONAR_lat_matrix_mar6.png

  • Based upon these results only BNL has green rows and columns for the Latency matrix (row/column 2).
  • A number of other sites are close but have at least one non-green box to resolve (which may not even be their own site's problem). For the throughput matrix only AGLT2_UM (row/column 1) qualifies, though again a number of other sites are close.
  • Summarizing: to certify your site in the NetperfSONAR table you need to have all green rows AND columns in both the Latency and Throughput Nagios matrices.
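The certification rule above can be sketched in a few lines of Python. This is only an illustration: the list-of-lists matrix layout, the status strings, and the function names are assumptions made here, not the format the BNL Nagios server actually exports. Indices are 0-based, whereas the row/column numbers quoted above are 1-based, and the diagonal (a site tested against itself) is assumed to be ignored.

```python
# Hypothetical layout: matrix[i][j] holds the Nagios status string for the
# test from site i to site j. Diagonal entries are skipped (assumption).

def site_is_green(matrix, idx):
    """Return True if every off-diagonal box in row idx AND column idx is green."""
    for other in range(len(matrix)):
        if other == idx:
            continue
        if matrix[idx][other] != "green" or matrix[other][idx] != "green":
            return False
    return True


def site_is_certified(latency, throughput, idx):
    """A site qualifies only if it is all green in both matrices."""
    return site_is_green(latency, idx) and site_is_green(throughput, idx)


# Made-up 3-site example: the test from site 1 to site 2 is red.
latency = [
    ["-", "green", "green"],
    ["green", "-", "red"],
    ["green", "green", "-"],
]
throughput = [
    ["-", "green", "green"],
    ["green", "-", "green"],
    ["green", "green", "-"],
]

for idx in range(3):
    print(idx, site_is_certified(latency, throughput, idx))
```

In this made-up example site 2 fails even though its own row is green, because the red box sits in its column. That mirrors the remark above that a non-green box may not be the affected site's own problem.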

Remeasure Throughput Baseline for each Tier-2

  • Each Tier-2 should contact Hiro and schedule a 1-hour Loadtest.
  • The goal is to achieve the maximum throughput possible from BNL to each site. This will indicate the expected upper bound on transfers. Each site listed below should document the test results here. Our goal is an average of 400MB/sec (for 10GE connected sites).
  • Once a site has completed the tests and posted the results here, it can check this off on the SiteCertificationP16 table.
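To put the 400MB/sec goal in context, the arithmetic can be sketched as follows. The helper names are hypothetical, and the sketch assumes decimal units (1 MB = 10^6 bytes, 1 Gbit/s = 125 MB/s) and ignores protocol overhead; the page does not state which convention the monitoring plots use.

```python
GOAL_MB_PER_S = 400  # average goal quoted above for 10GE connected sites


def link_capacity_mb_per_s(gbit_per_s):
    """Raw line rate in MB/s for a link of the given Gbit/s (no overhead)."""
    return gbit_per_s * 1000 / 8


def meets_goal(avg_mb_per_s, goal_mb_per_s=GOAL_MB_PER_S):
    """True if a measured average throughput reaches the goal."""
    return avg_mb_per_s >= goal_mb_per_s


def volume_tb(avg_mb_per_s, hours=1.0):
    """Data moved (decimal TB) at a sustained average rate for the given hours."""
    return avg_mb_per_s * 3600 * hours / 1_000_000


ten_ge = link_capacity_mb_per_s(10)
print(ten_ge)                      # raw 10GE line rate in MB/s
print(GOAL_MB_PER_S / ten_ge)      # fraction of the line rate the goal represents
print(volume_tb(GOAL_MB_PER_S))    # TB moved in a 1-hour test at the goal rate
```

Under these assumptions the goal corresponds to roughly a third of a 10GE line rate, and a 1-hour test at the goal rate moves about 1.44 TB. A site limited to 2x1GE has a raw line rate of 250 MB/s, so it cannot reach the 400MB/sec figure whatever its storage performance.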

AGLT2

AGLT2 requested 4 sets of throughput tests spanning March 3 through March 4. The final test results, from Friday, March 4, 2011, are shown here. The first graphic collects a number of Cacti graphs of network and storage node activity. I put "red" arrows to denote the loadtest start. On the upper right plot I marked the approximate "incoming" traffic on our dCache storage nodes; almost all of this is from the loadtest. The average over the last hour is approximately 1 GByte/sec.

loadtest_cacti_tree_mar4_v3.png

The next plot shows Hiro's FTSmon results during the test. We started with 45 concurrent transfers (AGLT2 is normally 30) and ramped up to 100 by the end of the test (you can see the impact of changing the number of concurrent transfers in the above plots as well).

loadtest_bnl_aglt2_mar4_2011_v5.png

So AGLT2 retest results are 1GB/sec, completed on March 4, 2011.

MWT2_UC

This first plot shows the throughput on our link to campus, which includes all traffic in and out of our site:

BNL-UC_load_test_2011-03-18_3.38.02_PM.png

This second plot shows one single s-node throughput:

Single S-node throughput at MWT2_UC from BNL

Finally, this is the BNL report for the MWT2_UC channel:

FTSMON_throughput_test_2011-03-18_at_3.55.27_PM.png

In summary, MWT2 retest results are 1GB/sec, completed on March 18, 2011.

MWT2_IU

This plot shows throughput from BNL into Indiana:

Screen_shot_2011-04-14_at_3.13.11_PM.png

MWT2 IllinoisHEP

This plot shows the throughput to IllinoisHEP from BNL:

BNLDCACHE-UIUCSRM_throughput_2011_03_22.png

NET2

SWT2

UTA (SWT2_CPB)

Test performed April 14, 2011

transfertest.png

(UTA has a 2x1GE limit, OU should have 10GE; may want to test separately for each)

WT2

The plots were not kept. Throughput was stable at ~450MB/s (BNL to SLAC).


-- RobertGardner - 18 Jan 2011



Attachments


png BNLDCACHE-UIUCSRM_throughput_2011_03_22.png (30.9K) | DavidLesny, 05 Apr 2011 - 12:25 | Load Test Throughput to IllinoisHEP from BNL
png FTSMON_throughput_test_2011-03-18_at_3.55.27_PM.png (135.9K) | AaronvanMeerten, 30 Mar 2011 - 14:14 | FTS Monitor for MWT2_UC from BNL
png uct2-s9_network_2011-03-18_at_3.50.47_PM.png (87.4K) | AaronvanMeerten, 30 Mar 2011 - 14:13 | Load Test Throughput to one storage node at MWT2_UC from BNL
png BNL-UC_load_test_2011-03-18_3.38.02_PM.png (78.5K) | AaronvanMeerten, 30 Mar 2011 - 14:13 | Load Test Throughput to MWT2_UC from BNL
png loadtest_bnl_aglt2_mar4_2011_v5.png (50.9K) | ShawnMckee, 06 Mar 2011 - 15:27 | Hiro FTSmon plot showing loadtest results to AGLT2
png loadtest_cacti_tree_mar4_v3.png (277.3K) | ShawnMckee, 06 Mar 2011 - 15:18 | Cacti plots for AGLT2 with arrows showing loadtest start
png perfSONAR_lat_matrix_mar6.png (39.3K) | ShawnMckee, 06 Mar 2011 - 14:59 | Nagios perfSONAR latency matrix March 6, 2011
png perfSONAR_thru_matrix_mar6.png (42.1K) | ShawnMckee, 06 Mar 2011 - 14:58 | Nagios perfSONAR throughput matrix March 6, 2011
docx 20110201-USATLAS-pSPT.docx (334.4K) | ShawnMckee, 06 Mar 2011 - 14:41 | perfSONAR maintenance document from Jason Zurawski
png transfertest.png (4.5K) | PatrickMcGuigan, 15 Apr 2011 - 03:16 | UTA mrtg graph for testing period
png Screen_shot_2011-04-14_at_3.13.11_PM.png (91.9K) | RobertGardner, 15 Apr 2011 - 18:24 |