We need to put a structure with actions and timelines in place. Testing and drawing conclusions based on the results is becoming more and more urgent. Regarding data replication services, the US is becoming too special, with undesirable exceptions, manual interventions, etc.
Recent activity? Dantong: BNL hardware installation - smart switch - installed. Most data has been migrated into new LFC. There is also a delta upgrade.
We need a pricelist from Dell - another email sent. Some sites will have site agreements with Dell.
Internet2 monitoring host
UChicago_20080730.pdf: Quotation from Koi computers for perfsonar hosts - okay'd by Rich
Use consistent hardware.
Rich says a new release of perfSONAR will be available in 3-4 weeks.
No objections to using Koi as a supplier.
Follow-up issues
Storage capacity recommendations/guidance for the Facility (320 TB capacity, from Kaushik's model on MinutesJune11).
Revised WLCG pledges - need info by July 15. Action item for Rob (not done!)
Operations overview: Production (Mark)
Many more jobs have come into the system in the past few days - have had 5-6K jobs.
Weekly shift meeting (Xavi, at CERN) - some items:
Autopilot submissions slow? Help from Condor - this has been difficult to troubleshoot since we've not had capacity. Has there been a scaling issue? Still an open question; will follow up.
Checksum errors - mismatches caused jobs to fail. Complicated - who is responsible for the checksum and data integrity (panda or dq2, FTS?). Adler32 versus md5, still not resolved.
Some random side issues - most have been addressed, no major outages.
There were a large number of validation job failures - resolved.
Kaushik comments: dccp can use Adler32, but we're moving to lcg-cp.
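For reference on the Adler32 versus md5 question, here is a minimal sketch of computing both checksums for a local file in one pass using only the Python standard library. The function names `file_checksums` and `checksums_match` are hypothetical and are not taken from panda, dq2 or FTS; the sketch only shows what the two values look like and how a mismatch could be detected. It says nothing about which component should own the check.

```python
import hashlib
import zlib


def file_checksums(path, chunk_size=1024 * 1024):
    """Return (adler32_hex, md5_hex) for the file at `path`, reading in chunks."""
    adler = 1  # Adler-32 starting value used by zlib
    md5 = hashlib.md5()
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                break
            adler = zlib.adler32(chunk, adler)  # running Adler-32 over the chunks
            md5.update(chunk)
    # Adler-32 is a 32-bit value; format it as 8 lowercase hex digits.
    return format(adler & 0xFFFFFFFF, "08x"), md5.hexdigest()


def checksums_match(expected, actual):
    """Compare two hex checksums, ignoring case and surrounding whitespace."""
    return expected.strip().lower() == actual.strip().lower()
```

Note that an Adler32 value can only be compared against another Adler32 value; a catalog entry that stores md5 will never match an Adler32 computed at the site, which is one way the mismatches described above could arise.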
lcg-cp - progress from Paul - will switch to PRODDISK at Michigan tomorrow. Panda-dev server down for two days? pandadev02 is the server Tadashi uses - need this. From worker-node to the SE.
Want to see AGLT2 exercised for a week. Follow-up on this next week.
20-40 TB needed for PRODDISK. Start with 20 TB. Follow-up each Tier2 next week.
wn-client from OSG 1.0 needed for lcg-cp.
Issue - high number of retries, at 25. Not necessary. This has been changed; the retry count will be reduced.
Wei reports there is a bug in lcg-cp with srm-bestman if the file does not exist. It's not yet packaged in glite. Wei requests that lcg-ls be used first to check file existence.
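Wei's request (run lcg-ls first, and only call lcg-cp if the file exists) could be sketched roughly as below. This is an illustration under assumptions, not the pilot's actual code: it assumes lcg-ls returns a non-zero exit status for a missing file, it passes no options to either command (real invocations usually need site-specific options, which are left out here), and the names `run_command` and `copy_if_exists` are hypothetical.

```python
import subprocess


def run_command(argv):
    """Run a command, discard its output, and return its exit code (0 = success)."""
    completed = subprocess.run(
        argv, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    return completed.returncode


def copy_if_exists(source_surl, destination, runner=run_command):
    """Check the source with lcg-ls, and call lcg-cp only if that succeeds.

    Returns "missing", "copied" or "copy-failed".
    """
    # Assumption: a non-zero exit code from lcg-ls means the file is not there.
    if runner(["lcg-ls", source_surl]) != 0:
        return "missing"
    if runner(["lcg-cp", source_surl, destination]) != 0:
        return "copy-failed"
    return "copied"
```

The `runner` argument is there so the logic can be exercised without the lcg tools installed, for example by passing a function that returns canned exit codes.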
Shift report (Marco)
Downtime at BNL for Oracle database - responsible for backlog in file transfer
Esnet link problems at BNL? Probably not cause for backlog.
Analysis queues, FDR analysis (Nurcan)
A wiki page was set up to show the online/offline status of the US pathena analysis queues as well as their availability for various athena releases/packages; see the page at PathenaAnalysisQueues.
We have an analysis workshop at the end of August - there will be a user-support session in which Nurcan will present plans for the US. The plan is to provide combined support for pathena and ganglia.
Preparing for 3-site Jamboree in September.
Usage has been light this month, though it is expected to increase once users start again and update pathena on their hosts, which now has brokering to other sites in the cloud.
Kaushik comments that the brokering seems to be working.
Collecting information about Panda releases to be used during workshop.
There is a problem with one of the probes having jobs go into the un-submitted state. Patrick will increase timeout to see if this helps. It is an intermittent problem at various sites.
OIM registration - Mark and Patrick are addressing this.
NET2 - now working properly. There was an issue with a single host providing both CE and SE.
Kaushik - it's a lot of effort to install and support the software. Seems to be uncorrelated with what we need in ATLAS.
We still need to develop the standard for the frequency of running the probe.
Is there a problem with AGLT2's calibration channel? Bob believes there is a dCache bug causing this - SRM authentication fails and becomes root. Hiro says there is a work-around for this. Charles will send around a recipe.
Hiro's monitoring of space-token-enabled areas checks the dashboard and sends email.
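As a rough illustration of the "check, then send email" pattern, the sketch below flags space tokens above a usage threshold and builds (but does not send) an alert message. It is not Hiro's script: the minutes do not say what the dashboard check actually tests, so the usage-fraction criterion, the 0.9 threshold, the input format and the function names are all assumptions, and fetching data from the dashboard is left out entirely.

```python
from email.message import EmailMessage


def tokens_over_threshold(usage, threshold=0.9):
    """Return sorted names of space tokens whose used/total fraction is >= threshold.

    `usage` maps a token name to a (used_tb, total_tb) pair (hypothetical format).
    """
    return sorted(
        name
        for name, (used_tb, total_tb) in usage.items()
        if total_tb > 0 and used_tb / total_tb >= threshold
    )


def build_alert(usage, sender, recipient, threshold=0.9):
    """Build an alert email, or return None when no token is over the threshold."""
    flagged = tokens_over_threshold(usage, threshold)
    if not flagged:
        return None
    message = EmailMessage()
    message["From"] = sender
    message["To"] = recipient
    message["Subject"] = "Space tokens near capacity: " + ", ".join(flagged)
    lines = [
        f"{name}: {usage[name][0]} of {usage[name][1]} TB used" for name in flagged
    ]
    message.set_content("\n".join(lines))
    return message
```

The returned message could then be handed to `smtplib.SMTP.send_message` on a host with a mail relay; that step is omitted because the mail setup is site-specific.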