Out of jobs once again. However, there is a new sample to fill the queues, but there are problems with DQ2 callbacks. Expect scout jobs to finish quickly, after which there will be 50K jobs.
Overall, 5M single particle events - short jobs - being defined now.
Unhappy about lack of planning and information dissemination.
Validation samples keep coming in - low quantity, but important jobs.
Re-processing: 3 issues discovered. (1) Panda mover timeouts were optimized for jobs with 2 GB files; the dbrelease file is 4 GB, causing timeouts. Tadashi increased the timeout and will add a modification to adjust it to file size. (2) LCG-Utils was using lcg-ls (signed integer used, bug), which had a max value of 3 GB. A newer release fixes the problem. Xin to check the new OSG worker-node client LCG-Utils w/ Wensheng. (3) Bamboo starvation and site-specific scheduling.
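As a rough illustration of adjusting the mover timeout to file size, a minimal sketch follows. It is not the Panda mover code; the function name, the base timeout, and the assumed minimum transfer rate are all hypothetical values chosen for the example.

```python
def mover_timeout_seconds(file_size_bytes,
                          base_timeout_s=1800,
                          min_rate_bytes_per_s=1_000_000):
    """Return a copy timeout that grows with the size of the file.

    base_timeout_s and min_rate_bytes_per_s are hypothetical defaults,
    not values taken from the Panda mover.
    """
    if file_size_bytes < 0:
        raise ValueError("file size must be non-negative")
    # Time allowed for the transfer itself, rounded up to a whole second.
    transfer_s = (file_size_bytes + min_rate_bytes_per_s - 1) // min_rate_bytes_per_s
    return base_timeout_s + transfer_s


# A 4 GB dbrelease file is given more time than a 2 GB input file.
print(mover_timeout_seconds(2 * 10**9))
print(mover_timeout_seconds(4 * 10**9))
```

With a fixed timeout, the limit has to be raised by hand whenever a larger file appears; scaling by size keeps the limit tight for small files while still covering larger ones.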
Rod asked to define a task for reprocessing at Tier2 sites.
Michael suggests we summarize the experience and make available to ATLAS.
What about Oracle database access? The BNL oracle instance will be used. What is the fall back? Use Triumf's instance. This is written into the transformation.
Autopilot ready for MWT2 - expected by end of the week.
22441 - large number of failed jobs, don't worry.
Heavy Ion jobs - 'looping job killed by pilot' - was this issue resolved? Kaushik believes HI tasks redefined by Pavel increased to 200K. For dedicated sites, keep queues with very long wall-time limits.
Analysis queues, FDR analysis (Nurcan)
Expecting increase in activity w/ FDR2 data - w/ release 14.
Have requested a validation data sample to be replicated. Done. Will do validation.
Helping with tutorial at Vancouver workshop. Working with Akira.
Action item - all sites need to re-run configure-osg with sub-cluster information
Information is picked up overnight.
SRM v2 and Space Tokens
Follow-up:
OU - will not update to SRM v2.2 until new storage arrives.
Which roles should space tokens support? Role usatlas production vs atlas production. Two roles? The mappings are different. Is there only one binding between the attribute and the space token?
Note - jobs are being defined w/ space tokens.
Enable multiple roles in the certificate?
Unify all production with the simple atlas production.
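The option of unifying production under a single role can be sketched as a small lookup. This is only an illustration: the FQAN strings, the alias table, and the binding of the production role to ATLASMCDISK are assumptions made for the example, not an agreed configuration.

```python
# Hypothetical FQAN strings; the real attribute names may differ.
UNIFIED_PRODUCTION_ROLE = "/atlas/Role=production"

# Roles that would be folded into the single production role (assumption).
ROLE_ALIASES = {
    "/atlas/usatlas/Role=production": UNIFIED_PRODUCTION_ROLE,
}

# One binding per role attribute (assumed binding, for illustration only).
SPACE_TOKEN_FOR_ROLE = {
    UNIFIED_PRODUCTION_ROLE: "ATLASMCDISK",
}


def space_token_for(fqan):
    """Return the space token bound to a role, or None if no binding exists."""
    role = ROLE_ALIASES.get(fqan, fqan)
    return SPACE_TOKEN_FOR_ROLE.get(role)


print(space_token_for("/atlas/usatlas/Role=production"))
print(space_token_for("/atlas/Role=production"))
```

Folding the aliases into one role first keeps a single binding between the attribute and the space token, so both production roles end up writing to the same space.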
Pilot upgrade for space tokens
AGLT2 has a site setup; SE - ATLASMCDISK, and an SE path. In contact with Paul. SE prod path.
ATLASENDUSER disk also included. (At AGLT2, put 18 TB)
Almost up and running - off an Oracle cluster backend.
Then Hiro will test an LRC migration script.
Then will decide on an exact migration path. Lazy migration.
Don't expect any issues with Panda integration.
We need to come to a formal decision about our own deployment model. If we stick to our existing model, we should prepare arguments, and the converse. Fault tolerance and scalability issues. Suggestion is to revisit.
Looking at the pledges - we're short by about 20% in summary. But at specific sites there are severe shortcomings. And 2008-2009 we need to double 1.5 to 2.5 PB, a significant growth.
CPU: 5 - 6.3 MSI2K? A minor step, while storage grows by a factor of 2.
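For rough guidance, the growth implied by the figures as recorded in these notes can be computed directly. The sketch below assumes the 1.5 and 2.5 PB values are the current and target storage, and that 5 and 6.3 MSI2K are the current and target CPU capacity; the helper name is hypothetical.

```python
def growth_factor(current, target):
    """Return target / current as a multiplicative growth factor."""
    if current <= 0:
        raise ValueError("current capacity must be positive")
    return target / current


storage_growth = growth_factor(1.5, 2.5)  # PB, figures as recorded in the notes
cpu_growth = growth_factor(5.0, 6.3)      # MSI2K, figures as recorded in the notes

print(f"storage x{storage_growth:.2f}, CPU x{cpu_growth:.2f}")
```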
Kaushik should give guidance about what production and analysis will need.
Jim has given requirements as well. Will start from those numbers.
Deployed by September 15
Implies need to go out for bids in July.
Action item next week for rough guidance.
Want to be finished with this by end of June - technology and how much
Need specifications from Internet2 for network monitoring hosts. Encourage Rich to get these specifications by next week.
OSG 1.0
Expect release next week - some early testing at MWT2 in advance of release.
Met this past Monday - waiting on upgraded doors at BNL by end of the month. Expect 1 Gbps to all Tier2s in parallel. 200/400 individually.
Hiro's tests can be made on-demand.
Jay: to include tests from multiple doors.
Most of the old doors will be replaced.
Nagios monitoring subcommittee (Dantong)
WT2 and SWT2 will be reporting available space.
Tomasz organizing a meeting to test globus-job-run.
Release installation via Pacballs (Xin)
Follow-up
Progress - this morning to discuss this. Fred - hoping this week to have first set of pacballs installed in DQ2. Will test with some older releases on some test machines.
Need official naming scheme.
Get installed with a special Panda pilot job using the software role. Expect performance to improve.
Expect a couple of weeks of testing.
Goal to bring into production by end of the month.
Site news and issues (all sites)
T1: lots of activities last week regarding FDR preparation, mixing jobs, and a group studying triggers. Busy deploying storage and network infrastructure (Foundry core, 2 Force10s) for connection to 10G thumpers. Expect farm extension this week (3 MSI2K?).
AGLT2: Busy with FDR2 calibration work. NFS-lock problems with SQLite databases. Site issue? Need to follow up with ATLAS on this file locking.
NET2: All is well.
MWT2: All is well.
SWT2 (UTA): All is well.
SWT2 (OU): Got replacement server for gatekeeper, to be installed. 10G switch to be connected still.
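Regarding the NFS-lock problems with SQLite databases reported by AGLT2, one common way to sidestep file locking on a shared filesystem is to copy the database to local scratch and open the copy read-only. The sketch below is a generic illustration using the Python standard library, not the ATLAS calibration code; the function name and scratch-directory handling are hypothetical.

```python
import shutil
import sqlite3
import tempfile
from pathlib import Path


def open_local_copy(shared_db_path, scratch_dir=None):
    """Copy a SQLite file off shared storage and open the copy read-only.

    Returns (connection, local_path). The caller closes the connection
    and removes the local copy when finished.
    """
    scratch = Path(scratch_dir or tempfile.mkdtemp(prefix="sqlite_local_"))
    local_path = (scratch / Path(shared_db_path).name).resolve()
    # Plain file copy: no SQLite locks are taken on the shared filesystem.
    shutil.copy2(shared_db_path, local_path)
    # mode=ro opens the local copy read-only via SQLite's URI syntax.
    conn = sqlite3.connect(local_path.as_uri() + "?mode=ro", uri=True)
    return conn, local_path
```

This only suits read-only access, and the copy must be taken while no writer is modifying the shared file; jobs that write to the database need a different approach.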