Subject: Re: event data history
From: David Adams
Date: Tue, 09 Oct 2001 08:46:08 -0400
To: torre@wenaus.com
CC: Srini Rajagopalan, Hong Ma, Pavel Nevski, Yuri Fisyak, Alex Undrus, Valeri Fine, Wensheng Deng, David Adams

Torre:

Thanks for your ideas. I respond to them below.

Torre Wenaus wrote:
> David Adams (dladams@bnl.gov) wrote on Monday 2001/10/8 18:06:
>> Torre:
>>
>> If a chain of algorithms connects the parent EDOs to their child, then I
>> agree that the entire chain must be specified. I did not mean to allow
>> "algorithm" to be used in that sense. I have added algorithm to my
>> definitions to rule out that possibility.
>
> I think you should keep your use of terms as consistent as possible with
> ATLAS convention to reduce confusion.

1. I agree. I used "algorithm" because I thought it was consistent and I
wanted to avoid introducing a new term. If there is already a term for my
concept, or if you have a suggestion for a new term, please let me know.
The terminology for the "algorithm" is important because its specification
drives the design.

>>
>> I disagree with your statement about the "prescriptive (re)generation
>> procedure". The input data and (fully specified) algorithm and
>> environment specify the output. It shouldn't matter if the algorithm was
>> run inside athena or inside some future or private framework. It also
>> shouldn't matter if the algorithm was or was not in the same job as
>> other algorithms.
>
> Unless you record how data was generated in terms of the 'job script'
> used to generate it, in addition to input data etc., you have neither
> enough info to regenerate it nor (should another framework come along)
> enough info to evaluate whether in fact there is no difference between
> running an algorithm in one framework vs. another.

2. I agree the job is the thing that knows how the data is produced.
Pointing back at the job description is one possible implementation of many
of the requirements, but I don't think we should make it a requirement that
the information be accessed in that manner. There should be a way to fetch
the information needed to reproduce an EDO without explicit reference to the
job. This does not preclude whatever returns this information from accessing
the job, but I would like to allow in principle for other implementations
and other types of jobs (non-athena). Perhaps we should add an explicit
reference to the job description to the "optional" information. What do you
think?

>>
>> I do agree and meant to imply that job histories would be recorded.
>> These provide a convenient (i.e. space-saving) way to record history
>> which is common to multiple events. They are also essential for
>> production bookkeeping.
>>
>> A job might know which EDOs it produced, but I would not require an EDO
>> to know which job produced it. This is essential if we allow for virtual
>> data. The job which produced an EDO is not relevant if it could have
>> been produced identically by any number of "virtual jobs".
>
> I agree, except an EDO's history object will presumably have to reference
> the job if part of its history is stored at the job level.

3. I almost agree. A job description could be implemented as an environment
description and a collection of algorithm descriptions. Then a particular
EDO only needs to reference the environment and its algorithm--not the
entire collection. Again, this is an implementation issue. I would first
like to nail down the requirements and then move to design and
implementation.

>>
>> I should be able to select all EDOs with a common specification
>> (algorithm and environment) of ancestry. It shouldn't matter which
>> particular job produced them. If I want only data from a particular job,
>> I can go to the job and request the list of EDOs.
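For concreteness, here is a minimal Python sketch of the data model behind
points 2 and 3 and the selection request. It is only an illustration, not a
proposed design. All names in it (EnvironmentDescription,
AlgorithmDescription, EdoHistory, JobDescription, history_for,
select_by_specification) are hypothetical, as are the example release,
platform, algorithm, and EDO identifiers. It assumes that a job description
is an environment plus a collection of algorithm descriptions, that each EDO
history references only its environment, its own algorithm, and its parent
EDOs (never the job), and that a job may optionally list the EDOs it
produced.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class EnvironmentDescription:
    release: str   # hypothetical field, e.g. a software release tag
    platform: str  # hypothetical field, e.g. OS/compiler


@dataclass(frozen=True)
class AlgorithmDescription:
    name: str
    version: str
    # Parameters kept as (key, value) pairs so the description is hashable.
    parameters: Tuple[Tuple[str, str], ...] = ()


@dataclass(frozen=True)
class EdoHistory:
    # An EDO references its environment, its own algorithm, and its
    # parent EDOs; it holds no reference to the job that produced it.
    environment: EnvironmentDescription
    algorithm: AlgorithmDescription
    parents: Tuple[str, ...] = ()


@dataclass
class JobDescription:
    # A job is an environment plus a collection of algorithms.
    environment: EnvironmentDescription
    algorithms: List[AlgorithmDescription]
    # Optional bookkeeping: the job may list the EDOs it produced.
    produced: List[str] = field(default_factory=list)

    def history_for(self, algorithm: AlgorithmDescription,
                    parents: Tuple[str, ...]) -> EdoHistory:
        # Build the per-EDO history from the job's environment and one of
        # its algorithms, without storing the job itself in the history.
        if algorithm not in self.algorithms:
            raise ValueError("algorithm is not part of this job")
        return EdoHistory(self.environment, algorithm, parents)


def select_by_specification(histories: Dict[str, EdoHistory],
                            environment: EnvironmentDescription,
                            algorithm: AlgorithmDescription) -> List[str]:
    # Return the ids of all EDOs made with this specification,
    # regardless of which job actually ran it.
    return sorted(edo_id for edo_id, h in histories.items()
                  if h.environment == environment and h.algorithm == algorithm)


# Usage: two different jobs with the same specification (hypothetical values).
env = EnvironmentDescription("3.0.0", "linux")
reco = AlgorithmDescription("TrackFinder", "1.2", (("ptMin", "0.5"),))
job_a = JobDescription(env, [reco])
job_b = JobDescription(env, [reco])
histories = {
    "edo1": job_a.history_for(reco, ("raw1",)),
    "edo2": job_b.history_for(reco, ("raw2",)),
}
job_a.produced.append("edo1")
job_b.produced.append("edo2")
print(select_by_specification(histories, env, reco))  # ['edo1', 'edo2']
```

The selection returns both EDOs even though different jobs produced them,
because the query is keyed on the specification rather than on the job. The
optional produced list on each job covers the other case, where only data
from one particular job is wanted.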
>
> I agree
>
>> I agree that we should add processing time to the list of "optional"
>> data history items. I have done so.
>>
>> da

-- 
David Adams                       desk: 631-344-6049
Brookhaven National Lab           fax: 631-344-5078
PAS group, Building 510A          email: dladams@bnl.gov
Upton, NY 11973-5000              http://www.usatlas.bnl.gov/~dladams