Subject: Re: History objects 0.1
From: Paolo Calafiura
Date: Wed, 17 Oct 2001 14:13:29 -0700
To: David Adams
CC: Srini Rajagopalan

David,

thanks for your comments: I am glad you have already found the time to go through our "design" and even to give us some useful feedback.

David Adams wrote:

> Paolo and Srini:
>
> I read through your "History Objects" document (dated 16oct01,
> version 0.1) and I have the following comments:
>
> 1. I think it would be useful to state use cases and requirements in
> advance of your design. I would be glad to get more input and have
> you work from the requirements that I posted.

While assembling the doc I had both your use cases and the DB architecture document open on my desk and I was constantly referring to them. I think the concept (it is not really a design) we put together reflects the content of both of them (that was the idea at least). Having said that, I don't expect we'll be able to put together a formal design document with use case coverage and whatnot. Our approach so far, dictated by "client pressure" as much as anything else, has been to produce more throwaway prototypes than throwaway documents. History objects are no exception: we need a prototype out in 3.0.0, and presumably earlier if the db guys want to do anything with it for DC0. As soon as some brave guy starts using it, a host of new use cases will emerge and we'll cycle...

>
> 2. You speak of an event history and show the event history pointing
> back to its parent event history. This implies the existence of an
> "event" consisting of all the data produced in a particular job for a
> particular event ID. This event would consist of pointers to all its
> data objects. The event history and event would reference one another
> and might be combined into a single object. The event class should be
> made explicit or the list of data objects added to the event history.
>

Currently there is indeed an EventInfo object (the "header") which has (both in transient and persistent form) references to all the known dobjs. Clearly this approach does not scale to the experiment reqs (see also the DB arch doc) but so far it has served us well.

> 3. A job can take output from two other jobs. I.e. a new event can be
> built from two input events. (All three events correspond to the same
> event identifier and raw data). Your event history should allow for
> multiple event history inputs. It should also allow for no history for
> the case of raw data.

Good point.

>
> 4. Your data object history omits pointers back to parent data objects.
> I agree that we can regenerate the data by rerunning the original job
> when the original events (i.e. all their data) exist. But I would like
> the data to be reproducible at a finer granularity: we should be able to
> reproduce a data object from its parents (or equivalents!) without
> requiring that any other part of the events still be accessible. This
> requires pointers back to the parent data objects or a way to generate
> these pointers. Perhaps you can recover them using the job and event
> history but it seems simplest and most reliable to keep them in the data
> object history. If nothing else this provides an important check. If you
> don't want to add these pointers, then you should clarify the implied
> requirements for the job and event histories.

I am not sure how you define the "parents". For me the parents of a data object are its producer Algorithms (or Algorithm in the simple cases). Once you configure an algorithm instance (as the DataObjectHistory would allow you to do), it is the algo itself which will take care of retrieving the "parent" dobjs. The use case we are trying to cover is that of (re)producing the dobj the history refers to.

>
> da
>
> P.S. If you don't mind, I would like to include a copy of your document
> and future revisions on my data model page.
Sure, which reminds me that I have to link all these docs to the EDM page...

Paolo
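
A minimal sketch of the history structure discussed in points 2 to 4, written in Python for brevity. All class names, fields, and algorithm names here (`JobHistory`, `EventHistory`, `DataObjectHistory`, `ClusterMaker`, `TrackFinder`) are hypothetical and are not taken from the History Objects document or from any existing code. It only illustrates an event history with zero or several parent histories (point 3) and a data object history that records both the producer algorithm configuration and optional links to parent data objects (point 4).

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class JobHistory:
    """Hypothetical record of the job that produced an event."""
    job_id: str


@dataclass
class DataObjectHistory:
    """Hypothetical record of how one data object was produced."""
    key: str        # name of the data object
    algorithm: str  # producer algorithm instance (assumed name)
    properties: Dict[str, str] = field(default_factory=dict)
    # Optional links back to the parent data objects (point 4).
    parents: List["DataObjectHistory"] = field(default_factory=list)


@dataclass
class EventHistory:
    """Hypothetical per-job, per-event history."""
    event_id: int
    job: JobHistory
    # Empty for raw data, several entries when a job merges the
    # output of other jobs (point 3).
    parents: List["EventHistory"] = field(default_factory=list)
    data_objects: List[DataObjectHistory] = field(default_factory=list)

    def is_raw(self) -> bool:
        return not self.parents


def ancestors(dobj: DataObjectHistory) -> List[DataObjectHistory]:
    """Return dobj and everything it depends on, parents first."""
    ordered: List[DataObjectHistory] = []
    seen = set()

    def visit(node: DataObjectHistory) -> None:
        if id(node) in seen:
            return
        seen.add(id(node))
        for parent in node.parents:
            visit(parent)
        ordered.append(node)

    visit(dobj)
    return ordered


# Usage: raw hits -> clusters -> tracks, with the last two made in separate jobs.
hits = DataObjectHistory("hits", algorithm="(raw data)")
clusters = DataObjectHistory("clusters", "ClusterMaker", {"threshold": "3.0"}, [hits])
tracks = DataObjectHistory("tracks", "TrackFinder", {}, [clusters])

raw = EventHistory(42, JobHistory("raw"), data_objects=[hits])
reco_a = EventHistory(42, JobHistory("reco-a"), [raw], [clusters])
reco_b = EventHistory(42, JobHistory("reco-b"), [raw], [tracks])
merged = EventHistory(42, JobHistory("merge"), [reco_a, reco_b])

print([h.key for h in ancestors(tracks)])
```

With the parent links in place, the chain needed to reproduce `tracks` can be walked without consulting the event or job histories. Without them, the same information would have to be recovered by rerunning each producer algorithm and letting it retrieve its own inputs.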