Subject: Re: storing a track to the database...
From: David Adams
Date: Mon, 22 Oct 2001 10:10:10 -0400
To: john.baines@rl.ac.uk
CC: David Rousseau, atlas-database@atlas-lb.cern.ch, reconstruction list

John:

In my Data history model
(http://www.usatlas.bnl.gov/~dladams/data_history) and in my previous
note I have tried to stay away from discussion of implementation. I have
tried to state requirements and have included some discussion of design.

Definitions
-----------

For clarity let me reintroduce some definitions.

A physics data object (PDO) is a digit, cluster, track, electron, etc.

An event data object (EDO) is the product of an algorithm and our unit
of storage, and is typically a collection of one type of PDO. For
example the track EDO would be a collection of reconstructed tracks
(track PDO's).

EDO indices allow EDO's to refer to parent EDO's. Parents are the EDO's
from which the child EDO was constructed. An EDO carries a list of
parent EDO indices.

PDO indices allow PDO's to refer back to PDO's in parent EDO's. They
might also allow reference to PDO's in the same collection (EDO).

Design
------

I believe your comments are relevant to design issues I have only begun
to address. You are suggesting that EDO and PDO indices be stored
separately from the other data (e.g. results of fitting). I believe this
is allowed (but not required) by the requirements and design I have
presented so far.

We have identified three types of data in an EDO:

1. history including parent EDO's
2. PDO associations
3. Derived PDO data (fits, etc.)

We do not save space on tape by storing these data separately. In fact
we will increase the total space because we must add bookkeeping to
maintain the associations between them. However we can have smaller
input streams when we want to process one type of data without the
others.

You have raised an important design issue which I will add to my
document.

Events
------

You conclude with a comment about only needing unique identifiers within
an event.
This is something glossed over in my document. Event processing is
distributed and it is difficult to assign unique ID's within an event
because an event might be simultaneously processed at different sites.
We can restrict the definition of "event" to disallow merging these data
but I did not want to make this restriction. I believe it would be
easier to assign unique ATLAS-wide identifiers and that is what I
assumed in the "Object identifiers" design issue of my data history
document. I should make the options and my choice explicit as a separate
design issue.

da

----

john.baines@rl.ac.uk wrote:

Hi David,

> each child event data object (EDO) "points" back to its parents.

I agree that we need this functionality, but an alternative
implementation would be to simply require, as you have suggested, that
each object have an identifier and for some external service to provide
the association. This removes the need for lists of identifiers inside
data objects.

For example, it could be implemented as follows:

o When each new object is created the identifier for the object is
  obtained from a DataHistoryService. When requesting the identifier,
  the list of identifiers for the constituent objects (if any) is passed
  to the DataHistoryService as an argument of the call.

  e.g. A track reconstruction algorithm gives a list of clusters to the
  DataHistoryService and receives an identifier for the new track in
  return.

In this way, the DataHistoryService has full knowledge of the
associations and can answer queries such as "give me the clusters on
this track" or "give me the tracks containing this cluster".

The advantages are:

1) Smaller data objects
2) Only need to retrieve the history information if required
3) Allows forward and backward navigation and could allow more complex
requests like "give me the KINE contributing to this track".
4) Decouples the implementation of the History mechanism from the
objects themselves.
For example, the DataHistoryService could be optimised for forward
association, backward association or both.

I guess the need for a separate service might be considered a
disadvantage.

An additional note on the subject of identifiers. I see that in:

http://www.usatlas.bnl.gov/~dladams/data_history/identifier.html

you calculate the required number of bits based on 1000 objects/event.
If this is to include each digit and each cluster in an event, this
seems a vast underestimate. The TRT has about 100,000 hits per event at
high luminosity and the SCT & Pixels about 60,000.

I think that for the history mechanism to work, the requirement is
really only a unique identifier within an event, in that all the queries
I can foresee are limited in scope to within an event, i.e. the
DataHistoryService could load a table of associations for the current
event which would be used to answer all queries.

Cheers

John

John Baines                   email : j.t.m.baines@rl.ac.uk
Rutherford Appleton Laboratory,
Chilton, Didcot, Oxon. OX11 OER. UK
Phone : [+44] (0)1235 44 6377 (direct)
                         6733 (Fax)
                      82 1900 (switchboard)

On Fri, 19 Oct 2001, David Adams wrote:

Date: Fri, 19 Oct 2001 10:20:52 -0400
From: David Adams
To: David Rousseau
Cc: atlas-database@atlas-lb.cern.ch, reconstruction list
Subject: Re: storing a track to the database...

I have two comments: one general comment about persistent pointers and
one about tracks. I suppose the first is directed to the DB list and the
second to the reco list. Skip to the section that interests you.

Persistent pointers
-------------------

Clearly the problem here is much more general than tracks and clusters
but this does provide a good example of the problem that one
reconstructed object (track) needs to point back to the objects
(clusters) from which it was created.

Here I describe a simple strategy for keeping track of these
associations.
It easily fits into StoreGate, ROOT and Objectivity, and can even allow
events in one type of DB to point to those in another (although
significant effort is required to dereference these pointers).

An important component of the data history that I circulated earlier
(http://www.usatlas.bnl.gov/~dladams/data_history) is that each child
event data object (EDO) "points" back to its parents. In this case the
child EDO is the track collection and the parent is the collection of
clusters from the tracking subdetector.

Here "points" means that each cluster EDO has an index (within the event
or within all of ATLAS) and the track EDO holds that EDO index. The
clusters within each cluster EDO are indexed (call these object
indices). Tracks hold a list of these cluster object indices. A user of
tracks finds the cluster collection using the EDO index and then can
find the cluster within the collection using the cluster index.

If all ATLAS data (or at least the data for this event) were in a common
object DB (perhaps the same federation), the indices can be made to look
like pointers. This can also be done under less restrictive conditions.

To return to the original question, my guess is that the ESD would be an
EDO whose parents include the track EDO. Each ESD track needs one object
index to point back to its parent track.

Tracks
------

I believe it is useful to distinguish a reconstructed track from a track
fit. By the latter, I mean the five independent track parameters and
their error matrix at some specified surface or at the DCA (distance of
closest approach). A reconstructed track is something that is capable of
returning a track fit at any reachable surface.

The essence of a reconstructed track is its list of clusters. A track
fitter can then use a propagator (which needs the magnetic field and
material description) to calculate the track fit at any surface.

Having made this distinction it is then natural to define separate
classes for the two concepts.
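
As a rough illustration only, the two classes might be sketched in
Python as follows, using the contents spelled out in the next paragraph.
The field names (`surface`, `params`, `cov`, `cluster_indices`), the
`fit_at` method and the `fitter` callable are assumptions made for this
sketch, not ATLAS code; the callable stands in for a track fitter plus
propagator.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Sequence, Tuple


@dataclass(frozen=True)
class TrackFit:
    """Five track parameters and their 5x5 error matrix at one surface."""
    surface: str                        # hypothetical surface label, e.g. "DCA"
    params: Tuple[float, ...]           # the 5-vector
    cov: Tuple[Tuple[float, ...], ...]  # the 5x5 error matrix

    def __post_init__(self) -> None:
        if len(self.params) != 5:
            raise ValueError("a track fit has five parameters")
        if len(self.cov) != 5 or any(len(row) != 5 for row in self.cov):
            raise ValueError("the error matrix must be 5x5")


# Hypothetical stand-in for fitter + propagator: (cluster indices, surface) -> fit.
Fitter = Callable[[Sequence[int], str], TrackFit]


@dataclass
class RecoTrack:
    """A reconstructed track: essentially its list of cluster indices."""
    cluster_indices: List[int]
    # Cached fits are derived data, so they are kept out of the persistent state.
    _fit_cache: Dict[str, TrackFit] = field(
        default_factory=dict, repr=False, compare=False
    )

    def fit_at(self, surface: str, fitter: Fitter) -> TrackFit:
        """Return the fit at a surface, computing it once and caching it."""
        if surface not in self._fit_cache:
            self._fit_cache[surface] = fitter(self.cluster_indices, surface)
        return self._fit_cache[surface]

    def persistent_state(self) -> List[int]:
        """Only the cluster indices need to be written out."""
        return list(self.cluster_indices)
```

In this sketch a second call to `fit_at` for the same surface returns
the cached `TrackFit` rather than invoking the fitter again.
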
TrackFit consists of a surface, a 5-vector and its 5x5 error matrix.
RecoTrack consists of a list of clusters (more likely cluster pointers
or indices) and may cache track fits at one or more surfaces. The latter
data need not be persistent because it is derived from the clusters.

From the question below I infer that the reconstructed track is stored
in the AOD and a track fit is stored in the ESD. So my answer is yes,
they are different classes.

Those versed in the Kalman filter methodology will see an alternative to
keeping the clusters: one can store the fits or smoothing data at each
cluster surface. This is a lot more data but saves the need to refit.
The propagator (including field and material) must also be recoverable
if we want the fit at an arbitrary surface.

If one is only interested in the track fit before the first measurement,
then the track fit at that location is sufficient to specify the fit
over that range. Now things get a bit fuzzy--over that range the list of
clusters and the single fit are equivalent representations of the
reconstructed track. The same holds for the fit after the last
measurement. These two fits are natural candidates for the ESD.

da

--
David Adams               desk: 631-344-6049
Brookhaven National Lab   fax: 631-344-5078
PAS group, Building 510A  email: dladams@bnl.gov
Upton, NY 11973-5000      http://www.usatlas.bnl.gov/~dladams

____________________________________________________________________
This mail has been sent to everyone on the atlas-database list
____________________________________________________________________
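
The persistent-pointer scheme described in the "Persistent pointers"
section above (an EDO index to find the parent collection, then an
object index to find the cluster inside it) could be sketched roughly
as below. All class and function names here (`EventStore`,
`ClusterEDO`, `TrackEDO`, `clusters_on_track`) are hypothetical, the
cluster payload is a placeholder, and each track EDO is given a single
parent for brevity where the data history model allows a list.

```python
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass
class Cluster:
    position: float  # placeholder payload, purely illustrative


@dataclass
class ClusterEDO:
    edo_index: int           # unique within the event (or within all of ATLAS)
    clusters: List[Cluster]  # object index = position in this list


@dataclass
class Track:
    cluster_indices: List[int]  # object indices into the parent cluster EDO


@dataclass
class TrackEDO:
    edo_index: int
    parent_edo_index: int    # simplification: one parent instead of a list
    tracks: List[Track]


class EventStore:
    """Hypothetical per-event lookup from EDO index to EDO."""

    def __init__(self) -> None:
        self._edos: Dict[int, Union[ClusterEDO, TrackEDO]] = {}

    def add(self, edo: Union[ClusterEDO, TrackEDO]) -> None:
        if edo.edo_index in self._edos:
            raise ValueError(f"duplicate EDO index {edo.edo_index}")
        self._edos[edo.edo_index] = edo

    def get(self, edo_index: int) -> Union[ClusterEDO, TrackEDO]:
        return self._edos[edo_index]


def clusters_on_track(
    store: EventStore, track_edo: TrackEDO, track_number: int
) -> List[Cluster]:
    """Dereference in two steps: EDO index first, then object indices."""
    parent = store.get(track_edo.parent_edo_index)
    track = track_edo.tracks[track_number]
    return [parent.clusters[i] for i in track.cluster_indices]
```

Nothing in the track EDO holds a cluster directly; it only carries
integers, which is what lets the same scheme span different storage
technologies.
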