Subject: Re: storing a track to the database...
From: David Adams
Date: Mon, 22 Oct 2001 09:06:48 -0400
To: john.baines@rl.ac.uk
CC: David Rousseau, atlas-database@atlas-lb.cern.ch, reconstruction list

John:

There is some confusion about what is meant by "data object". I am not very comfortable with the term but I adopted it because it is used by the StoreGate authors. It is defined as the unit of storage and refers to a collection of physics objects, typically produced by an algorithm. Individual tracks, clusters or digits are not data objects in this sense. Instead, data object refers to a collection of tracks or a collection of clusters. I use the notation EDO (event data object) to refer to such collections and PDO (physics data object) to refer to the objects (digits, hits, tracks, electrons, ...) inside these collections.

My estimate of 1000 EDO's/event may be a factor of ten high or low but it is not off by too many orders of magnitude.

I assume we agree that we want to impose the requirement that PDO's be able to point back to the PDO's used in their creation. For example, a track carries a list of the clusters from which it was created.

I agree that it would be very expensive and unnecessary for each PDO to carry enough information to refer to any other ATLAS PDO. Instead I propose that the EDO contain indices specifying its list of parent EDO's. (This is consistent with the requirement that the list of parent EDO's be sufficient to recreate the child.) A PDO is restricted to referencing PDO's within these parents. The index that a PDO must carry to reference another need only be large enough to specify the PDO within the appropriate EDO. I will call these PDO indices. (We may need to add a few more bits of information if the referenced parent EDO is ambiguous.)

To me it seems natural to store the EDO associations (parent EDO's), PDO associations (PDO indices) and any other PDO data (fitted parameters) all in the child EDO.

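
As a rough illustration of the layout proposed above, the following Python sketch stores the parent EDO indices, the PDO indices and the PDO data together in the child EDO. It is only a sketch under stated assumptions: the class names `EDO` and `PDO`, the `parent_slot` field (standing in for the "few more bits" that select among several parent EDO's) and the dictionary used as the event are all hypothetical and are not taken from StoreGate or any ATLAS code.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class PDO:
    """Physics data object (digit, cluster, track, ...); hypothetical layout."""
    data: dict
    # Each reference is (parent_slot, pdo_index): parent_slot picks one of the
    # owning EDO's parents, pdo_index picks a PDO inside that parent EDO.
    parent_refs: List[Tuple[int, int]] = field(default_factory=list)


@dataclass
class EDO:
    """Event data object: a collection of PDOs plus its parent EDO indices."""
    edo_index: int
    parent_edo_indices: List[int] = field(default_factory=list)
    pdos: List[PDO] = field(default_factory=list)


def resolve_parents(event: Dict[int, EDO], child: EDO, pdo: PDO) -> List[PDO]:
    """Follow a PDO's small indices through its EDO's list of parent EDOs."""
    resolved = []
    for parent_slot, pdo_index in pdo.parent_refs:
        parent_edo = event[child.parent_edo_indices[parent_slot]]
        resolved.append(parent_edo.pdos[pdo_index])
    return resolved


# Usage: one cluster collection (EDO 7) and one track collection (EDO 12).
clusters = EDO(edo_index=7, pdos=[PDO({"layer": i}) for i in range(4)])
tracks = EDO(
    edo_index=12,
    parent_edo_indices=[7],
    pdos=[PDO({"pt": 25.0}, parent_refs=[(0, 1), (0, 3)])],
)
event = {edo.edo_index: edo for edo in (clusters, tracks)}

hits = resolve_parents(event, tracks, tracks.pdos[0])
print([hit.data["layer"] for hit in hits])  # [1, 3]
```

The point of the design is that a track never stores a full EDO index per cluster: it stores a small slot number and a small index, and the EDO-level parent list is stored once per collection.
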
I will address your last comment about "events" in a separate note.

da

john.baines@rl.ac.uk wrote:

Hi David,

> each child event data object (EDO) "points" back to its parents.

I agree that we need this functionality, but an alternative implementation would be simply to require, as you have suggested, that each object have an identifier, and for some external service to provide the association. This removes the need for lists of identifiers inside data objects. For example, it could be implemented as follows:

o When each new object is created, the identifier for the object is obtained from a DataHistoryService. When requesting the identifier, the list of identifiers for the constituent objects (if any) is passed to the DataHistoryService as arguments of the call.

e.g. A track reconstruction algorithm gives a list of clusters to the DataHistoryService and receives an identifier for the new track in return.

In this way, the DataHistoryService has full knowledge of the associations and can answer queries such as "give me the clusters on this track" or "give me the tracks containing this cluster".

The advantages are:
1) Smaller data objects.
2) The history information need only be retrieved if required.
3) Allows forward and backward navigation, and could allow more complex requests like "give me the KINE contributing to this track".
4) Decouples the implementation of the history mechanism from the objects themselves. For example, the DataHistoryService could be optimised for forward association, backward association or both.

I guess the need for a separate service might be considered a disadvantage.

An additional note on the subject of identifiers. I see that in:
http://www.usatlas.bnl.gov/~dladams/data_history/identifier.html
you calculate the required number of bits based on 1000 objects/event. If this is to include each digit and each cluster in an event, this seems a vast underestimate.
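
The DataHistoryService proposal earlier in this message can be sketched roughly as follows. This is a minimal illustration, not an existing ATLAS or Gaudi interface: the method names (`new_identifier`, `constituents_of`, `users_of`, `ancestors_of`) are hypothetical, identifiers are assumed to be plain integers unique only within one event, and the association table is assumed to be held in memory for the current event.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set


class DataHistoryService:
    """Hands out per-event identifiers and records parent/child associations."""

    def __init__(self) -> None:
        self._next_id = 0
        self._parents: Dict[int, List[int]] = {}
        self._children: Dict[int, Set[int]] = defaultdict(set)

    def new_identifier(self, constituents: Iterable[int] = ()) -> int:
        """Register a new object built from the given constituent identifiers."""
        parents = list(constituents)
        for parent in parents:
            if parent not in self._parents:
                raise KeyError(f"unknown constituent identifier {parent}")
        object_id = self._next_id
        self._next_id += 1
        self._parents[object_id] = parents
        for parent in parents:
            self._children[parent].add(object_id)
        return object_id

    def constituents_of(self, object_id: int) -> List[int]:
        """Backward navigation, e.g. the clusters on this track."""
        return list(self._parents[object_id])

    def users_of(self, object_id: int) -> List[int]:
        """Forward navigation, e.g. the tracks containing this cluster."""
        return sorted(self._children.get(object_id, ()))

    def ancestors_of(self, object_id: int) -> Set[int]:
        """All direct and indirect constituents of an object."""
        seen: Set[int] = set()
        stack = list(self._parents[object_id])
        while stack:
            current = stack.pop()
            if current not in seen:
                seen.add(current)
                stack.extend(self._parents[current])
        return seen


# Usage: three clusters, two tracks sharing the middle cluster.
service = DataHistoryService()
cluster_ids = [service.new_identifier() for _ in range(3)]
track_a = service.new_identifier(cluster_ids[:2])
track_b = service.new_identifier(cluster_ids[1:])

print(service.constituents_of(track_a))    # [0, 1]
print(service.users_of(cluster_ids[1]))    # [3, 4]
```

Keeping both a parent table and a child table is what makes forward and backward queries equally cheap; a service optimised for only one direction could drop the other table.
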
The TRT has about 100,000 hits per event at high luminosity and the SCT & Pixels about 60,000.

I think that for the history mechanism to work, the requirement is really only a unique identifier within an event, in that all the queries I can foresee are limited in scope to within an event, i.e. the DataHistoryService could load a table of associations for the current event which would be used to answer all queries.

Cheers

John

John Baines                      email : j.t.m.baines@rl.ac.uk
Rutherford Appleton Laboratory,
Chilton, Didcot, Oxon. OX11 OER. UK
Phone : [+44] (0)1235 44 6377 (direct)
                         6733 (Fax)
                      82 1900 (switchboard)

On Fri, 19 Oct 2001, David Adams wrote:

Date: Fri, 19 Oct 2001 10:20:52 -0400
From: David Adams
To: David Rousseau
Cc: atlas-database@atlas-lb.cern.ch, reconstruction list
Subject: Re: storing a track to the database...

I have two comments: one general comment about persistent pointers and one about tracks. I suppose the first is directed to the DB list and the second to the reco list. Skip to the section that interests you.

Persistent pointers
------------------

Clearly the problem here is much more general than tracks and clusters, but this does provide a good example of the problem: one reconstructed object (a track) needs to point back to the objects (clusters) from which it was created. Here I describe a simple strategy for keeping track of these associations. It fits easily into StoreGate, root and objectivity, and can even allow events in one type of DB to point to those in another (although significant effort is required to dereference these pointers).

An important component of the data history that I circulated earlier (http://www.usatlas.bnl.gov/~dladams/data_history) is that each child event data object (EDO) "points" back to its parents. In this case the child EDO is the track collection and the parent is the collection of clusters from the tracking subdetector.
Here "points" means that each cluster EDO has an index (within the event or within all of ATLAS) and the track EDO holds that EDO index. The clusters within each cluster EDO are indexed (call these object indices). Tracks hold a list of these cluster object indices. A user of tracks finds the cluster collection using the EDO index and can then find the cluster within the collection using the cluster object index.

If all ATLAS data (or at least the data for this event) were in a common object DB (perhaps the same federation), the indices could be made to look like pointers. This can also be done under less restrictive conditions.

To return to the original question, my guess is that the ESD would be an EDO whose parents include the track EDO. Each ESD track needs one object index to point back to its parent track.

Tracks
-------

I believe it is useful to distinguish a reconstructed track from a track fit. By the latter, I mean the five independent track parameters and their error matrix at some specified surface or at the DCA (distance of closest approach). A reconstructed track is something that is capable of returning a track fit at any reachable surface. The essence of a reconstructed track is its list of clusters. A track fitter can then use a propagator (which needs the magnetic field and material description) to calculate the track fit at any surface.

Having made this distinction, it is then natural to define separate classes for the two concepts. TrackFit consists of a surface, a 5-vector and its 5x5 error matrix. RecoTrack consists of a list of clusters (more likely cluster pointers or indices) and may cache track fits at one or more surfaces. The cached fits need not be persistent because they are derived from the clusters.

From the question below I infer that the reconstructed track is stored in the AOD and a track fit is stored in the ESD. So my answer is yes, they are different classes.

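
The TrackFit / RecoTrack split described above might look roughly like this in code. This is a sketch under assumptions, not the ATLAS class design: surfaces are represented by plain string labels, the fitter is any callable taking the cluster indices and a surface (the real one would need a propagator with field and material), and `fit_at` and `persistent_state` are hypothetical method names.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Sequence


@dataclass(frozen=True)
class TrackFit:
    """Five track parameters and their 5x5 error matrix at one surface."""
    surface: str                           # hypothetical surface label
    parameters: Sequence[float]            # 5-vector
    covariance: Sequence[Sequence[float]]  # 5x5 error matrix

    def __post_init__(self) -> None:
        if len(self.parameters) != 5:
            raise ValueError("expected 5 track parameters")
        if len(self.covariance) != 5 or any(len(r) != 5 for r in self.covariance):
            raise ValueError("expected a 5x5 error matrix")


# A fitter maps (cluster indices, surface) to a TrackFit.
Fitter = Callable[[Sequence[int], str], TrackFit]


@dataclass
class RecoTrack:
    """A reconstructed track: essentially its list of cluster indices."""
    cluster_indices: List[int]
    # Transient cache of fits; derived from the clusters, so not persisted.
    _fit_cache: Dict[str, TrackFit] = field(
        default_factory=dict, init=False, repr=False
    )

    def fit_at(self, surface: str, fitter: Fitter) -> TrackFit:
        """Return the fit at a surface, computing it only on first request."""
        if surface not in self._fit_cache:
            self._fit_cache[surface] = fitter(self.cluster_indices, surface)
        return self._fit_cache[surface]

    def persistent_state(self) -> List[int]:
        """Only the cluster indices need to be written out."""
        return list(self.cluster_indices)


# Usage with a stand-in fitter that records how often it is called.
calls: List[str] = []


def dummy_fitter(cluster_indices: Sequence[int], surface: str) -> TrackFit:
    calls.append(surface)
    return TrackFit(surface, [0.0] * 5, [[0.0] * 5 for _ in range(5)])


track = RecoTrack([1, 3, 8])
first = track.fit_at("dca", dummy_fitter)
second = track.fit_at("dca", dummy_fitter)
print(len(calls))  # 1
```

The second request is served from the cache, which is why the stand-in fitter is called only once.
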

Those versed in the Kalman filter methodology will see an alternative to keeping the clusters: one can store the fits or smoothing data at each cluster surface. This is a lot more data but saves the need to refit. The propagator (including field and material) must also be recoverable if we want the fit at an arbitrary surface.

If one is only interested in the track fit before the first measurement, then the track fit at that location is sufficient to specify the fit over that range. Now things get a bit fuzzy: over that range the list of clusters and the single fit are equivalent representations of the reconstructed track. The same holds for the fit after the last measurement. These two fits are natural candidates for the ESD.

da

--
David Adams                  desk: 631-344-6049
Brookhaven National Lab      fax: 631-344-5078
PAS group, Building 510A     email: dladams@bnl.gov
Upton, NY 11973-5000         http://www.usatlas.bnl.gov/~dladams

____________________________________________________________________
This mail has been sent to everyone on the atlas-database list
____________________________________________________________________