Subject: Re: Comments on the HES document
From: David Adams
Date: Mon, 22 Apr 2002 12:49:39 -0400
To: Ed Frank
CC: Atlas Database Group

Ed:

I respond to some of your comments. It is true that during our extensive discussions of the HES design, we encountered many issues that were not addressed (or that we did not understand as being addressed) in the ADB document.

Scope
-----
The scope of the HES document differs from that of the ADB document, and I believe this accounts for some of the contradictions you perceive. I agree that physicists will often (but not always) want to express input and output in terms of event collections or datasets, but I believe this can and should be put in a layer above the event store. The scope of the HES document is narrower in that it is mostly restricted to the event store layer itself. There is some discussion of datasets because we cannot design the event store layer without acknowledging the layer above.

Files
-----
The HES document puts a great deal of emphasis on files because files are central to the HES design. While high-level users may often work with higher-level concepts such as event collections or datasets, the implementation of HES must deal directly with files. In order for a job to run, input data must be gathered, and this implies locating a collection of files containing all the requisite data.

Event collections
-----------------
Before discussing event collections, we need to define events or, more specifically, event data. A piece of event data (an EDO in the HES document) has three important properties:

1. event ID (specifies the associated beam crossing)
2. content (in ATLAS this is the type and key)
3. version (of algorithm and parent EDOs)

It is useful to think of these three properties as defining the coordinates in a three-dimensional space. When the ADB uses the word event, I believe it means properties 1 and 3, i.e. event ID and version without regard to content.
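As a rough illustration of the three coordinates, here is a minimal Python sketch. It is not taken from the HES or ADB documents: the class layout, the field types, the helper group_by_event and the sample type names are all assumptions made for the example. It shows EDOs as points in the three-dimensional space, and an "event" in what I take to be the ADB sense as the set of EDOs sharing event ID and version.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class EDO:
    """One piece of event data, located by three coordinates (hypothetical layout)."""
    event_id: int             # 1. identifies the associated beam crossing
    content: Tuple[str, str]  # 2. (type, key), as in ATLAS
    version: str              # 3. stands in for algorithm and parent-EDO versions


def group_by_event(edos: Iterable[EDO]) -> Dict[Tuple[int, str], List[EDO]]:
    """Group EDOs by (event ID, version), ignoring the content coordinate."""
    events = defaultdict(list)
    for edo in edos:
        events[(edo.event_id, edo.version)].append(edo)
    return dict(events)


# Sample values are invented for illustration only.
edos = [
    EDO(1001, ("TrackCollection", "default"), "v1"),
    EDO(1001, ("CaloClusters", "default"), "v1"),
    EDO(1001, ("TrackCollection", "default"), "v2"),
    EDO(1002, ("TrackCollection", "default"), "v1"),
]
events = group_by_event(edos)
```

In this sketch the two "v1" EDOs for event ID 1001 fall into one group, while the "v2" tracks for the same beam crossing form a separate group, since they differ in the version coordinate.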

Event collections and datasets
------------------------------
I was hesitant to use the phrase "event collection" in the HES document because I suspected its meaning differed from what we mean by dataset. Discussion in last week's meeting finally clarified (at least for me) the distinction. An event collection is an explicit selection of event IDs with an implicit choice of version. A dataset is an event collection plus a restriction on content, e.g. just tracking data or just event summary data. Both concepts are important and should be kept distinct.

When staging a job, it is the dataset that specifies which input files must be present (or, in general, whether the available files are sufficient for the job). It is not necessary to gather all data associated with the event collection.

Streams
-------
I agree that we generalized the definition of stream from the ADB document. In the ADB, the stream is associated with an event collection. In HES, we split streams and associate them instead with datasets. In last week's meeting, it was suggested that we find a new term, and "channel" was proposed. I am willing to change, but there may be confusion because the common phrase "physics channel" corresponds to a stream and not to a channel in this sense.

HES document
------------
I agree with the idea of splitting the HES document, although I might make somewhat different divisions than you suggest. In light of the decision to immediately join in with the CERN common project, I am not sure it is worth continuing with the separate HES design.

Thanks for your comments.

da

--
David Adams                  desk: 631-344-6049
Brookhaven National Lab      fax: 631-344-5078
PAS group, Building 510A     email: dladams@bnl.gov
Upton, NY 11973-5000         http://www.usatlas.bnl.gov/~dladams

____________________________________________________________________
This mail has been sent to everyone on the atlas-database list
____________________________________________________________________