Subject: Re: Some comments on HES
From: David Adams
Date: Wed, 27 Mar 2002 11:42:36 -0500
To: Solveig Albrand
CC: Atlas Database Group

Solveig:

Thanks for your comments. Here are a few responses:

1. The document aims to provide both requirements and a high-level design. We tried to separate these in the text. The advantage of this approach is that the design can be understood without duplicating text from, or referring to, a separate design document. The requirements are also clarified by an implementation serving as an example. Finally, carrying out a design uncovers flaws in our requirements. If we define a review process that separates requirements and design, then we can go back and extract the requirements from our document.

2. We have tried to separate out components that can be generic, i.e. independent of other ATLAS software (in particular Athena/StoreGate). We retain some ATLAS concepts (type-keys, placement/sharing categories). Providing a separate design and implementation of these pieces allows us to create a piece of software that can be shared with others who share these concepts without requiring them to use ATLAS software. I believe this is good software design.

3. I will go back and look at the definitions you found confusing in the glossary. Our definitions have evolved with time and it is very useful to have someone read them without that background.

4. A common agreement on definitions would be useful. This might be useful for, or ahead of, the upcoming workshop. I like the idea that a document carries exactly the set of definitions required for understanding the following text, and so would probably want to keep the relevant definitions as part of the HES document. I agree these must be consistent with any common definitions and might be identical.

5. The ADB document separates placement and sharing categories. HES merges the two (as agreed at an ANL meeting last fall).
The HES placement categories also define the level at which sharing is possible. I agree that sharing is of more interest to the average user than placement.

6. File IDs are introduced to have a compact representation for references. Specifically, we believe 64 bits will do the job. If we go to an ATM-like identifier, then we might as well stick with strings.

7. I agree that HES does not provide much detail about design of the cataloging. We have identified the fields that we feel are essential, but I expect these will only be a subset of those in a more advanced design. I expect the latter to take place in contexts outside the HES document.

Thanks again for your comments.

da

Solveig Albrand wrote:

Here, at last, are a few comments on this document. I'm sorry to be so late, and I admit that I have not made it to the end yet. Whether this is part of the problem, or part of my problem, is debatable. I do find the whole thing to be so extremely complicated (well before Luc says it gets too complicated for him!) that I have a sort of gut or aesthetic reaction that what appears so complicated is not yet correctly described.

My comments are much less detailed than Luc's. I do not feel competent to comment on the overall architecture, but I guess that one or two comments may slip through anyway. I can however say a few things on the structure of the document, and also on the part which deals with file cataloguing.

Some comments on the document itself.

First a general, almost metaphysical problem that I have with this document. From the beginning paragraph we get the impression (because that's what you say) that you are suggesting a realization (i.e. making realistic) of the architecture described in the Atlas document written by Ed Frank et al. But at the end we read that in fact you are actually aiming to produce something generic which will need something called an AHES to make it work with Atlas.
So are you describing an "implementation" of a particular design, or using a particular design to give you ideas for a generic design? Is it a design document at all, because there are also "requirements" and "use cases". Of course design should satisfy use cases, but I often had the impression that the use cases were derived from the design. Those on page 8 for example:

"A user wishes to find all the files containing data for a particular placement category......"

"A user wishes to find all the data for a particular EDO type-key......"

There are some funny users about, I said to myself. Probably I am missing something fundamental, but I thought that users should not have to worry about placement at all. They just worry about category. If you in your design / implementation decide that all data which belongs to the same category needs to be placed together, that's OK by me, but it seems to me to be a plumbing detail. The user needs to know where the bathroom is, but not how the drains work. He just hopes that the architect and the builders have done their job. If the user needs to tell you which categories need to be shared, then that's another problem, and I don't see why you dropped the idea of sharing categories. In view of the comments from Luc and your answers to them, placement categories seem to be getting rather too esoteric for my liking.

However, having said that, the document is much better structured than some. It begins with a glossary, which is an excellent thing, and all technical documents should do so. But the definitions in a glossary should not turn into a discussion of what the definition should be. I was particularly frustrated by the part on EDO. Thus, for "Event" I read "It CAN refer to a beam crossing or to a subset data associated with a particular crossing. The latter might......" I took this as meaning "WE WILL USE the word event to refer to a beam crossing or to a subset (of) data........"
When I got to EDO I read "We refer to these collections of objects as event data objects..." This means to me that an EDO IS a collection. I was puzzled because I had previously thought that by EDO you meant the objects themselves and not the collections. Taking this with what is written just above, "...it is natural to describe the data from a particular beam crossing as a collection of objects", I said to myself that I was wrong and that in fact you were using EDO as a synonym for "event". In a previous paper written by David Adams (dated 08 Oct 01) I had read for EDO "This is a data object that is part of the reconstruction of a particular event". Well, I think I see what you mean, and it does become clearer when reading further, but I needed more than the glossary and I don't think I should have done.

I also had the impression that your definition of Dataset was missing a paragraph. It jumps straight from real data to merging data from different production streams.

SO - I have a suggestion to make. We (Grenoble) are also struggling with a glossary for our stuff, and it seems to me very important that we all mean the same thing by the terms we use. I think that we should produce a "Data Management Dictionary" which all subsequent documents could use as a reference document. The document should have all the terms you have in your glossary, plus a few others maybe, and even some UML diagrams to show what the relations are between them. Where necessary a term could have a list of synonyms attached.

We should do the same thing for Logical File names. A first document was already circulated, but it is probably time for a revision. Do we need a FileID also? As some people probably remember, my own particular choice was to use a 14 digit hierarchically coded integer, like an International Bank account ID number or a (French) Social Security number. But I was not in the majority in meetings which discussed this.
In any case we will probably have to do in the end what the grid people are proposing, and as I understand it, they may hand us a number, and we may want to assign our LFN strings to their numbers. So wait and see.......

Now onto file content cataloguing.

We've been thinking about this for quite a long time here in Grenoble, and have some prototypes (Tag Collector after all is a "Production Environment Database" in your terminology, isn't it?). We are currently trying to produce a complete design document ourselves. (So I do sympathize greatly with all the problems of being precise in language, and with not mixing design and implementation.) We have obviously got further than you on this matter.

What I really would have liked to see in this section of your document is an enumeration of the functions of a File Content Catalog which your HES design requires. Instead I read under "Design": ".....The file content catalog can be IMPLEMENTED as follows......." I say that what you have written is neither design nor implementation. Treat this (and probably MAGDA also) as a "constraint", i.e. a component that HES must interact with but cannot modify, or (the collaborative way and much better for everyone) as a "dependency", i.e. a component whose behaviour will be modified by interaction with HES, and say what HES needs.

A bientot,

-- Solveig

--
David Adams               desk: 631-344-6049
Brookhaven National Lab   fax: 631-344-5078
PAS group, Building 510A  email: dladams@bnl.gov
Upton, NY 11973-5000      http://www.usatlas.bnl.gov/~dladams

____________________________________________________________________
This mail has been sent to everyone on the atlas-database list
____________________________________________________________________