The DSC is the primary user interface to datasets and plays the role of what is often called a metadata catalog. It enables users to select a dataset based on its content, provenance and other metadata. The selected dataset is virtual; that is, it need not have a unique mapping to any particular collection of logical or physical files. The data may not even exist (no longer or not yet).
Virtual datasets are identified both by name and by ID, and the mapping between these is found in the DSC. The dataset corresponding to a given ID is normally immutable (although nonvirtual representations may come and go). The mapping from name to ID may change. For example, as more data is acquired, a new dataset (with a new ID) might be formed by appending this data to an existing dataset and then reassigning the name from the old to the new dataset. An analyzer referencing a dataset by ID can expect to always get the same result, while one referencing by name may see different results at different times. We expect most users to request datasets by name, while a provenance system records IDs.
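The difference between referencing by name and by ID can be illustrated with a minimal Python sketch. The class ToyDSC, its method names, and the ID strings are all hypothetical and chosen only for illustration; this is not the actual DSC interface, and it assumes a dataset can be modeled as a tuple of items.

```python
class ToyDSC:
    """Hypothetical in-memory stand-in for a DSC (not the real interface)."""

    def __init__(self):
        self._name_to_id = {}  # mutable mapping: name -> current dataset ID
        self._content = {}     # per-ID content, stored once as a tuple

    def register(self, dataset_id, items):
        # A given ID is bound to its content exactly once (immutability).
        if dataset_id in self._content:
            raise ValueError(f"dataset ID {dataset_id!r} already registered")
        self._content[dataset_id] = tuple(items)

    def assign_name(self, name, dataset_id):
        # A name may be reassigned to a different ID at any time.
        if dataset_id not in self._content:
            raise KeyError(dataset_id)
        self._name_to_id[name] = dataset_id

    def id_for(self, name):
        return self._name_to_id[name]

    def content(self, dataset_id):
        return self._content[dataset_id]


dsc = ToyDSC()
dsc.register("ds-001", ["run1"])
dsc.assign_name("cosmics", "ds-001")
recorded_id = dsc.id_for("cosmics")  # what a provenance system would record

# More data arrives: form a new dataset by appending, then move the name.
dsc.register("ds-002", dsc.content("ds-001") + ("run2",))
dsc.assign_name("cosmics", "ds-002")

by_id = dsc.content(recorded_id)              # same result as before
by_name = dsc.content(dsc.id_for("cosmics"))  # now reflects the new dataset
```

In this sketch the recorded ID keeps resolving to the original content, while the name follows the most recent assignment.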
The DRC provides a mapping between a virtual dataset and one or more nonvirtual datasets, where the latter are associated with a collection of logical or physical files or some other prescription for locating the referenced data. This mapping of virtual to nonvirtual datasets is analogous to the mapping of logical to physical files in a file replica system such as RLS or Magda.
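As a rough illustration of this one-to-many mapping, the following Python sketch keeps a set of nonvirtual dataset IDs for each virtual dataset ID. ToyDRC, its methods, and the ID strings are hypothetical names introduced here; the real DRC interface is not specified in this text.

```python
from collections import defaultdict


class ToyDRC:
    """Hypothetical stand-in for a DRC: virtual ID -> nonvirtual dataset IDs."""

    def __init__(self):
        self._replicas = defaultdict(set)

    def add_replica(self, virtual_id, nonvirtual_id):
        self._replicas[virtual_id].add(nonvirtual_id)

    def remove_replica(self, virtual_id, nonvirtual_id):
        # Nonvirtual representations may come and go; removing one that
        # is not registered is treated as a no-op.
        self._replicas[virtual_id].discard(nonvirtual_id)

    def replicas(self, virtual_id):
        # Sorted list of known replicas; empty if none currently exist.
        return sorted(self._replicas.get(virtual_id, ()))


drc = ToyDRC()
drc.add_replica("ds-001", "ds-001-siteA")
drc.add_replica("ds-001", "ds-001-siteB")
drc.remove_replica("ds-001", "ds-001-siteA")

available = drc.replicas("ds-001")
missing = drc.replicas("ds-999")  # a virtual dataset with no replica
```

An empty result corresponds to the case where a virtual dataset has no nonvirtual representation at the moment.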
The DDB is a repository of dataset objects indexed by ID. At present, the datasets have an XML representation and are stored as files, but it would also be possible to store these objects in another manner, such as in an XML database.
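A file-based repository of this kind might look like the following Python sketch, which writes one XML file per dataset ID. The class ToyDDB, the file naming scheme, and the XML element and attribute names are assumptions made for illustration; the actual dataset XML schema is not given here.

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path


class ToyDDB:
    """Hypothetical file-backed repository of XML dataset objects, keyed by ID."""

    def __init__(self, root_dir):
        self._root = Path(root_dir)

    def _path(self, dataset_id):
        # Assumed layout: one file named <ID>.xml per dataset.
        return self._root / f"{dataset_id}.xml"

    def put(self, dataset_id, element):
        path = self._path(dataset_id)
        if path.exists():
            raise FileExistsError(f"dataset {dataset_id!r} already stored")
        ET.ElementTree(element).write(
            str(path), encoding="utf-8", xml_declaration=True
        )

    def get(self, dataset_id):
        return ET.parse(str(self._path(dataset_id))).getroot()


with tempfile.TemporaryDirectory() as tmp:
    ddb = ToyDDB(tmp)
    ds = ET.Element("dataset", id="ds-001")       # made-up schema
    ET.SubElement(ds, "file").text = "lfn:run1.root"
    ddb.put("ds-001", ds)

    loaded = ddb.get("ds-001")
    loaded_id = loaded.get("id")
    loaded_files = [f.text for f in loaded.findall("file")]
```

Because access goes only through put and get, the file storage could in principle be swapped for another backend without changing callers.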
In many cases, a dataset will be formed from a collection of logical files, and this can sensibly be done by creating one dataset for each file and then merging the resulting datasets. In this case we register the association between logical file names and dataset IDs in a DFC so that a future user can discover whether a file has already been used to create a dataset.
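The lookup-before-create pattern this enables can be sketched in Python as follows. ToyDFC, dataset_for_file, the make_id callback, and the file names are hypothetical; the merge step is left out, and the sketch assumes one dataset per logical file name.

```python
import itertools


class ToyDFC:
    """Hypothetical stand-in for a DFC: logical file name -> dataset ID."""

    def __init__(self):
        self._lfn_to_id = {}

    def lookup(self, lfn):
        # Returns None if the file has not yet been used to create a dataset.
        return self._lfn_to_id.get(lfn)

    def register(self, lfn, dataset_id):
        self._lfn_to_id[lfn] = dataset_id


def dataset_for_file(dfc, lfn, make_id):
    """Reuse the dataset already created for this file, or create a new one."""
    existing = dfc.lookup(lfn)
    if existing is not None:
        return existing
    new_id = make_id()
    dfc.register(lfn, new_id)
    return new_id


counter = itertools.count(1)


def make_id():
    # Illustrative ID generator; real IDs would come from the catalog system.
    return f"ds-{next(counter):03d}"


dfc = ToyDFC()
first = [dataset_for_file(dfc, lfn, make_id) for lfn in ["a.root", "b.root"]]
second = [dataset_for_file(dfc, lfn, make_id) for lfn in ["b.root", "c.root"]]
```

Here the second request reuses the dataset already registered for b.root and creates a new one only for c.root; the per-file dataset IDs in each list would then be merged into a single dataset.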