Data history: Global identifiers
Our requirements call for a unique global identifier to be assigned
to each event data object. Here we describe one possible implementation
as a proof of principle. No doubt there are more sophisticated schemes,
but the following has the virtue of being simple.
The identifier must be unique, i.e. each value is assigned at most once.
Different machines around the world should be able to obtain identifiers
quickly without relying on a high-speed connection to a central server.
The identifier must also have a compact persistent representation so that
the objects do not become too large.
ATLAS acquires data at a rate of about 10**9 events/yr and can be
expected to run for 20 years. A factor 5 safety margin gives a
total of 10**11 events. If we allow for 1000 objects/event, we obtain
a total of 10**14 objects. This corresponds to 47 bits, so we choose
an identifier size of 64 bits to allow room for wasted indices. This
gives roughly 200,000 times more indices than expected objects.
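The capacity estimate above can be checked with a few lines of Python (all numbers taken from the text):

```python
import math

# Capacity estimate from the text.
events_per_year = 10**9       # ATLAS acquisition rate
years = 20                    # expected lifetime
safety_factor = 5             # safety margin
objects_per_event = 1000      # data objects per event

total_objects = events_per_year * years * safety_factor * objects_per_event
bits_needed = math.ceil(math.log2(total_objects))  # bits to address every object
headroom = 2**64 // total_objects                  # spare identifiers per expected object

print(total_objects)  # 100000000000000, i.e. 10**14
print(bits_needed)    # 47
print(headroom)       # 184467, i.e. roughly 200,000x headroom
```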
We need to distribute the indices in a manner that guarantees they are
unique but does not have a high latency associated with the assignment
of each value. A central source maintains a pool of index lists.
The values in different lists do not overlap. One choice would be to
maintain 2**40 (10**12) lists each containing 2**24 (17M) values.
Each computer requests one list from the central pool for each process
that it expects to run. Each process gains exclusive use of a list,
removes indices as needed, and then releases the list, with its updated
index, when it finishes.
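The central pool can be sketched as a trivial counter over list IDs. The class name and method below are illustrative, not part of the proposal; the point is only that each 40-bit list ID is granted exactly once, so the index ranges of different lists can never overlap:

```python
LIST_BITS = 40  # 2**40 lists, each owning 2**24 index values

class CentralPool:
    """Hypothetical central server: hands out each 40-bit list ID
    at most once, so lists granted to different processes are disjoint."""

    def __init__(self) -> None:
        self.next_list_id = 0

    def allocate_list(self) -> int:
        """Grant exclusive use of one list; returns its 40-bit list ID."""
        assert self.next_list_id < 2**LIST_BITS, "pool exhausted"
        list_id = self.next_list_id
        self.next_list_id += 1
        return list_id

pool = CentralPool()
a, b = pool.allocate_list(), pool.allocate_list()
assert a != b  # every list is granted at most once
```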
A simple implementation would be to use a file for each list. The
file would contain a unique 40-bit list ID (assigned by the central
server) and the next 24-bit local index. The unique ATLAS identifier
is constructed by appending the index to the list ID. The index is
incremented each time an ID is assigned.
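Appending the 24-bit index to the 40-bit list ID is a shift and a bitwise OR. A minimal sketch (the function name is illustrative):

```python
LIST_BITS, INDEX_BITS = 40, 24  # 40 + 24 = 64-bit identifier

def make_identifier(list_id: int, index: int) -> int:
    """Append the 24-bit local index to the 40-bit list ID
    to form the unique 64-bit ATLAS identifier."""
    assert 0 <= list_id < 2**LIST_BITS
    assert 0 <= index < 2**INDEX_BITS
    return (list_id << INDEX_BITS) | index

print(hex(make_identifier(0x12345, 0x000001)))  # → 0x12345000001
```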
A process gains exclusive use by opening and locking the file and
releases the list by closing the file. The updated index is written
to the file before closing.
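The file-based scheme above can be sketched in Python using an advisory lock. The class name and on-disk record layout (an 8-byte list ID followed by a 4-byte next index) are assumptions for illustration, not prescribed by this note:

```python
import fcntl
import struct

LIST_BITS, INDEX_BITS = 40, 24
RECORD = ">QI"  # assumed layout: 8-byte list ID, 4-byte next index

class ListFile:
    """Sketch: exclusive use of one index list, held for the
    lifetime of a process."""

    def __init__(self, path: str) -> None:
        self.f = open(path, "r+b")
        fcntl.flock(self.f, fcntl.LOCK_EX)  # gain exclusive use of the list
        self.list_id, self.index = struct.unpack(RECORD, self.f.read(12))

    def next_id(self) -> int:
        """Assign the next unique 64-bit identifier from this list."""
        assert self.index < 2**INDEX_BITS, "list exhausted"
        uid = (self.list_id << INDEX_BITS) | self.index
        self.index += 1
        return uid

    def release(self) -> None:
        self.f.seek(0)  # write the updated index back...
        self.f.write(struct.pack(RECORD, self.list_id, self.index))
        self.f.close()  # ...then closing the file releases the lock
```

Note that `fcntl.flock` is Unix-specific and advisory: it protects against other cooperating processes, and the lock is dropped automatically if the process dies, so a crashed process cannot hold its list forever (though any indices it drew but did not persist are wasted, which the 64-bit headroom tolerates).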