This post will describe UUID v1, v4, and v5, with examples. We will go through their implementation and differences, and when you should use them.
Having a unique identifier is an important requirement in many applications today.
The “Universally unique identifier”, or UUID, was designed to provide a consistent format for any unique ID we use for our data. UUIDs address the problem of generating a unique ID - either randomly, or using some data as a seed.
However, ensuring uniqueness is a challenge in itself. How do you ensure that there is just one copy of the identifier you made, and no more? And even then, how do you make sure that there is no correlation between any two identifiers?
The answer is, you can’t do both. This is a tradeoff between uniqueness and randomness, and something that the different UUID versions solve in different ways.
A UUID is just a 128-bit piece of data, displayed as (128/4) = 32 hexadecimal digits split into five hyphen-separated groups.
At first glance, the UUID v1 and v4 below look the same, but generate a few of each and the difference becomes more apparent:
UUID v1 : 4e4bcff0-d1e5-11ec-bc39-9d44e5ef4fbf
UUID v4 : 4ac2bcd4-ef8a-4c9c-9bd7-707d12e90949
We’ll talk about v5 later.
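To generate a few of each yourself, here is a minimal sketch using Python's standard `uuid` module. The `CANONICAL` pattern is a helper introduced only for this illustration, and the printed values will differ on every run and every machine.

```python
import re
import uuid

# Hypothetical helper: the canonical 8-4-4-4-12 lowercase hex layout.
CANONICAL = re.compile(
    r"^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$"
)

v1 = uuid.uuid1()  # time- and node-based
v4 = uuid.uuid4()  # random

for label, value in (("UUID v1", v1), ("UUID v4", v4)):
    # .hex is the 32-digit form without hyphens; .version reads the version field.
    print(f"{label} : {value} (version {value.version}, {len(value.hex)} hex digits)")
```

The version is stored in the first digit of the third group, which is why the v1 example above has `11ec` there and the v4 example has `4c9c`.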
UUID v1 is generated from a combination of the host computer's MAC address and the current date and time. In addition to this, it includes a 14-bit clock sequence, typically initialised to a random value, as an extra safeguard for uniqueness.
This means a v1 UUID can only collide with another one generated on the same computer at the exact same time. Even in that case, the clock sequence bits keep the chance of a collision very small.
This uniqueness comes at the cost of anonymity. Because UUID v1 takes the time and your MAC address into consideration, someone could potentially identify the time and place (i.e. the computer) of creation. Generate several v1 UUIDs on the same machine, and you will see that the last part of the UUID stays constant.
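To make the anonymity cost concrete, the sketch below decodes the v1 example from earlier in this post. The `decode_v1` function and `GREGORIAN_EPOCH` constant are names made up for this illustration. The only facts it relies on are that the v1 timestamp counts 100-nanosecond intervals since 15 October 1582, and that the last 48 bits hold the node ID.

```python
import uuid
from datetime import datetime, timedelta, timezone

# v1 timestamps count 100 ns ticks from the start of the Gregorian calendar.
GREGORIAN_EPOCH = datetime(1582, 10, 15, tzinfo=timezone.utc)

def decode_v1(u):
    """Return (creation time, node ID as a MAC-style string) for a v1 UUID."""
    if u.version != 1:
        raise ValueError("only version 1 UUIDs embed a timestamp and node")
    # u.time is the 60-bit timestamp; 10 ticks of 100 ns make one microsecond.
    created = GREGORIAN_EPOCH + timedelta(microseconds=u.time // 10)
    # u.node is the 48-bit node ID; format it byte by byte, high byte first.
    node = ":".join(f"{(u.node >> shift) & 0xFF:02x}" for shift in range(40, -8, -8))
    return created, node

sample = uuid.UUID("4e4bcff0-d1e5-11ec-bc39-9d44e5ef4fbf")
created, node = decode_v1(sample)
print(created.date(), node)
```

Anyone holding the UUID can run the same decoding, so avoid v1 wherever the creation time or the originating machine is sensitive.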
The generation of a v4 UUID is much simpler to comprehend. Of its 128 bits, 122 are generated randomly and with no inherent logic; the remaining 6 are fixed to mark the version and variant. Because of this, there is no way to identify information about the source by looking at the UUID.
However, there is now a chance that a UUID could be duplicated. The question is, do you need to worry about it?
The short answer is no. With the sheer number of possible combinations (2^122, since only 122 of the bits are random), you would need to generate about a billion IDs every second for roughly 86 years to reach even a 50% chance of a single duplicate.
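The collision odds can be estimated with the standard birthday-problem approximation, p ≈ 1 - exp(-n² / 2N), where n is the number of IDs generated and N = 2^122 is the number of possible v4 values. The sketch below is a rough calculation under that approximation, and the function names are made up for this example.

```python
import math

RANDOM_BITS = 122          # random bits in a v4 UUID
SPACE = 2 ** RANDOM_BITS   # number of distinct v4 UUIDs

def collision_probability(n):
    """Approximate chance of at least one duplicate among n random v4 UUIDs."""
    # expm1 keeps precision when the probability is extremely small.
    return -math.expm1(-(n * n) / (2 * SPACE))

def ids_for_probability(p):
    """Approximate number of UUIDs needed to reach collision probability p."""
    return math.sqrt(2 * SPACE * math.log(1 / (1 - p)))

n_half = ids_for_probability(0.5)
years = n_half / 1e9 / (365.25 * 24 * 3600)
print(f"{n_half:.3e} IDs for a 50% collision chance")
print(f"about {years:.0f} years at one billion IDs per second")
```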
If your application is mission critical (for example, bank transactions or medical systems), you should still add a uniqueness constraint on the ID so that a UUID v4 collision is rejected rather than silently accepted.
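One way to add such a constraint is to let the database enforce it and retry on conflict. The sketch below uses an in-memory SQLite database; the `transactions` table, its columns, and the `insert_with_unique_id` function are hypothetical names chosen for this illustration, not part of any particular system.

```python
import sqlite3
import uuid

def insert_with_unique_id(conn, payload, max_attempts=3):
    """Insert a row under a fresh v4 UUID, retrying if the ID already exists."""
    for _ in range(max_attempts):
        new_id = str(uuid.uuid4())
        try:
            with conn:  # commits on success, rolls back on error
                conn.execute(
                    "INSERT INTO transactions (id, payload) VALUES (?, ?)",
                    (new_id, payload),
                )
            return new_id
        except sqlite3.IntegrityError:
            continue  # duplicate ID: generate a new one and try again
    raise RuntimeError("could not generate a unique ID")

conn = sqlite3.connect(":memory:")
# PRIMARY KEY makes the database reject any duplicate ID.
conn.execute("CREATE TABLE transactions (id TEXT PRIMARY KEY, payload TEXT NOT NULL)")
first_id = insert_with_unique_id(conn, "deposit 100")
print(first_id)
```

The retry branch will almost never run in practice; the point is that a duplicate is caught by the database instead of overwriting or shadowing an existing record.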
If you want a unique ID that’s not random, UUID v5 could be the right choice.
Unlike v1 or v4, UUID v5 is generated by providing two pieces of input information:
- Input string: Any string that can change in your application.
- Namespace: A fixed UUID used in combination with the input string to differentiate between UUIDs generated in different applications, and to prevent rainbow table hacks.
These two pieces of information are hashed together using the SHA-1 algorithm. The first 128 bits of the hash, with the version and variant bits overwritten, become the UUID.
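The derivation can be sketched by hand and compared against Python's built-in `uuid.uuid5`. The `uuid5_by_hand` function is a name made up for this example; the namespace used is the predefined DNS namespace that ships with the `uuid` module, and `"python.org"` is just a sample input.

```python
import hashlib
import uuid

def uuid5_by_hand(namespace, name):
    """Derive a v5 UUID: SHA-1 over the namespace bytes followed by the name."""
    digest = hashlib.sha1(namespace.bytes + name.encode("utf-8")).digest()
    # Keep the first 16 bytes (128 bits); version=5 overwrites the
    # version and variant bits in the result.
    return uuid.UUID(bytes=digest[:16], version=5)

by_hand = uuid5_by_hand(uuid.NAMESPACE_DNS, "python.org")
builtin = uuid.uuid5(uuid.NAMESPACE_DNS, "python.org")
print(by_hand)
print(by_hand == builtin)
```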
An important point to note is that UUID v5 is consistent. This means that any given combination of input and namespace will result in the same UUID, every time. You can try it out yourself:
UUIDV5 = c2337a9a-4f41-5023-a3e3-6498891abe2f
This is great if you want to, for example, maintain a mapping of your users to their UUIDs without explicitly persisting that information to storage.
However, remember that these IDs are not random, and their uniqueness is now your responsibility.
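As a sketch of the user-mapping idea, the snippet below derives each user's UUID from their email address instead of storing it. The `APP_NAMESPACE` value, the `users.example.com` name, and the `user_uuid` function are all hypothetical, and normalising the email before hashing is an assumption about what should count as the same user.

```python
import uuid

# Hypothetical application namespace, itself derived once from a fixed name.
APP_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "users.example.com")

def user_uuid(email):
    """Return the same UUID for the same user on every call, with no storage."""
    # Normalise first: uniqueness depends entirely on the input string.
    return uuid.uuid5(APP_NAMESPACE, email.strip().lower())

print(user_uuid("alice@example.com"))
print(user_uuid("Alice@Example.com ") == user_uuid("alice@example.com"))
```

Two users only get different UUIDs if their input strings differ, so pick an input that is already unique in your application.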
If you don’t know what to go with, go with v4. It’s good enough, and the chances of collision are practically none.
If you actually want your UUID to give some indication of the date and computer in which it was created, then UUID v1 may be for you (although it is).
UUID v5 is normally used only for very specific use cases, when you want to derive a UUID from another piece of information on the fly.