Table of Contents
Overview
Distributed systems are the norm these days, and there is often a requirement to generate unique IDs across such systems. Some examples of such requirements:
- An order management system where each order needs a unique ID
- Each user stored in the system needs a unique ID
- A service like Instagram, where users create billions of posts and each post requires a unique ID
- ……
Types of Unique IDs
There are three types of IDs a unique ID generator might produce:
- Unique ID ( Loosely Sortable)
- Unique ID (Not Loosely Sortable) or Random Unique ID
- Sequence Number
Let’s look at the definition of each of these first, then we will look into each Unique ID in detail
Unique ID ( Loosely Sortable)
These IDs are loosely time sortable and can be generated in a decentralized manner. There are several ways of generating them, as we will see in this article. These IDs carry a time component, usually based on epoch time.
Examples include UUIDs (128 bit), MongoDB Object IDs (96 bit), and 64-bit IDs. Since these IDs are time sortable, applications can use them to organize data chronologically.
Example of UUID (128 bit)
815b15597e8a4a4d4302d73e4682f4fc
442bc58166b6ab626ceed57f51982474
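As a quick illustration, Python's standard library can generate both random (version 4) and time-based (version 1) UUIDs. This is a minimal sketch using the built-in uuid module:

```python
import uuid

# Version 4 UUIDs are fully random; version 1 embeds a timestamp
# (and a node identifier), which is what makes it loosely time sortable.
random_id = uuid.uuid4()
time_based_id = uuid.uuid1()

print(random_id.hex)       # 32 hex characters = 128 bits, like the examples above
print(time_based_id.hex)
```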
Unique IDs (Not time sortable) or Random UUIDs
These are generally short IDs, typically 6, 7, or 8 characters in length. For example, a short string is used for generating a short URL. Such short URLs are used by websites like bitly.com and even YouTube, which uses them for videos so that they can be shared easily. These IDs have no sorting use case per se. They can also be large, with no inherent information related to ordering.
One example of a large random ID is the reference ID or transaction ID provided when someone makes a payment at a payment gateway. These IDs need to be totally random so that there is no possibility of guessing any reference or transaction ID.
There could be many more such requirements in which we need to generate billions of such IDs.
Example of 7 character IDs
ETL2W7q
aTyUW89
bSZVK7I
Sequence Number
As the name suggests, this is an auto-increment number. Generating sequence numbers in a distributed system requires knowledge of the other workers in the system, and hence a shared state. It is very difficult to generate sequence numbers in a distributed system in a highly scalable manner because it requires central coordination.
An example of a sequence number is a MySQL auto-increment column.
Example
1
2
3
.
.
.
n
We will first look at the description and design of
- Unique ID ( Loosely Sortable)
- Followed by Unique ID (Not Loosely Sortable) or Random Unique ID
- And in the end, we will talk about Sequence Number Generation
Unique ID ( Loosely Sortable)
Some of the requirements of a unique ID generation system could be.
- The generated unique IDs should be short
- The generated IDs should be time sortable
- It should be possible to generate 10,000 IDs per second in a highly available manner
Some of the unique IDs which could be generated in our system are
- UUID or GUID which is 128 bits in length
- 96-bit UUID. An example is the MongoDB Object ID
- 64-bit length UUID
- 56-bit length UUID
UUID or GUID
UUID stands for Universally Unique Identifier. It is a 16-byte (128-bit) ID that is, for all practical purposes, guaranteed to be unique.
Advantages of UUID or GUID
- No special separate system is required for the generation of these IDs
- Generation of these IDs is fast
- Version 1 UUIDs embed a timestamp, making them loosely time sortable
Disadvantages
- The major disadvantage of GUID or UUID is its size
- There could be collisions, but the chances are vanishingly small
96 bit GUID
The MongoDB Object ID is 12 bytes (96 bits). Each ID comprises
- A 4-byte timestamp value. It is the number of seconds since the epoch time
- A 5-byte random value
- A 3-byte increment counter
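The 12-byte layout above can be sketched in a few lines of Python. This is an illustrative composition, not MongoDB's actual driver code (real drivers derive part of the value from the machine/process and use a monotonically incrementing counter, which is stubbed as random here):

```python
import os
import struct
import time

def object_id_like() -> bytes:
    """Builds a 12-byte MongoDB-style Object ID:
    4-byte timestamp + 5-byte random value + 3-byte counter."""
    ts = struct.pack(">I", int(time.time()))  # seconds since epoch, big-endian
    rand = os.urandom(5)                      # 5-byte random value
    counter = os.urandom(3)                   # stubbed as random; real drivers increment
    return ts + rand + counter

oid = object_id_like()

# The creation time can be recovered from the first 4 bytes, which is
# what makes these IDs loosely time sortable.
created_at = struct.unpack(">I", oid[:4])[0]
print(len(oid), created_at)
```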
This Object ID is unique per shard in the MongoDB world, which means that two Object IDs generated by two different shards could be the same.
Advantages
- The ID generated is time sortable
Disadvantages
- The size is still large, which could make indexes large
- So these IDs cannot be used where IDs smaller than 64 bits are desirable
64 bit IDs
64-bit IDs can be generated that are time sorted. And being 64 bits, they are obviously smaller in size as well.
For 64-bit IDs we have two approaches to consider here
Snowflake by Twitter
Twitter created Snowflake to generate 64-bit unique IDs. It uses three components to build the unique ID
- Current timestamp in milliseconds – 41 bits, which with a custom epoch gives us 69 years
- The machine or instance number generating the unique ID – 10 bits, which allows 2^10 = 1024 instances
- A sequence number within that instance – 12 bits, which allows 2^12 = 4096 IDs per instance per millisecond
- 1 bit is reserved (the sign bit)
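The bit layout can be sketched with simple shifts. The epoch constant below is Twitter's published custom epoch, and the helper is a simplified single-call sketch (a real generator also handles sequence rollover within a millisecond and clock drift, which are omitted here):

```python
import time

TWITTER_EPOCH_MS = 1_288_834_974_657  # Twitter's custom epoch (Nov 2010)

def snowflake_id(machine_id: int, sequence: int) -> int:
    """1 reserved sign bit | 41-bit ms timestamp | 10-bit machine | 12-bit sequence."""
    assert 0 <= machine_id < 1024 and 0 <= sequence < 4096
    ts = int(time.time() * 1000) - TWITTER_EPOCH_MS
    return (ts << 22) | (machine_id << 12) | sequence

a = snowflake_id(machine_id=7, sequence=0)
b = snowflake_id(machine_id=7, sequence=1)
print(a < b)  # later IDs compare greater, so sorting by ID sorts by time
```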
Let's see how this system is scalable. There can be up to 1024 instances, so there is no single point of failure, and the system has many more nodes to handle the traffic.
Disadvantages
- It requires ZooKeeper to manage the machine-to-number mapping
- It requires several Snowflake servers, which adds a level of complexity and management
Instagram ID generator
It also generates a 64-bit ID. Each ID consists of
- 41 bits for the timestamp in milliseconds. With a custom epoch this gives us 69 years
- 13 bits representing the logical shard ID
- 10 bits representing the sequence number per shard.
So essentially 1024 IDs can be generated per shard per millisecond.
The sharded system has many thousands of logical shards mapped to a few physical machines. Because it uses logical shards, we can start with fewer physical servers, each containing many logical shards. Once the shards get bigger, we can add more physical servers and move shards to them. Only the shard has to be moved; no re-bucketing of data needs to happen.
The Instagram ID generator doesn't require ZooKeeper, which was needed for Twitter Snowflake. It uses the Postgres schema feature for administration
What is the Postgres Architecture
A Postgres DB consists of multiple schemas, and each schema consists of multiple tables. A table name only has to be unique within a schema. So in Instagram's case, each logical shard is mapped to one schema, and a table inside the schema is used to generate the IDs
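Ignoring the Postgres plumbing, the ID composition itself looks like this in Python. The epoch value and the shard-selection rule are illustrative assumptions in the spirit of Instagram's published scheme:

```python
import time

CUSTOM_EPOCH_MS = 1_314_220_021_721   # illustrative custom epoch
NUM_LOGICAL_SHARDS = 8192             # 2^13 logical shards

def instagram_style_id(user_id: int, per_shard_seq: int) -> int:
    """41-bit ms timestamp | 13-bit logical shard | 10-bit per-shard sequence."""
    ts = int(time.time() * 1000) - CUSTOM_EPOCH_MS
    shard = user_id % NUM_LOGICAL_SHARDS     # keeps one user's data on one shard
    return (ts << 23) | (shard << 10) | (per_shard_seq % 1024)

sample = instagram_style_id(user_id=31341, per_shard_seq=0)
print((sample >> 10) & 0x1FFF)  # recovers the shard: 31341 % 8192 = 6765
```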
Unique IDs ( Not time sortable) or Random UUID
Examples of such IDs
- 6, 7, or 8 characters in length, which is 36, 42, and 48 bits respectively
- 16 character length ID
- 32 character length ID
6, 7, or 8 character base64-encoded IDs
Since they are base64 encoded, each character carries 6 bits.
- If the unique ID is 6 characters, the overall size is 6*6 = 36 bits
- If the unique ID is 7 characters, the overall size is 7*6 = 42 bits
- If the unique ID is 8 characters, the overall size is 8*6 = 48 bits
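A minimal sketch of generating such IDs with Python's secrets module, using the 64-character URL-safe alphabet described later in this article:

```python
import secrets

# 64 URL-safe characters, so each character carries 6 bits of entropy.
ALPHABET = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789-_"

def short_id(length: int = 7) -> str:
    return "".join(secrets.choice(ALPHABET) for _ in range(length))

print(short_id(6))  # 6 * 6 = 36 bits
print(short_id(7))  # 7 * 6 = 42 bits
print(short_id(8))  # 8 * 6 = 48 bits
```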
Some of the use cases for such IDs are
- A URL shortener system, where the generated URL needs to be as short as possible
- Pastebin-type systems, where again the generated paste link needs to be short
- YouTube, which uses such IDs to create short URLs for videos
Advantages
- Very small in size hence suitable for sharing
Disadvantages
- These IDs generated are not time sorted
16 or 32 character Random IDs
The character range for such IDs can span over 100 ASCII characters. Since they use a large character set, generating these IDs is pretty simple and the chances of collision are low.
Examples of 16-character Random IDs
PO2bc58166b6ab62
E#5B15597e8a$a4$
Examples of 32-character Random IDs
PO2bc58166b6ab626ceed57f5198247$
E#5B15597e8a$a4$YU02d73e4682f4FC
Sequence Number
These IDs can be generated by using a database that provides auto-increment numbers. Since a single database is generating the unique IDs, they are guaranteed to be unique and come out in sequential order. For example, the database generates an incrementing ID for each insert.
Advantages
- Short increment IDs could be generated
- It is easily time sorted
Disadvantages
- The generated unique IDs can be of varying length
- The system is not scalable, and the instance generating the unique IDs is a single point of failure
To improve scalability and remove the single point of failure, we can increase the number of instances, with each instance incrementing differently.
- For example, if we have two instances, one will generate even numbers and the other odd numbers
- If we have three instances, the first will generate multiples of 3, the second multiples of 3 plus 1, and the third multiples of 3 plus 2
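The step-and-offset scheme above can be sketched with a simple generator: each instance gets its own offset and uses the instance count as the step, so the streams never collide.

```python
from itertools import count

def id_stream(instance_id: int, num_instances: int):
    """Yields instance_id, instance_id + num_instances, ... forever."""
    return count(start=instance_id, step=num_instances)

evens = id_stream(0, 2)   # instance 0 of 2: 0, 2, 4, ...
odds = id_stream(1, 2)    # instance 1 of 2: 1, 3, 5, ...
print([next(evens) for _ in range(3)], [next(odds) for _ in range(3)])
```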
Even after increasing the number of instances, there are a couple of disadvantages.
- It is no longer a single sequence
- The unique IDs generated by two different instances are no longer time sorted. For example, say there are two instances: one could be generating IDs like 1001, 1003, … while the other is generating IDs like 502, 506, … at the same time. Looking at two such IDs, it is difficult to tell which came earlier.
- What if the traffic is huge and we have to add more instances? Or traffic drops and we have to remove some? Increasing or decreasing the instance count changes the ID-generation logic on every instance, which is complicated and difficult to manage.
The Flickr unique ID generator uses the above approach.
High-Level Design of Time Sortable Unique IDs
Some of the non-functional requirements of the system are that it should be highly scalable and available.
API Details
There will be only a single API in our system. This API will be used to fetch a batch of generated unique IDs, so essentially it will return an array of unique IDs.
We will generate an ID that will have the below fields
- Time Part
- Machine Number
- Process or Thread Number
- Local Counter
Here is the description of each of these fields
- Time Part – It denotes the time component of the Unique ID. Adding a time component to the unique IDs makes it time sortable.
- Machine Number – This is a unique number of the machine or instance or container.
- Thread Number – Unique number assigned to each thread
- Local Counter – A count of the unique IDs the thread has generated in the current millisecond
We will have a 64-bit ID. With 64-bit IDs we will be able to represent 2^64 values, which leaves ample headroom even if our requirement is on the order of 100 million IDs per second.
Time Component
The number of bits (out of 64) for the time component depends on the lifespan of our application. The timestamp will be the number of milliseconds since the epoch. Note that the standard epoch starts on January 1st, 1970, but our application can use a custom epoch starting on 1 Jan 2022, or any other date corresponding to the start of the application.
Assume the lifespan of our application is 50 years. Our unique ID has a millisecond component in it, so the number of milliseconds in 50 years is
50*365*24*60*60*1000 ≈ 1,577,000,000,000 milliseconds
The number of bits needed to store a number that big is 41, because
2^40 < 1,577,000,000,000 < 2^41
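The arithmetic is easy to verify in Python using int.bit_length:

```python
# Milliseconds in 50 years (ignoring leap years for this estimate).
ms_in_50_years = 50 * 365 * 24 * 60 * 60 * 1000
print(ms_in_50_years)               # 1_576_800_000_000
print(ms_in_50_years.bit_length())  # 41 bits are needed
```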
In fact, here is a table showing how many bits of timestamp cover how many years.
Number of Bits | Max Value (milliseconds) | Number of Years |
40 bits | 2^40 - 1 = 1,099,511,627,775 | ~34.8 years |
41 bits | 2^41 - 1 = 2,199,023,255,551 | ~69.7 years |
42 bits | 2^42 - 1 = 4,398,046,511,103 | ~139.4 years |
Let’s see how many bits we need for each component.
- Time Component – 41 bits
- Machine Component – 10 bits – Max 2^10 = 1024 instances or machines or containers
- Thread Component – 3 bits. Max 2^3 = 8 threads per instance, machine, or container
- Local Counter – 10 bits. Max 2^10 = 1024 unique IDs per millisecond per thread.
Let’s check the capacity of our service
- Max Instances – 2^10 = 1024
- Number of threads per instance – 2^3 = 8
- Max Number of threads possible – 2^10*2^3 = 2^13 = 8192
- Unique IDs per thread per millisecond = 2^10 = 1024
- Number of Unique IDs possible per millisecond = 2^13*2^10 = 2^23 = 8,388,608 = roughly 8 million per millisecond
- Number of Unique IDs possible per second = 2^23*1000 = 8,388,608,000 = 8 billion Unique IDs per second
So theoretically this system could generate roughly 8 billion unique IDs per second for a total of 69.7 years. We can also have a system that generates 56-bit IDs (41 + 8 + 2 + 4 = 55 bits, with one bit to spare). Let's see how many bits we need for each component
- Time Component – 41 bits
- Machine Component – 8 bits – Max 2^8 = 256 instances, machines, or containers
- Thread Component – 2 bits. Max 2^2 = 4 threads per instance, machine, or container
- Local Counter – 4 bits. Max 2^4 = 16 unique IDs per millisecond per thread.
Let’s check the capacity of our service
- Max Instances – 2^8 = 256
- Number of threads per instance – 2^2 = 4
- Max Number of threads possible – 2^8*2^2 = 2^10 = 1024
- Unique IDs per thread per millisecond (Local Counter) = 2^4 = 16
- Number of Unique IDs possible per millisecond = 2^10*2^4 = 2^14 = 16,384 = roughly 16K per millisecond
- Number of Unique IDs possible per second = 2^14*1000 = 16,384,000 = roughly 16 million Unique IDs per second
So theoretically this system could generate 16 million unique IDs per second for a total of 69.7 years.
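Putting the 64-bit layout from this design (41 / 10 / 3 / 10 bits) together in code, as a simplified sketch. A real generator would also block or spin when the local counter overflows within a millisecond, which is omitted here; the custom epoch value is an assumption:

```python
import time

CUSTOM_EPOCH_MS = 1_640_995_200_000  # 1 Jan 2022 UTC, our assumed custom epoch

def make_id(machine: int, thread: int, counter: int) -> int:
    """41-bit ms timestamp | 10-bit machine | 3-bit thread | 10-bit local counter."""
    assert 0 <= machine < 1024 and 0 <= thread < 8 and 0 <= counter < 1024
    ts = int(time.time() * 1000) - CUSTOM_EPOCH_MS
    return (ts << 23) | (machine << 13) | (thread << 10) | counter

first = make_id(machine=1, thread=0, counter=0)
second = make_id(machine=1, thread=0, counter=1)
print(first < second)  # IDs remain time sorted
```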
Can we choose a very long timestamp such that the timestamp is 10000 years or even more?
Choosing a very long timespan requires a larger number of bits for the timestamp. That limits the bits left for the workers, threads, and local counter, which in turn limits the number of unique IDs that can be generated per second.
How do we assign a unique number to each of the machines or the instances in our system?
We can use ZooKeeper here. Whenever a new instance, machine, or container comes up, it registers itself with ZooKeeper and gets a unique number from it.
High-Level Design
- There will be a ZooKeeper service
- There will be a load balancer behind which there will be a number of instances
- Each instance, when it comes up, registers itself with ZooKeeper and obtains a unique number from it. This number ranges from 0 to 1023, since we have only 10 bits for the instance number
- Autoscaling of the instances will be done based on CPU load. Every node in the cluster sends its CPU utilization to a supervisor, which adds and removes instances from the cluster depending upon the load. If we are using AWS, it is simple to set up such autoscaling
Below is the high-level diagram for the same
High-Level Design of Unique IDs ( Not time sortable) or Random UUID
For this, we will only look at the design of 6, 7, and 8 character random UUIDs. Generating a 16- or 32-character random UUID is straightforward, as the chance of collision is negligible.
Below will be the high-level components in the system for generating 6,7, and 8 character length random UUID
- Key Generation Service – This service holds the responsibility of generating short keys
- Upstream Service – This is the business service that uses all the generated short keys for business purposes
- Key Recovery Service – This service will be a worker that will recover the expired keys and put them back into the database for future use
- Kafka/SNS+ Queue/SQS System
- Database
- Cache
So the Key Generation Service, or KGS, will hold the responsibility of generating the short keys. First, let's see what the length of each key should be. The possible lengths are 6, 7, and 8. Only base64 URL-safe characters can be used to generate a key. Below are the URL-safe characters
- Lower case Alphabets – “a-z”
- Uppercase Alphabets – “A-Z”
- Digits – “0-9”
- Underscore – “_”
- Dash – “-“
Since only URL-safe characters can be used, therefore
- For 6 characters – we have 64^6 ≈ 68.7 billion options
- For 7 characters – we have 64^7 ≈ 4.4 trillion options
- For 8 characters – we have 64^8 ≈ 281.5 trillion options
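These keyspace sizes are easy to check:

```python
# Each character is one of 64 URL-safe symbols, so a key of length n
# has 64**n possible values.
keyspace = {length: 64 ** length for length in (6, 7, 8)}
print(keyspace[6])  # 68,719,476,736       (~68.7 billion)
print(keyspace[7])  # 4,398,046,511,104    (~4.4 trillion)
print(keyspace[8])  # 281,474,976,710,656  (~281.5 trillion)
```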
We can now assume that 68.7 billion entries will be enough, so we can use 6 characters for the key. Now the question is how these keys are going to be maintained in the database. Storing all 68 billion entries in the database would be far too many and a waste of resources.
One option is to store ranges of keys in the database. We can have ranges of 64 keys where we only store the first five characters. These five characters act as a prefix for the 64 keys which can be generated from them. Let's say we have the below prefix
adcA2
Then below 64 keys can be generated from this
- adcA2[a-z] – 26 keys
- adcA2[A-Z] – 26 keys
- adcA2[0-9] – 10 keys
- adcA2[-_] – 2 keys
We can store these ranges in the DB. So for 6-character keys, we will have 64^5 ≈ 1.07 billion entries in the database overall.
The keys will be returned by the Key Generation Service to the tiny URL service in ranges (batches) only. The upstream service then uses the prefix to generate 64 keys and serve 64 different create-tiny-URL requests.
This is an optimization: the upstream service only needs to call the Key Generation Service once it has exhausted all 64 of its keys, so there is one call to the Key Generation Service per 64 short URLs.
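A sketch of the fan-out the upstream service performs once it receives a prefix from the KGS:

```python
# The 64-character URL-safe alphabet from which keys are built.
ALPHABET = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789-_"

def expand_prefix(prefix: str) -> list[str]:
    """A 5-character prefix fans out into 64 six-character keys,
    one per character of the alphabet."""
    return [prefix + c for c in ALPHABET]

keys = expand_prefix("adcA2")
print(len(keys), keys[0])  # 64 keys; the first is 'adcA2a'
```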
Let’s now see the points for the KGS service
- Database Schema
- Which database to use
- How to resolve concurrency issues
- How to recover key_prefix
- What happens if the key ranges are getting exhausted
- What if the short URL never expires
- Is the KGS service not a single point of failure?
Database Schema
There will just be a single table that will store the range of keys i.e prefix. Below will be the fields in the table
- key_prefix
- key_length – It will always be 6 for now. This field exists in case we need 7-character keys in some scenario
- used – If this is true then the key prefix is currently in use. If false then it is free to be used
- created
- updated
Which Database to Use
We don't have any ACID requirements, so we can use a NoSQL database. We may also have very large amounts of data to store, for which NoSQL is better suited. This system will be both write-heavy and read-heavy, so we can use the Cassandra database here.
We can do the capacity estimates of the DB and based on that we can decide on the number of shards we want to have. Each of the shards would be properly replicated as well
There is one more optimization we can do here to improve latency: pre-fill free key ranges into the cache so the KGS service can pick directly from there instead of going to the database every time.
How to resolve concurrency issues
It could very well happen that two requests see the same prefix or range as free. Since multiple servers read from the key DB simultaneously, two or more servers might read the same key as free.
There are two ways to resolve the concurrency issues we just mentioned
- Two or more servers may read the same key, but only one will succeed in marking that key_prefix as used in the database. Concurrency is handled at the DB level: each row is locked before being updated, and we can leverage that here. The DB reports back to the server whether a record was actually updated. If not, the server fetches a new key; if it was, that server owns the key
- The other option is to use a transaction that does Find and Update atomically, returning a unique key_prefix every time. This is probably not a recommended option because of the load it puts on the database
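The first option (a conditional update whose affected row count tells the caller who won) can be demonstrated with any relational store. This sketch uses SQLite purely for illustration; the article's Cassandra choice would achieve the same effect with a lightweight transaction (`IF used = false`):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE key_range (key_prefix TEXT PRIMARY KEY, used INTEGER)")
conn.execute("INSERT INTO key_range VALUES ('adcA2', 0), ('adcA3', 0)")

def claim_prefix(prefix: str) -> bool:
    """Atomically flips used 0 -> 1; the affected row count tells the
    caller whether it won the race or must fetch another prefix."""
    cur = conn.execute(
        "UPDATE key_range SET used = 1 WHERE key_prefix = ? AND used = 0",
        (prefix,),
    )
    conn.commit()
    return cur.rowcount == 1

print(claim_prefix("adcA2"))  # True: first caller wins
print(claim_prefix("adcA2"))  # False: prefix already claimed
```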
What happens if the key ranges are getting exhausted
This should be an unexpected condition. A background worker will check whether the key ranges are getting exhausted; if so, it can generate ranges for 7-character keys. But how will it know the ranges are running out? To keep a rough count, there can be another table storing the count of free key ranges.
- Whenever the KGS allots a range to the upstream service, it publishes a message that is picked up by an asynchronous worker, which decrements the free-range count by 1
- Similarly, whenever a range is freed, the worker increments the counter
Is the KGS service not a single point of failure?
To prevent this, we will have proper replication of the key database. There will also be multiple app servers for the service itself, with proper autoscaling set up. We can also have disaster recovery in place.
Below is the high-level diagram of the overall system
Conclusion
This was all about unique ID generation. Hope you liked this article. Please share feedback in the comments.