Overview
The URL shortening service is used to shorten a long URL. The short URL does not depend upon the length of the actual URL. For example:
- https://techbyexample/system-design/whatsapp-system-design should be shortened to https://bit.ly/sdfse43
- https://google.com/search/ can be shortened to https://bit.ly/wrewr34
Notice the second part of the bit.ly URL. It will always be of fixed length irrespective of the length of the input URL. Let’s note down the functional requirements of the system
- The system should generate a short URL of fixed length irrespective of the length of the input URL.
- Once the shortened URL is clicked, the system should redirect the user to the actual URL.
- Each URL should have an expiration time, after which the shortened URL will expire
- Later in this tutorial we will also consider the case where a short URL never expires
Here are some of the non-functional requirements
- The system should be highly available
- The system should be strongly consistent. This means that once a short URL is generated for a given long URL, the system should be able to return that long URL for the short URL in the very next call.
- The system should be fault-tolerant
- There should be no single point of failure
What should a short URL look like? Before going further, let’s look at a sample URL. Here are the requirements for the short URL itself
- It must be short and unique. Preferably 6-8 characters in length
- It should only have characters that are URL safe. Below are URL safe characters
- Lower case Alphabets – “a-z”
- Uppercase Alphabets – “A-Z”
- Digits – “0-9”
- Dot – “.”
- Tilde – “~”
- Underscore – “_”
- Dash – “-”
Here is one example of a short URL that has 7 characters as part of the short string.
https://bit.ly/sdfse43
API Details
Below are the APIs that will be needed in the system (a rough route sketch follows the list)
- Create short URL – This API will create a short URL given a long URL
- Delete Short URL – This API will be used to delete a short URL that was created earlier.
- GET URL – This API will redirect to the long URL given a short URL
- List Short URL – See all Short URLs for a particular user
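To make the API surface concrete, here is a minimal sketch of how these four APIs could be wired up in Go. The paths, path parameters, and handler stubs are assumptions made for this sketch, not part of the original design.

```go
package main

import "net/http"

// newRouter wires up the four APIs described above using Go 1.22+ ServeMux
// patterns. Handler bodies are stubbed out; only the API shape is shown.
func newRouter() *http.ServeMux {
	mux := http.NewServeMux()

	// Create short URL: POST /urls with a JSON body such as {"long_url": "..."}
	mux.HandleFunc("POST /urls", func(w http.ResponseWriter, r *http.Request) {})

	// Delete short URL: DELETE /urls/{shortKey}
	mux.HandleFunc("DELETE /urls/{shortKey}", func(w http.ResponseWriter, r *http.Request) {})

	// GET URL: GET /{shortKey} issues a 302 redirect to the long URL
	mux.HandleFunc("GET /{shortKey}", func(w http.ResponseWriter, r *http.Request) {})

	// List Short URL: GET /users/{userID}/urls returns all short URLs of a user
	mux.HandleFunc("GET /users/{userID}/urls", func(w http.ResponseWriter, r *http.Request) {})

	return mux
}

func main() {
	http.ListenAndServe(":8080", newRouter())
}
```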
How a Short URL redirects to the Long URL
As soon as our service receives a short URL, it will fetch the corresponding long (original) URL for that short URL. There are two cases
- If it is not present, the service responds with 404 Not Found to the browser
- If it is present, the service issues an HTTP 302 redirect to the browser. The Location response header will contain the long (original) URL, and the browser will then redirect to it. This is the standard way of redirection as specified by the HTTP protocol, and it is sketched below
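Below is a minimal sketch of this redirect flow in Go. The urlStore lookup interface, the handler name, and the path parameter are assumptions for this sketch; only the 404 / 302 behaviour follows the description above.

```go
package shorturl

import "net/http"

// urlStore is a hypothetical abstraction over the cache/DB that maps a
// short key to its long (original) URL.
type urlStore interface {
	LongURL(shortKey string) (longURL string, found bool)
}

// RedirectHandler implements the two cases described above: respond with
// 404 if the short key is unknown, otherwise issue an HTTP 302 redirect
// with the Location header pointing at the long URL.
func RedirectHandler(store urlStore) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		shortKey := r.PathValue("shortKey") // e.g. "sdfse43" from GET /{shortKey}

		longURL, found := store.LongURL(shortKey)
		if !found {
			http.Error(w, "short URL not found", http.StatusNotFound) // 404
			return
		}
		// http.Redirect sets the Location header and writes a 302 (Found) status.
		http.Redirect(w, r, longURL, http.StatusFound)
	}
}
```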
One of the most important parts of implementing a short URL service is generating the short URL string. Before looking at the solution, let’s look at some of the encoding schemes that can be used to produce a short URL string
Encoding Schemes
There are three approaches to generating a unique short string for a given long URL
- MD5
- Counter
- Key Range
MD5 Approach
We will have an MD5 function that returns a hash for a given long URL. The MD5 hash has a fixed length of 128 bits. We can take the first 42 bits and represent them as 7 base64 characters, since each base64 character encodes 6 bits (6*7 = 42 bits). We could also take the first 48 bits and represent them as 8 base64 characters. How do we handle collisions?
The biggest issue with the MD5 approach is collisions: two different long URLs can produce MD5 hashes with the same first 42 bits. How can we detect a collision? We first compute the MD5 of the given long URL, then take the first 42 bits to create the 7-character base64 string. We then check that string in the DB to see if it has already been taken. If yes, we take bits 2-43 instead and check again, and so on.
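Here is a minimal sketch of that derivation in Go. The function name and the use of the URL-safe base64 alphabet are assumptions for illustration; the collision re-check against the DB is only described in a comment.

```go
package main

import (
	"crypto/md5"
	"encoding/base64"
	"fmt"
)

// shortKeyFromMD5 derives a 7-character candidate key from a long URL.
// Each base64 character encodes 6 bits, so 7 characters cover the first
// 42 bits of the 128-bit MD5 hash. If the candidate already exists in the
// DB, the caller would shift the bit window (bits 2-43, 3-44, ...) and try
// again, as described above.
func shortKeyFromMD5(longURL string) string {
	sum := md5.Sum([]byte(longURL)) // 16 bytes = 128 bits

	// Encode the first 6 bytes (48 bits) with the URL-safe base64 alphabet,
	// then keep the first 7 characters (42 bits).
	encoded := base64.RawURLEncoding.EncodeToString(sum[:6])
	return encoded[:7]
}

func main() {
	fmt.Println(shortKeyFromMD5("https://techbyexample/system-design/whatsapp-system-design"))
}
```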
Advantages
- The biggest advantage of the MD5 approach is that it is simple to implement.
- If only a few short URLs need to be generated and short URL generation is asynchronous, this scheme could work well: there will be fewer collisions, and even if a collision happens we still have time to figure out the next short URL since the overall process is async.
Disadvantages
- As we can see, this solution is not scalable. If a large number of short URLs need to be generated within seconds, the system could perform very poorly when there are a lot of collisions.
Counter Approach
As we have already seen, a base10 number can be converted to a base64 string, and a counter is nothing but a base10 number. So if we have a way of generating a unique number for a given long URL, we will be able to generate a short URL. The question is how to generate that unique counter or number. The idea is to keep a free list of number ranges. For that, we can have another service that exposes an API returning a free range of numbers.
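As a rough sketch, this is how a base10 counter value could be converted into a base64 string in Go; the exact alphabet ordering and the function name are assumptions.

```go
package main

import "fmt"

// base64Chars is a URL-safe base64 alphabet (value 0 -> 'A', ..., 63 -> '_').
// The exact ordering is an assumption for this sketch.
const base64Chars = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_"

// toBase64String converts a base10 counter value into its base64 string
// representation by repeatedly taking the remainder modulo 64.
func toBase64String(n uint64) string {
	if n == 0 {
		return string(base64Chars[0])
	}
	var digits []byte
	for n > 0 {
		digits = append([]byte{base64Chars[n%64]}, digits...)
		n /= 64
	}
	return string(digits)
}

func main() {
	fmt.Println(toBase64String(100000)) // Yag
}
```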
Advantages:
- It is a scalable option that can be used for any service which has huge traffic.
Disadvantages
- Converting a base10 number to base64 will not yield a fixed-length string. For example, converting 100,000 into base64 gives “Yag”. This might be ok, but a professional URL shortening service should adopt a standard length. This disadvantage can be circumvented by choosing number ranges that yield base64 strings of length 6 or 7.
- There is a cost of converting the number to base64 every time, though this conversion cost may not have a substantial impact.
- Slightly more complex than MD5
Range of Keys
The third option is to hand out ranges of pre-generated keys themselves instead of ranges of counter values.
Advantages:
- It is a scalable option that can be used for any service which has huge traffic.
Disadvantages
- Slightly more complex than MD5
Both Counter and Range of Keys are good options. We will be using the Range of Keys approach in this tutorial.
High-Level Design
At a high level, let’s discuss the overall flow and which services will exist.
- There will be an API gateway on which every request from all the users will land.
- There will be a Short URL service. This service is responsible for handling all the API requests coming from users.
- There will be a Key Generation Service that is responsible for generating the short keys
- The Short URL service will call the key generation service whenever it needs new keys.
- When the Short URL service has exhausted all the key ranges it will publish a Kafka/SNS message specifying that the key ranges have been exhausted.
- This message will be picked up by a Key Recovery service, which will be a worker. It will mark the key range as free so that it can be picked again. This worker will also delete the short_url – long_url mapping for all the keys in the range from the database.
- We will cache the Short URL – Long URL mapping as soon as it is created
Below is the high-level diagram of the service
Let’s see some of the details of the Short URL service and the Key Generation Service
Short URL Service
Let’s see which database tables will be needed for the short_url service. We will need a short_url table. Below are the fields in the short_url table
- short_url_key
- long_url
- user_id
- created
- updated
What database to use
We don’t have any ACID requirements, so a NoSQL database will work for us. We might also have a very large amount of data to store, for which NoSQL is better suited. This system will be both write-heavy and read-heavy. There is also little need for secondary indexes here, as almost all queries will be on the primary field, short_url_key. So we can use Cassandra here.
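For illustration, the short_url table above could map to a record like the following. The Go types and the note about the partition key are assumptions for this sketch rather than a prescribed schema.

```go
package shorturl

import "time"

// ShortURLRecord mirrors the short_url table described above. In Cassandra,
// short_url_key would be the partition key, since almost all queries look up
// a row by the short key.
type ShortURLRecord struct {
	ShortURLKey string    // short_url_key, e.g. "adcA2f"
	LongURL     string    // long_url, the original URL to redirect to
	UserID      string    // user_id of the creator
	Created     time.Time // created timestamp
	Updated     time.Time // updated timestamp
}
```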
As mentioned in the high-level diagram, it is the Short URL service that keeps track of all the short URLs generated. It coordinates with the Key Generation Service to get ranges of free keys, and each key yields exactly one short URL. Now let’s look at the design of the Key Generation Service.
Key Generation Service
There will be a KGS service responsible for generating the keys. First, let’s see what the length of each key should be. The possible lengths are 6, 7, and 8. Only the 64 URL-safe base64 characters can be used to generate the key. Therefore
- For 6 – We have 64^6 = ~68.7 billion options
- For 7 – We have 64^7 = ~4.4 trillion options
- For 8 – We have 64^8 = ~281 trillion options
We can assume that 68.7 billion entries will be enough, so we can use 6 characters for the key. Now the question is how these keys are going to be maintained in the database. Storing all 68 billion entries would be too many rows and a waste of resources. One option is to store ranges of keys in the database: we can use ranges of 64, where we only store the first five characters, which act as a prefix for the 64 keys that can be generated from that prefix. Let’s say we have the below prefix
adcA2
Then the below 64 keys can be generated from it
- adcA2[a-z] – 26 keys
- adcA2[A-Z] – 26 keys
- adcA2[0-9] – 10 keys
- adcA2[-_] – 2 keys
We can store these prefixes in the DB. So for 6-character keys, we will have 64^5 (about 1 billion) entries in the database overall. The Key Generation Service will return keys to the Short URL service only in ranges or batches. The Short URL service will then use a prefix to generate 64 keys and serve 64 different create-short-URL requests.
This is an optimization, as the Short URL service only needs to call the Key Generation Service once it has exhausted all 64 of its keys. So there will be one call from the Short URL service to the Key Generation Service per 64 short URLs generated.
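A minimal sketch of how the Short URL service could expand one allotted prefix into its 64 keys is below; the character ordering and the function name are assumptions.

```go
package main

import "fmt"

// keyChars lists the 64 URL-safe characters that can follow a prefix,
// mirroring the [a-z], [A-Z], [0-9], [-_] breakdown above.
const keyChars = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789-_"

// keysFromPrefix expands a 5-character prefix into the 64 six-character keys
// it covers, one per possible last character.
func keysFromPrefix(prefix string) []string {
	keys := make([]string, 0, len(keyChars))
	for _, c := range keyChars {
		keys = append(keys, prefix+string(c))
	}
	return keys
}

func main() {
	keys := keysFromPrefix("adcA2")
	fmt.Println(len(keys), keys[0], keys[63]) // 64 adcA2a adcA2_
}
```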
Let’s now go through the main design points of the KGS service
- Database Schema
- Which database to use
- How to resolve concurrency issues
- How to recover key_prefix
- What happens if the key ranges are getting exhausted
- What if the short URL never expires
- Isn’t the KGS service a single point of failure?
Database Schema
There will be just a single table that stores the key ranges, i.e. prefixes. Below are the fields in the table
- key_prefix
- key_length – It will always be 6 for now. This field exists in case we need 7-character keys in some scenario
- used – If this is true, the key prefix is currently in use. If false, it is free to be used
- created
- updated
Which Database to Use
We don’t have any ACID requirements, so we can use a NoSQL database. We might also have a very large amount of data to store, for which NoSQL is better suited. This system will be both write-heavy and read-heavy, so we can use Cassandra here. We can do capacity estimates for the DB and, based on that, decide on the number of shards we want to have. Each of the shards would be properly replicated as well.
There is one more optimization we can do here to improve the latency. We can refill free key ranges in the cache and the KGS service can directly pick from there instead of going to the database every time.
How to resolve concurrency issues
It could very well happen that two requests see the same prefix or range as free: since multiple servers read from the key DB simultaneously, two or more servers might read the same key prefix as free. There are two ways to resolve this concurrency issue
- Two or more servers read the same key, but only one server will be able to mark that key_prefix as used in the database. Concurrency is handled at the DB level: each row is locked before being updated, and we can leverage that here. The DB will tell the server whether any record was updated. If no record was updated, the server fetches a new key; if it was updated, that server has successfully claimed the key. A sketch of this conditional update is shown after this list.
- The other option is to use a transaction that does a Find and an Update atomically. Each Find-and-Update will return a unique key_prefix every time. This is probably not a recommended option because of the load it puts on the database
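Here is a hedged sketch of the first option, written as a compare-and-set style update against Go's database/sql purely for illustration; with Cassandra, the equivalent would be a lightweight transaction (UPDATE ... IF used = false). The table and column names follow the schema above; the function names and the SQL dialect are assumptions.

```go
package kgs

import (
	"database/sql"
	"errors"
)

// claimKeyPrefix tries to mark the given key_prefix as used. Out of several
// servers racing for the same prefix, only one will see rowsAffected == 1;
// the others get false and should try another prefix.
func claimKeyPrefix(db *sql.DB, keyPrefix string) (bool, error) {
	res, err := db.Exec(
		`UPDATE key_prefix SET used = true WHERE key_prefix = ? AND used = false`,
		keyPrefix,
	)
	if err != nil {
		return false, err
	}
	rows, err := res.RowsAffected()
	if err != nil {
		return false, err
	}
	return rows == 1, nil
}

// pickFreePrefix walks a list of candidate prefixes read as "free" and
// returns the first one this server manages to claim.
func pickFreePrefix(db *sql.DB, candidates []string) (string, error) {
	for _, p := range candidates {
		claimed, err := claimKeyPrefix(db, p)
		if err != nil {
			return "", err
		}
		if claimed {
			return p, nil
		}
	}
	return "", errors.New("no free key prefix could be claimed")
}
```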
How to recover key_prefix
Once the Short URL service has exhausted a range of keys, it will enter that range into another table, from which the range can be recovered and marked as free again after 2 weeks. We know for sure that after two weeks the keys will be free, as we have an expiry of two weeks.
What happens if the key ranges are getting exhausted
This would be an unexpected condition. There will be a background worker that checks whether the key ranges are getting exhausted. If yes, it can generate ranges for 7-character keys. But how will it know that the key ranges are getting exhausted? To keep a rough count, there could be another table that stores the count of free key ranges.
- Whenever a range is allotted by the KGS to the Short URL service, it will publish a message that will be picked up by an asynchronous worker, which decreases this count by 1.
- Similarly, whenever a range is freed, we increment this counter.
What if the short URL never expires
It is easy to extend the above service to serve URLs that never expire.
- The only difference is that our short string will not be limited to 6 characters. We can use 7, 8, or even 9 characters as the need arises.
- There will be no key recovery service
- Once a key_range has been allotted, we can remove it from the key DB, as it is never meant to be freed or recovered
Isn’t the KGS service a single point of failure?
To prevent this, we will have proper replication of the key database. There will also be multiple app servers for the service itself, with proper autoscaling set up. We can also have Disaster Recovery in place.
Other common components
Other common components could be
- User Service – It holds the user profile information.
- Token/Auth Service – Management of User tokens
- SMS Service – It is used for sending any kind of message to the user, for example an OTP
- Analytics Service – This could be used to track any kind of analytics
Non-Functional Requirements
Let’s discuss some non-functional requirements now
Scalability
The first thing to consider in the above design is scalability. The scalability of each of the components in the system is very important. Here are some scalability challenges you can face and their possible resolutions
- Each of the machines in the short_url service and KGS service can only serve a limited number of requests. Hence each service should have proper autoscaling set up, so that based on the number of requests we can add instances and scale when needed
- Our Kafka/SNS system might not be able to take that much load. We can scale it horizontally, but only to a limit. If it becomes a bottleneck, then depending upon geography or userId we can have two or more such systems. Service discovery could be used to figure out which Kafka system a request needs to go to.
- Another important factor for scalability is that we have designed our system so that none of the services is bogged down with too many responsibilities. There is a separation of concerns, and wherever a service had too much responsibility, we have broken it down
Low latency
- We can cache the short URL when it is created, with some expiry of course. A newly created short URL is likely to be accessed soon after creation, so caching reduces latency for many of the read calls.
- We also hand out keys in batches or ranges. This prevents the Short URL service from calling the KGS service every time and improves the overall latency.
- There is one more optimization we can do here to improve the latency. We can refill free key ranges in the cache and the KGS service can directly pick from there instead of going to the database every time.
Availability
In order for the system to be highly available, it is very important to have redundancy/backups for almost all components in the system. Here are some of the things that need to be done.
- In the case of our DB, we need to enable replication. There should be multiple slaves for each of the master shard nodes.
- For Redis we also need replication.
- For data redundancy, we could go multi-region as well. This also helps if one of the regions goes down.
- Disaster Recovery could also be set up
Alerting and Monitoring
Alerting and Monitoring is also a very important non-functional requirement. We should monitor each of our services and set up proper alerts as well. Some of the things that could be monitored are
- API Response Time
- Memory Consumption
- CPU Consumption
- Disk Space Consumption
- Queue Length
- ….
Moving closer to user location
There are a couple of architectures that could be followed here. One such is Cell Architecture. You can read more about cell architecture here – https://github.com/wso2/reference-architecture/blob/master/reference-architecture-cell-based.md
Avoiding Single Point of Failures
A single point of failure is a part of a system whose failure would cause the entire system to fail. We should try to prevent any single point of failure in our design. With redundancy and by going multi-region, we can prevent this.
Conclusion
This is all about the system design of a URL shortener service. Hope you liked this article. Please share feedback in the comments.