Overview
The URL shortening service is used to shorten a long URL. The short URL does not depend upon the length of the actual URL. For example:
- https://techbyexample/system-design/whatsapp-system-design should be shortened to https://bit.ly/sdfse43
- https://google.com/search/ can be shortened to https://bit.ly/wrewr34
Notice the second part of the bit.ly URL. It will always be of fixed length irrespective of the length of the input URL. Let’s note down the functional requirements of the system
- The system should generate a short URL of fixed length irrespective of the length of the input URL.
- Once the shortened URL is clicked, the system should redirect the user to the actual URL.
- Each URL should have an expiration time, after which the shortened URL will expire
- Later in this tutorial we will also consider the case where a short URL never expires
Here are some of the non-functional requirements
- The system should be highly available
- The system should be strongly consistent. This means that once a short URL is generated for a given long URL, the system should be able to return that long URL for the short URL in the very next call.
- The system should be fault-tolerant
- There should be no single point of failure
What should a short URL look like? Before going further, let’s look at a sample URL. Here are the requirements for the short URL itself
- It must be short and unique. Preferably 6-8 characters in length
- It should only have characters that are URL safe. Below are URL safe characters
- Lower case Alphabets – “a-z”
- Uppercase Alphabets – “A-Z”
- Digits – “0-9”
- Dot – “.”
- Tilde – “~”
- Underscore – “_”
- Dash – “-”
Here is one example of a short URL that has 7 characters as part of the short string.
https://bit.ly/sdfse43
API Details
Below are the APIs that will be needed in the system (a rough route sketch follows the list)
- Create short URL – This API will create a short URL given a long URL
- Delete Short URL – This API will be used to delete a short URL that was created earlier.
- GET URL – This API will redirect to the long URL given a short URL
- List Short URL – See all Short URLs for a particular user
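To make the API surface concrete, here is a minimal sketch of how these four APIs could be wired up in Go. The paths, path parameters, and handler stubs are assumptions made for this sketch, not part of the original design.

```go
package main

import "net/http"

// newRouter wires up the four APIs described above using Go 1.22+ ServeMux
// patterns. Handler bodies are stubbed out; only the API shape is shown.
func newRouter() *http.ServeMux {
	mux := http.NewServeMux()

	// Create short URL: POST /urls with a JSON body such as {"long_url": "..."}
	mux.HandleFunc("POST /urls", func(w http.ResponseWriter, r *http.Request) {})

	// Delete short URL: DELETE /urls/{shortKey}
	mux.HandleFunc("DELETE /urls/{shortKey}", func(w http.ResponseWriter, r *http.Request) {})

	// GET URL: GET /{shortKey} issues a 302 redirect to the long URL
	mux.HandleFunc("GET /{shortKey}", func(w http.ResponseWriter, r *http.Request) {})

	// List Short URL: GET /users/{userID}/urls returns all short URLs of a user
	mux.HandleFunc("GET /users/{userID}/urls", func(w http.ResponseWriter, r *http.Request) {})

	return mux
}

func main() {
	http.ListenAndServe(":8080", newRouter())
}
```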
How a Short URL redirects to the Long URL
As soon as our service receives a short URL, it will fetch the corresponding long (original) URL for that short URL. There are two cases
- If it is not present, the service responds with 404 Not Found to the browser
- If it is present, the service issues an HTTP 302 redirect to the browser. The Location response header will contain the long (original) URL, and the browser will then redirect to it. This is the standard way of redirection as specified by the HTTP protocol, and it is sketched below
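Below is a minimal sketch of this redirect flow in Go. The urlStore lookup interface, the handler name, and the path parameter are assumptions for this sketch; only the 404 / 302 behaviour follows the description above.

```go
package shorturl

import "net/http"

// urlStore is a hypothetical abstraction over the cache/DB that maps a
// short key to its long (original) URL.
type urlStore interface {
	LongURL(shortKey string) (longURL string, found bool)
}

// RedirectHandler implements the two cases described above: respond with
// 404 if the short key is unknown, otherwise issue an HTTP 302 redirect
// with the Location header pointing at the long URL.
func RedirectHandler(store urlStore) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		shortKey := r.PathValue("shortKey") // e.g. "sdfse43" from GET /{shortKey}

		longURL, found := store.LongURL(shortKey)
		if !found {
			http.Error(w, "short URL not found", http.StatusNotFound) // 404
			return
		}
		// http.Redirect sets the Location header and writes a 302 (Found) status.
		http.Redirect(w, r, longURL, http.StatusFound)
	}
}
```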
One of the most important parts of implementing a short URL service is generating the short URL string. Before looking at the solution, let’s look at some of the encoding schemes that can be used to produce a short URL string
Encoding Schemes
There are three approaches to generating a unique short string for a given long URL
- MD5
- Counter
- Key Range
MD5 Approach
We will have an MD5 function that returns a hash for a given long URL. The MD5 hash has a fixed length of 128 bits. We can take the first 42 bits and represent them as 7 base64 characters, since each base64 character encodes 6 bits (6*7 = 42 bits). We could also take the first 48 bits and represent them as 8 base64 characters. How do we handle collisions?
The biggest issue with the MD5 approach is collisions: two different long URLs can produce MD5 hashes with the same first 42 bits. How can we detect a collision? We first compute the MD5 of the given long URL, then take the first 42 bits to create the 7-character base64 string. We then check that string in the DB to see if it has already been taken. If yes, we take bits 2-43 instead and check again, and so on.
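Here is a minimal sketch of that derivation in Go. The function name and the use of the URL-safe base64 alphabet are assumptions for illustration; the collision re-check against the DB is only described in a comment.

```go
package main

import (
	"crypto/md5"
	"encoding/base64"
	"fmt"
)

// shortKeyFromMD5 derives a 7-character candidate key from a long URL.
// Each base64 character encodes 6 bits, so 7 characters cover the first
// 42 bits of the 128-bit MD5 hash. If the candidate already exists in the
// DB, the caller would shift the bit window (bits 2-43, 3-44, ...) and try
// again, as described above.
func shortKeyFromMD5(longURL string) string {
	sum := md5.Sum([]byte(longURL)) // 16 bytes = 128 bits

	// Encode the first 6 bytes (48 bits) with the URL-safe base64 alphabet,
	// then keep the first 7 characters (42 bits).
	encoded := base64.RawURLEncoding.EncodeToString(sum[:6])
	return encoded[:7]
}

func main() {
	fmt.Println(shortKeyFromMD5("https://techbyexample/system-design/whatsapp-system-design"))
}
```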
Advantages
- The biggest advantage of the MD5 approach is that it is simple to implement.
- If only a few short URLs need to be generated and short URL generation is asynchronous, this scheme could work well: there will be fewer collisions, and even if a collision happens we still have time to figure out the next short URL since the overall process is async.
Disadvantages
- As we can see, this solution is not scalable. If a large number of short URLs need to be generated within seconds, the system could perform very poorly when there are a lot of collisions.
Counter Approach
As we have already seen, a base10 number can be converted to a base64 string, and a counter is nothing but a base10 number. So if we have a way of generating a unique number for a given long URL, we will be able to generate a short URL. The question is how to generate that unique counter or number. The idea is to keep a free list of number ranges. For that, we can have another service that exposes an API returning a free range of numbers.
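As a rough sketch, this is how a base10 counter value could be converted into a base64 string in Go; the exact alphabet ordering and the function name are assumptions.

```go
package main

import "fmt"

// base64Chars is a URL-safe base64 alphabet (value 0 -> 'A', ..., 63 -> '_').
// The exact ordering is an assumption for this sketch.
const base64Chars = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_"

// toBase64String converts a base10 counter value into its base64 string
// representation by repeatedly taking the remainder modulo 64.
func toBase64String(n uint64) string {
	if n == 0 {
		return string(base64Chars[0])
	}
	var digits []byte
	for n > 0 {
		digits = append([]byte{base64Chars[n%64]}, digits...)
		n /= 64
	}
	return string(digits)
}

func main() {
	fmt.Println(toBase64String(100000)) // Yag
}
```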
Advantages:
- It is a scalable option that can be used for any service which has huge traffic.
Disadvantages
- Converting a base10 number to base64 will not yield a fixed-length string. For example, converting 100,000 into base64 gives “Yag”. This might be ok, but a professional URL shortening service should adopt a standard length. This disadvantage can be circumvented by choosing number ranges that yield base64 strings of length 6 or 7.
- There is a cost of converting the number to base64 every time, though this conversion cost may not have a substantial impact.
- Slightly more complex than MD5
Range of Keys
The third option is to hand out ranges of pre-generated keys themselves instead of ranges of counter values.
Advantages:
- It is a scalable option that can be used for any service which has huge traffic.
Disadvantages
- Slightly more complex than MD5
Both Counter and Range of Keys are good options. We will be using the Range of Keys approach in this tutorial.
High-Level Design
At a high level, let’s discuss the overall flow and which services will exist.
- There will be an API gateway on which every request from all the users will land.
- There will be a Short URL service. This service is responsible for handling all the API requests coming from users.
- There will be a Key Generation Service that is responsible for generating the short keys
- The Short URL service will call the key generation service whenever it needs new keys.
- When the Short URL service has exhausted all the key ranges it will publish a Kafka/SNS message specifying that the key ranges have been exhausted.
- This message will be picked up by a Key Recovery service, which will be a worker. It will mark the key range as free so that it can be picked again. This worker will also delete the short_url – long_url mapping for all the keys in the range from the database.
- We will cache the Short URL – Long URL mapping as soon as it is created
Below is the high-level diagram of the service
Let’s see some of the details of the Short URL service and the Key Generation Service
Short URL Service
Let’s see which database tables will be needed for the short_url service. We will need a short_url table. Below are the fields in the short_url table
- short_url_key
- long_url
- user_id
- created
- updated
What database to use
We don’t have any ACID requirements, so a NoSQL database will work for us. We might also have a very large amount of data to store, for which NoSQL is better suited. This system will be both write-heavy and read-heavy. There is also little need for secondary indexes here, as almost all queries will be on the primary field, short_url_key. So we can use Cassandra here.
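For illustration, the short_url table above could map to a record like the following. The Go types and the note about the partition key are assumptions for this sketch rather than a prescribed schema.

```go
package shorturl

import "time"

// ShortURLRecord mirrors the short_url table described above. In Cassandra,
// short_url_key would be the partition key, since almost all queries look up
// a row by the short key.
type ShortURLRecord struct {
	ShortURLKey string    // short_url_key, e.g. "adcA2f"
	LongURL     string    // long_url, the original URL to redirect to
	UserID      string    // user_id of the creator
	Created     time.Time // created timestamp
	Updated     time.Time // updated timestamp
}
```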
As mentioned in the high-level diagram, it is the Short URL service that keeps track of all the short URLs generated. It coordinates with the Key Generation Service to get ranges of free keys, and each key yields exactly one short URL. Now let’s look at the design of the Key Generation Service.
Key Generation Service
There will be a KGS service responsible for generating the keys. First, let’s see what the length of each key should be. The possible lengths are 6, 7, and 8. Only the 64 URL-safe base64 characters can be used to generate the key. Therefore
- For 6 – We have 64^6 = ~68.7 billion options
- For 7 – We have 64^7 = ~4.4 trillion options
- For 8 – We have 64^8 = ~281 trillion options
We can assume that 68.7 billion entries will be enough, so we can use 6 characters for the key. Now the question is how these keys are going to be maintained in the database. Storing all 68 billion entries would be too many rows and a waste of resources. One option is to store ranges of keys in the database: we can use ranges of 64, where we only store the first five characters, which act as a prefix for the 64 keys that can be generated from that prefix. Let’s say we have the below prefix
adcA2
Then the below 64 keys can be generated from it
- adcA2[a-z] – 26 keys
- adcA2[A-Z] – 26 keys
- adcA2[0-9] – 10 keys
- adcA2[-_] – 2 keys
We can store these prefixes in the DB. So for 6-character keys, we will have 64^5 (about 1 billion) entries in the database overall. The Key Generation Service will return keys to the Short URL service only in ranges or batches. The Short URL service will then use a prefix to generate 64 keys and serve 64 different create-short-URL requests.
This is an optimization, as the Short URL service only needs to call the Key Generation Service once it has exhausted all 64 of its keys. So there will be one call from the Short URL service to the Key Generation Service per 64 short URLs generated.
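A minimal sketch of how the Short URL service could expand one allotted prefix into its 64 keys is below; the character ordering and the function name are assumptions.

```go
package main

import "fmt"

// keyChars lists the 64 URL-safe characters that can follow a prefix,
// mirroring the [a-z], [A-Z], [0-9], [-_] breakdown above.
const keyChars = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789-_"

// keysFromPrefix expands a 5-character prefix into the 64 six-character keys
// it covers, one per possible last character.
func keysFromPrefix(prefix string) []string {
	keys := make([]string, 0, len(keyChars))
	for _, c := range keyChars {
		keys = append(keys, prefix+string(c))
	}
	return keys
}

func main() {
	keys := keysFromPrefix("adcA2")
	fmt.Println(len(keys), keys[0], keys[63]) // 64 adcA2a adcA2_
}
```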
Let’s now go through the main design points of the KGS service
- Database Schema
- Which database to use
- How to resolve concurrency issues
- How to recover key_prefix
- What happens if the key ranges are getting exhausted
- What if the short URL never expires
- Isn’t the KGS service a single point of failure?
Database Schema
There will be just a single table that stores the key ranges, i.e. prefixes. Below are the fields in the table
- key_prefix
- key_length – It will always be 6 for now. This field exists in case we need 7-character keys in some scenario
- used – If this is true, the key prefix is currently in use. If false, it is free to be used
- created
- updated
Which Database to Use
We don’t have any ACID requirements, so we can use a NoSQL database. We might also have a very large amount of data to store, for which NoSQL is better suited. This system will be both write-heavy and read-heavy, so we can use Cassandra here. We can do capacity estimates for the DB and, based on that, decide on the number of shards we want to have. Each of the shards would be properly replicated as well.
There is one more optimization we can do here to improve the latency. We can refill free key ranges in the cache and the KGS service can directly pick from there instead of going to the database every time.
How to resolve concurrency issues
It could very well happen that two requests see the same prefix or range as free: since multiple servers read from the key DB simultaneously, two or more servers might read the same key prefix as free. There are two ways to resolve this concurrency issue
- Two or more servers read the same key, but only one server will be able to mark that key_prefix as used in the database. Concurrency is handled at the DB level: each row is locked before being updated, and we can leverage that here. The DB will tell the server whether any record was updated. If no record was updated, the server fetches a new key; if it was updated, that server has successfully claimed the key. A sketch of this conditional update is shown after this list.
- The other option is to use a transaction that does a Find and an Update atomically. Each Find-and-Update will return a unique key_prefix every time. This is probably not a recommended option because of the load it puts on the database
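Here is a hedged sketch of the first option, written as a compare-and-set style update against Go's database/sql purely for illustration; with Cassandra, the equivalent would be a lightweight transaction (UPDATE ... IF used = false). The table and column names follow the schema above; the function names and the SQL dialect are assumptions.

```go
package kgs

import (
	"database/sql"
	"errors"
)

// claimKeyPrefix tries to mark the given key_prefix as used. Out of several
// servers racing for the same prefix, only one will see rowsAffected == 1;
// the others get false and should try another prefix.
func claimKeyPrefix(db *sql.DB, keyPrefix string) (bool, error) {
	res, err := db.Exec(
		`UPDATE key_prefix SET used = true WHERE key_prefix = ? AND used = false`,
		keyPrefix,
	)
	if err != nil {
		return false, err
	}
	rows, err := res.RowsAffected()
	if err != nil {
		return false, err
	}
	return rows == 1, nil
}

// pickFreePrefix walks a list of candidate prefixes read as "free" and
// returns the first one this server manages to claim.
func pickFreePrefix(db *sql.DB, candidates []string) (string, error) {
	for _, p := range candidates {
		claimed, err := claimKeyPrefix(db, p)
		if err != nil {
			return "", err
		}
		if claimed {
			return p, nil
		}
	}
	return "", errors.New("no free key prefix could be claimed")
}
```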
How to recover key_prefix
Once the Short URL service has exhausted a range of keys, it will enter that range into another table, from which the range can be recovered and marked as free again after 2 weeks. We know for sure that after two weeks the keys will be free, as we have an expiry of two weeks.
What happens if the key ranges are getting exhausted
This would be an unexpected condition. There will be a background worker that checks whether the key ranges are getting exhausted. If yes, it can generate ranges for 7-character keys. But how will it know that the key ranges are getting exhausted? To keep a rough count, there could be another table that stores the count of free key ranges.
- Whenever a range is allotted by the KGS to the Short URL service, it will publish a message that will be picked up by an asynchronous worker, which decreases this count by 1.
- Similarly, whenever a range is freed, we increment this counter.
What if the short URL never expires
It is easy to extend the above service to serve URLs that never expire.
- The only difference is that our short string will not be limited to 6 characters. We can use 7, 8, or even 9 characters as the need arises.
- There will be no key recovery service
- Once a key_range has been allotted, we can remove it from the key DB, as it is never meant to be freed or recovered
Isn’t the KGS service a single point of failure?
To prevent this, we will have proper replication of the key database. There will also be multiple app servers for the service itself, with proper autoscaling set up. We can also have Disaster Recovery in place.
Other common components
Other common components could be
- User Service – It holds the user profile information.
- Token/Auth Service – Management of User tokens
- SMS Service – It is used for sending any kind of message to the user, for example an OTP
- Analytics Service – This could be used to track any kind of analytics
Non-Functional Requirements
Let’s discuss some non-functional requirements now
Scalability
The first thing to consider in the above design is scalability. The scalability of each of the components in the system is very important. Here are some scalability challenges you can face and their possible resolutions
- Each of the machines in the short_url service and KGS service can only serve a limited number of requests. Hence each service should have proper autoscaling set up, so that based on the number of requests we can add instances and scale when needed
- Our Kafka/SNS system might not be able to take that much load. We can scale it horizontally, but only to a limit. If it becomes a bottleneck, then depending upon geography or userId we can have two or more such systems. Service discovery could be used to figure out which Kafka system a request needs to go to.
- Another important factor for scalability is that we have designed our system so that none of the services is bogged down with too many responsibilities. There is a separation of concerns, and wherever a service had too much responsibility, we have broken it down
Low latency
- We can cache the short URL when it is created, with some expiry of course. A newly created short URL is likely to be accessed soon after creation, so caching reduces latency for many of the read calls.
- We also hand out keys in batches or ranges. This prevents the Short URL service from calling the KGS service every time and improves the overall latency.
- There is one more optimization we can do here to improve the latency. We can refill free key ranges in the cache and the KGS service can directly pick from there instead of going to the database every time.
Availability
In order for the system to be highly available, it is very important to have redundancy/backups for almost all components in the system. Here are some of the things that need to be done.
- In the case of our DB, we need to enable replication. There should be multiple slaves for each of the master shard nodes.
- For Redis we also need replication.
- For data redundancy, we could go multi-region as well. This also helps if one of the regions goes down.
- Disaster Recovery could also be set up
Alerting and Monitoring
Alerting and Monitoring is also a very important non-functional requirement. We should monitor each of our services and set up proper alerts as well. Some of the things that could be monitored are
- API Response Time
- Memory Consumption
- CPU Consumption
- Disk Space Consumption
- Queue Length
- ….
Moving closer to user location
There are a couple of architectures that could be followed here. One such is Cell Architecture. You can read more about cell architecture here – https://github.com/wso2/reference-architecture/blob/master/reference-architecture-cell-based.md
Avoiding Single Point of Failures
A single point of failure is a part of a system whose failure would cause the entire system to fail. We should try to prevent any single point of failure in our design. With redundancy and by going multi-region, we can prevent this.
Conclusion
This is all about the system design of a URL shortener service. Hope you liked this article. Please share feedback in the comments.