Nearby Friends System Design - Welcome to Tech by Example

Table of Contents

Overview

The objective is to design the nearby friends feature. When we say nearby then it means that the location of the friends is constantly changing. As and when the location changes, then their location is constantly getting updated and shared with their nearby friends.

Let’s first list down the functional requirements of the system

Functional Requirements

A user when online should be able to view the current location of all their friends and vice versa

As and when the friend’s location gets changed it should get reflected in the view of the user and vice versa

We should be able to store the location history of the user as and when it changes.

A user should be able to control whether they want to appear as a nearby friend or not. In other words, they have the capability to turn off the feature for them.

Non-Functional Requirements

The latency of the system should be less.

The location of friends shown to the user will be eventually consistent.

Also with respect to sharing location updates, the system should be eventually consistent. Meaning that the location of the user will be communicated to his friends in an eventually consistent way in the sense that immediately the user’s location will not be correctly shown to his friends but eventually the correct location will be shown.

Now the first question that comes to mind while designing nearby friends is how the location update of friends will be updated back in the client app. For that, we have a couple of options

HTTP long polling – in this case, the client keeps polling for the location of friends nearby. This will be inefficient because the polling will return an empty result quite a few times. In other words, there could be a lot of empty receives which will waste network bandwidth. Also in the case of HTTP polling, only the client can poll the server. There is no way for the server to push updates back to the client.

Web Sockets – The other option is web sockets. In this case, the client app will always be connected to the server and the communication is two-way. The client can communicate with the server as well as the server can push updates to the clients. This approach will be efficient in the sense that the server will only communicate back to the client only when there are some location updates from any of the friends of the user.

Overall web sockets are best for our scenario. Another advantage of web sockets is that they have a sticky session where if a particular user has to open its connection to a particular server and it connects to one of the instances at the server end, then it will always be connected to that instance. All the requests of that user will go to that particular instance only. Hence this is what makes web sockets a good option for peer-to-peer communication.

We are going to discuss the below things in the tutorial

Data Storage

Location Update Requirements

High-Level Design

API Design

How location sharing is going to work

First Approach – Using Distributed Redis

Second Approach – Using Redis Pub/Sub

Non-Functional Requirement

Scalability of Redis Pub/Sub Server

Conclusion

Data Storage

Let’s now analyze what data we need to be storing.

User Profile: We need to store the user profile information, their name, profile picture, etc. Also, we need to save their preference related to enabling or disabling nearby friends.

Friend List: The system should be able to store the friend list of all users. This friend list will be used to eventually show nearby friends.

Location tracking and storage: The system should be able to track the location info for the user. Also, they should be able to store the location history as well.

The nearby friends are only shown for online users. Hence the system should have a way of storing the online status as well.

Location Updates Requirements

With respect to the working of nearby friends’ location updates, there are a few below requirements

When the user comes online first he should be able to see all friends who
- Are online
- And whose current location is within x miles of the current location of the user

While the user is online, if any of his online friend’s new location is within of x miles from the current location of the user then it will start showing in the nearby friends and vice versa

When the user goes offline then he should be shown offline to his friends as well and his location will not be shared.

So there are three scenarios that we need to take care of in our design

When the user first comes online

While the user is being online

When the user goes offline

High-Level Design

On a high level let’s discuss what will be the higher flow and what all services would exist.

There will be an API gateway on which every request from all the users will land.

There will be a Session Service. Session Service will contain a group of instances to which users will be connected by a web socket. Since there is a limit on the number of web sockets that you can open per machine, depending upon the load. So based on the number of users, we can have that number of machines

This Session Service will be connected to a Distributed Redis cluster which will contain information on which user is connected to which box. This information is transient till the user is connected and hence we can use a Distributed Redis for that. It will be the responsibility of the session service to maintain the User-id and machine-id mapping. This service will
- Will insert into if any user is connected to any machine
- When the user gets disconnected

Other than this the session will be a dumb service in the sense that it will just accept the connection and will forward any requests to it on an SNS/Kafka Topic.

The Session Service will publish a message to an SNS topic or Kafka on receiving any user activity.

There will be a Location History service that is going to save the location history of the user. There will be a separate table to store location history. Only the Location History service is going to connect to this table.

There will be a User Last Seen service that is going to maintain the last seen time and last known location of the user.

There will be a Friends service that is going to maintain the friend list of all users. There will be a separate table to store the friend list. Only the Friends service is going to connect to this table.

There will be a User Profile Service that is going to maintain the user profile. Again it will have its own set of tables to manage user profiles

There will be a Near By Friends service that will be responsible for the management of nearby friends. It is going to handle cases when the user is online, the user went offline, etc. We will be discussing this service in detail in this tutorial

There will be a Message Outbound Service which will be a worker whose work will be to send the outbound messages back to the user. This service will not have any kind of business logic at all. It will only accept a message that contains the details of what message to send, to whom it needs to send, and to which machine the user is connected. This is a very dump service that only which doesn’t have any business logic at all.

There will be a TokenService that is going to manage different types of tokens for the user.

API Design

Below will be the list of APIs needed

Location Update API

Set Location Preference on or off

Location Update API

The location update API will be called by the clients whenever the location of the user changes.

Request Body:

longitude
latitude
timestamp

Set Location Preference on or off

This will be called by the client’s app when the user set their location preference on or off

Request Body:

user_id
on/off
timestamp

How location sharing is going to work

With respect to location sharing, there are two ways

Location updates are pulled by the client app of the user at regular intervals.

Location updates of friends being pushed to the client app of the user.

So basically it is Pull vs Push. As we have seen earlier the disadvantage of the Pull model is that it might be a waste of network resources as many times it could be an empty receive.

The second approach of Push is better in two senses

Updates will only be pushed when there is an actual location change of a friend.

The Push approach will reflect the correct state in a more timely manner. As the Pull approach could only be done at regular intervals and regular intervals will be in seconds

We are going to discuss two ways in which we can implement Push of location updates

In the first approach, we are going to design in a very basic way. We will use distributed cache in this approach.

In the second approach, we are going to use redis pub/sub and we will see how we can scale that

First Approach – Using Distributed Redis

First, let’s discuss some of the services in detail that we are going to need in order to fulfill the requirement

Location History Service

There will Location History service that is going to connect to location_history_db. This DB is going to store the location history of the user. Below will be the fields in this DB. We can use any No SQL database for the same such as MongoDB

id
user_id
location
created
updated

User Last Seen Service

This service is going to maintain two things

last_seen time of the user

last location of the user if applicable.

It is going to store both things in the user_last_seen_db. Below are the fields in the user_last_seen_db. We can use any No SQL database in this case as well such as MongoDB

id
user_id
last_seen – It is the timestamp of when the user was last seen
location – this will only be populated if the user has location sharing preference ON.
created
updated

There will be a new service that will be responsible for managing the nearby friends. Let’s name this service – nearby_friends service.

To maintain nearby friends we will use a distributed cache. This distributed cache could be a redis cache as well. This cache will maintain a subscriber list for all the users who have there location preferences on.

For example assume there are two users A, B, and C. Below is the friend list

A and B
A and C
A and D

Below keys will exist in redis where

The key will be the user user_id of the user

The value will be a Set. You can read about Set in redis here – https://redis.com/ebook/part-1-getting-started/chapter-1-getting-to-know-redis/1-2-what-redis-data-structures-look-like/1-2-3-sets-in-redis/

Example:

Subscriber List for user A

A : {
    B: 1,
    C: 1,
    D: 1
}

Subscriber List for user B

B : {
    A: 1
}

Subscriber List for user C

C : {
    A: 1
}

Subscriber List for user D

D : {
    A: 1
}

Let’s consider three scenarios in which all the above data will be maintained and how the overall nearby friends’ feature is going to work.

The user first comes online

When the user is already online

The user goes offline

Let’s see these three cases in detail. But before looking into more details. Let’s first assume that

user A is connected to node 1
user B is connected to node 2
user C is connected to node 3
user D is connected to node 4

The user first comes online

We are going to describe this scenario from user A’s perspective. Below is the diagram for the case when the user first comes online. Assume user A comes online and gets connected to node 1 of the session service.

Near By Friends HLD — Near By Friends High Level Design

Then Node 1 is going to publish the message to Kafka/SNS+ SQS system with the current location and the user_id. This message will be picked by three different services

location_history service

user_last_seen service

nearby_friends service

location_history service is going to update the current location of user A in the location_history_database

user_last_seen service is going to update the last_seen time of the user in the user_last_seen_db.

nearby_friends service is going to perform the below actions

It is going to fetch all the friends of user A who have location-sharing preferences enabled. Assume user A has three friends B, C, and D who have location sharing enabled.

It is going to fan out the message with a different message_type for each of the friends B, C, and D.

Let’s follow the flow for message published for user B

The message for user B will be picked nearby_friends service again. The nearby_friends service is going to first check if user B is online from the user_last_seen service. If B is online the user_last_seen service is also going to return the current location of the user.

Then the nearby friend’s service is going to compute the distance of user B from user A. If the distance is less than the threshold then it is going to add B as a subscriber to A. It will create a redis entry as below. Basically, it will add user B, as a subscriber to A.

A : {
    B: 1
}

Also, it will add user A as a subscriber to user B. Below redis entry will also be created

B : {
    A: 1
}

Then it is going to publish two messages to the Kafka/SNS +SQS system one for informing user A and one for user B
- This first message will be picked by the message_outbound service, which is going to determine which node user B is connected to which is node 2. The message outbound service is going to call node 2. Node 2 is going to inform user B’s client app that user A has come online. The client app will start showing user A to user B
- The second will also be picked by the message_outbound service, which is going to determine which node user A is connected to which is node 1. The message outbound service is going to call node 1. Node 1 is going to inform user A’s client app that B is within the threshold distance. The client app will start showing user B to user A.

The same thing will happen for users C and D. Assume the distance for user C was within x km while for user D it was not. After the message is processed for message C, the below redis entries would have been created.

Subscriber List for user A

A : {
    B: 1,
    C: 1
}

Subscriber List for user B

B : {
    A: 1
}

Subscriber List for user C

C : {
    A: 1
}

When the user is already online

User A is already online but user A’s location has changed. The call will come to node 1 with a new location Then Node 1 is going to publish the message to Kafka/SNS+ SQS system with the current location and of the user. This message again will be picked by three different services

location_history service
user_last_seen service
nearby_friends service

location_history and user_last_seen services are going to update the location history and last_seen time of the user respectively.

nearby_friends service is going to

Fetch the list of subscribers for A which is B and C in this case. Then it is going to fan out the message with a different message_type for B and C. Note that it is only going to fan out the message again for users B and C

This message will be picked by nearby_friends service again and it is going to recompute the distance if the distance is within the threshold it is going to inform users B and C about the new location of user A via the same approach as discussed when the user first came online using message_outbound worker

If the location is not within the threshold then it is going to remove that user from the list of subscribers and vice versa. For example, let’s assume A’s and B’s location is not within x km. Then the subscription of A from user B and the subscription of user B from user A will be removed.

Both user A and user B will be informed via message_outbound worker that the other friend is not within range.

But what about the case where let’s say friend D was already online and with the new location change of user A, friend D is within a range of x km? So friend D should start showing to friend A and vice versa.

To handle such cases, at some regular intervals of let’s say 5 min or maybe more, every location update from user A we are going to re-fetch the entire friends again and recompute the distance to check whether an existing online friend came within range. It is going to repeat the same steps that we mentioned when the user first came online.

Let’s say this time user D has a distance less than x km, with user A. Then below redis entries will be updated and user A and user D will be informed of each other.

Subscriber List for user A

A : {
    B: 1,
    C: 1,
    D: 1
}

Subscriber List for user B

B : {
    A: 1
}

Subscriber List for user C

C : {
    A: 1
}

Subscriber List for user D

D : {
    A: 1
}

We can further optimize the above scenario by maintaining a subscription list of friends that are within (x + 10) km to prevent frequent removal and insertions to the subscribers’ list.

Of course, the subscriber list is maintained for a range of (x + 10) km but the notification is only sent out if any of the friends is within the threshold of x km

When the user goes offline or turns off the online status

When the user goes offline or when the user turns off the online status than his online status should stop showing to all his friends. Let’s discuss both of these cases one by one

When the user goes offline

The below diagram depicts the flow when the user goes offline

Assume that user A goes offline. User A friend’s B, C, and D were online when user A goes offline.

Below is the sequence of steps that are going to happen

User A is connected to node 1 of the session service. As soon as the web socket connection is terminated, the node or machine on which the connection got ended will publish a message to the Kafka/SNS +SQS system

The nearby_friends service is going to pick up the message

First, this service is going to fetch all the current subscribers of user A from the redis distributed cache. The current subscribers returned will be B, C, and D

Then it is going to fan out three messages for users B, C, and D. These messages will be published to Kafka/SNS + SQS system.

There will be different messages and each of the messages and all the messages will be picked by the nearby_friends service again. It is going to remove the subscription of A from users B, C, and D in their respective fan-out messages. Then it is going to publish to the message_outbound service

The message_outbound service is going to forward the message to users B, C, and D informing them that user A went offline.

When the user turns off the online status

It is the same as when the user goes offline

Second Approach – Using Redis Pub/Sub

Now we are going to discuss the Redis Pub/Sub approach in detail. First of all, what is Redis Pub/Sub? Redis Pub/Sub allows

Creation of named channels

A publisher can publish to this named channel

Any number of subscribers can listen to this from the channel. Any message published by the publisher will be available to all the subscribers

You can read more about Redis Pub/Sub here – https://redis.io/docs/manual/pubsub/

So in this approach what we are going to do is maintain a separate named channel for each user. All of the online friends of the user who are within the threshold distance are going to subscribe to this named channel.

Also for simplicity let’s assume that the named channel for a particular user will be created using the user_id of that user.

Let’s consider all three scenarios that we discussed earlier again

When the user first comes online

Assume user A comes online and gets connected to node 1 of the session service.

Node 1 is first going to create a named channel for user A if it doesn’t already exist in the redis distributed cache. Then Node 1 is going to publish the message to Kafka/SNS+ SQS system with the current location of the user. Node 1 is also going to cache the latest location of A as well. This message will be picked by three different services

location_history service

user_last_seen service

nearby_friends service

location_history service is going to update the current location of user A in the location_history_database

user_last_seen service is going to update the last_seen time of the user in the last_seen database.

nearby_friends service is going to perform the below actions

It is going to fetch all the friends of user A from the friends service who have location-sharing preferences enabled. Assume user A has three friends B, C, and D who have location sharing enabled.

For each of the friends, it is going to fan out the message with a different message_type. So three different messages will be published each for users B, C, and D.

Let’s follow the flow for message published for user B

The message for user B will be picked nearby_friends service again. The nearby_friends service is going to first check if user B is online from the user_last_seen service. If online the user_last_seen service is also going to return the current location of the user.

Then the nearby_friends service is going to compute the distance of itself from user A. If the distance is less than the threshold then it is going to publish two messages to the Kafka/SNS +SQS system
- This first message will be picked by the message_outbound service, which is going to determine which node user B is connected to which is node 2. The message outbound service is going to call node 2. Node 2 is going to subscribe B to user A‘s named channel and start listening to it. Also, node 2 is going to inform user B‘s client app that A has come online. The client app will start showing user A to user B. Also
- The second will also be picked by the message_outbound service as well, which is going to determine which node user A is connected to which is node 1. The message outbound service is going to call node 1. Node 1 is going to inform user A’s client app that B is within the threshold distance. Also, Node 1 is going to subscribe user A to user user B’s named channel and start listening to it. The client app will start showing user B to user A

The same thing will happen for user C and user D but assume that user D is not within the threshold of x km from user A so user D will not be shown to user A

So overall

Node 1 to which user A is connected has subscribed to the named channel of user B and user C

Node 2 to which user B is connected has subscribed to the named channel of user A

Node 3 to which user C is connected has subscribed to the named channel of user A

When the user is already online

User A’s location changes. The call is going to come to node 1. It is going to publish the new location to the named channel of user A.

The new location will be received by node 2 and node 3 for users B and C respectively. Both are going to recompute the distance and-

If the distance is less than x km then it is going to inform B and C of the new location of user A

If the distance is more than x km then first it is going to remove the subscription of B and C from user A’s named channel. Also, it is going to inform both users B and C that user A is not within the range

Node 1 is also going to publish the new location to the SNS/Kafka + SQS system and that message will be heard by

location_history service

last_seen service

nearby_friends – It is going to fetch all friends and recompute distance to cater to the case where two users were already online and their location is now within the range of x km.

location_history and user_last_seen services are going to do the same thing as they did earlier.

When the user goes offline or turns off the online status

When the user goes offline or when the user turns off the online status than his online status should stop showing to all his friends. Let’s discuss both of these cases one by one

When the user goes offline

Assume that user A goes offline. User A friend’s B and C were online when user A goes offline. Below is the sequence of steps that are going to happen

The call is going to come to node 1 to which User A is connected. As soon as the web socket connection is terminated on node 1, the node or machine on which the connection got ended will send a message on the named channel of user A. The message will be as below indicating that the user went offline

location:nil

Node 2 and Node 3 having users B and C respectively will listen to this message and immediately inform B and C that user A went offline. The client App of B and C will stop showing user A right away

Apart from that they are also going to remove B and C as subscribers from user A’s named channel.

Optimization: Removing subscriptions of B and C can also be done async via nearby_friends service in a sync manner.

When the user turns off the online status

It is the same as when the user goes offline

Comparison of the two approaches

The second approach is definitely a bit faster as all the nodes or machines in the session service are listening directly to the pub/sub channel in redis and soon as the message is published there the subscribers are informed immediately.

But there is a downside as well. There could be millions or billions of named channels if there are that number of users. On top of that for such a number of channels, there could n times subscriptions where n is the average number of friends online at a time.

All these subscriptions need to be maintained on the nodes of the session service which could be difficult to scale. Later on, we will do a capacity estimate as well as talk about the scaling challenges of the pub/sub model

Non-Functional Requirement

Scalability

The first thing to consider with the above design is the scalability factor. The scalability of each of the components in the system is very important. Here are scalability challenges you can face and their possible resolutions

Each of the machines in the session service could hold only a limited number of connections. Hence based on the number of users online at a moment, the number of machines and a number of instances could be set up. For eg one machine has around 65000 ports

Also for the second approach, each of the machines in the session service could also have millions of subscriptions to the redis pub/sub. For example, if a machine has a total of 1000 users connected via a web socket and each user has 200 online friends. Then that machine will have 1000*200 = 200K subscriptions. These subscriptions will have a memory and CPU impact. This impact has to be taken into consideration while scaling session service.

For the scalability of distributed cache, we should have horizontal scaling enabled. We will discuss the scaling of distributed cache for redis pub/sub in detail later in this tutorial

Your Kafka system might not be able to take that much load. We can scale horizontally but to a limit. If that is becoming a bottleneck then depending upon the geography or userId we can have two or more such systems. Service discovery could be used to figure out which Kafka system a request needs to go to.

A similar approach can be taken for other services as well.

Another important factor of scalability here is that we have designed our system in such a way that none of the services is bogged with too many things to do. There is a separation of concerns and wherever there was too much of a responsibility for service, we have broken it down

Scalability of Redis Pub/Sub Server

As we mentioned earlier, redis pub/sub channels are very cheap to create. Redis Pub/Sub can handle a large number of channels and therefore a large number of publishers and subscribers.

What are some of the scalability factors for Redis Pub/Sub?

Number of Channels

Type of instance used in the Redis Pub/Sub Cluster
- Memory
- CPU

Number of Instances in the cluster or horizontal scalability of the cluster.

The average size of the message that is published to the channel

Replication of the cluster

Number of Channels

Let’s do some math based on the user estimate. Assume there are 1 billion users. 20% of the users are always online and have their location preferences enabled. So that is 200 million users. Assume on average 20 friends of each user are online at the same time. So overall we will need

200 million channels to be created

Each channel will have 20 subscribers

Type of instance used in the Redis Pub/Sub Cluster

The hardware on which redis pub/sub is running is an important factory. Basically the memory and CPU of the machine. The more memory and the CPU of the machine, the more the channel can be created.

Memory

Each UUID of a user is 16 bytes. Each channel will have 200 subscribers. Redis Pub/Sub channel is implemented using a hash map and a double-linked list internally.

The overall size will be around

16 bytes UUID * 1 publisher = 16 bytes
16 bytes UUID * 200 subscriber = 3200 bytes
8 bytes Pointer * 21 = 180 bytes

So 1 channel will take around 3400 bytes on average. We can assume that each channel takes 4000 bytes or 4KB on average

200 million channel * 4 KB= 800 GB

If a server has 20 GB of memory around on average. We will need approx 800/20=40 servers minimum. Also, it has to be added

CPU

The CPU requirement of the cluster is something that has to be calculated using load testing. Based upon the load test either the number of servers has to be increased or decreased

Number of Instances in the cluster or horizontal scalability of the cluster

The cluster has to be horizontally scalable meaning that new servers can be added if required or removed if not required. We can use consistent hashing for that. With consistent hashing, the distribution of keys is minimized in case of the addition of new servers or the deletion of the new servers.

The average size of the message that is published to the channel

This is also an important factor to consider for scalability because when a message is published to a channel, the redis pub/sub is going to hold it until it is received by all the subscribers

Replication of the cluster

The cluster will have replication enabled as well. Each of the instances in the cluster will have replicas as well.

The above scalability considerations apply to the distributed cache as well that we are going to use in Approach 1

Conclusion

This is all about designing nearby friends. Hope you have liked this article. Please share feedback in the comments.

Note: Check out our system design tutorial series System Design Questions