System Design
  • Introduction
  • Basics
    • Key Characteristics of Distributed Systems
    • Load Balancing
    • Reverse Proxy
    • Cache
    • Sharding or Data Partitioning
    • Index
    • Redundancy and Replication
    • SQL vs NoSQL
  • Advanced
    • The Difference between SLI, SLO, and SLA
    • Consistent Hashing
    • Server-to-client Communication
    • Data Sharding
  • Database
    • SQL
    • ACID
    • Data Partitioning
  • News Feed
    • Design a News Feed System
    • Timeline creation with sharded data
    • Facebook News Feed
    • Twitter News Feed (Timeline)
    • How does facebook rank news feed?
  • Mint
    • Design Mint
  • Web Crawler
    • Design a web crawler
    • Design a decentralized web crawler
  • TODO
    • TODO
    • Elastic Search
    • Lucene
    • twitter-snowflake
Powered by GitBook
On this page
  • Partitioning based on UserID
  • Issues of this approach
  • Partitioning based on PhotoID
  • How to generate photoIDs?
  • Single point of failure
  • Plan for future growth of the system

Was this helpful?

  1. Advanced

Data Sharding

Assume we want to store the users' photos in our database and shard the data.

Partitioning based on UserID

We want to keep all data of a user on the same shard. Assume we use 200 shards, we can find the shardID by userID % 200 . Each shard can have its own auto-increment sequence for photoID, and we prepend shardID to each photoID so that each photo has a unique global photoID.

Issues of this approach

  1. How would we handle hot users? Several people follow such hot users, and any photo they upload is seen by a lot of other people.

  2. Some users will have a lot of photos compared to others, thus making a non-uniform distribution of storage.

  3. What if we cannot store all pictures of a user on one shard? If we distribute photos of a user onto multiple shards, will it cause higher latencies?

  4. Storing all pictures of a user on one shard can cause issues like unavailability of all of the user’s data if that shard is down or higher latency if it is serving high load etc.

Partitioning based on PhotoID

We can generate globally unique photoIDs first then find the shardID using photoID % 200.

How to generate photoIDs?

ONe solution could be that we dedicate a separate database instance to generate auto-incrementing IDs. This is a Key Generation Service (KGS).

Single point of failure

The KGS is a single point of failure.

One solution is that we define two databases with one generating even numbered IDs and the other generating odd numbered. And we put a load balancer in front of both of these databases to round robin between them and to deal with downtime.

Plan for future growth of the system

We can use the above partition methods to do logical partition. At early stage of the system, we can store multiple logical partition on the same server. Later as some of the logical partitions grow big, we can migrate them to a separate server.

This can be done using config files or a separate database that map our logical partitions to database servers.

PreviousServer-to-client CommunicationNextSQL

Last updated 4 years ago

Was this helpful?