Data Partitioning & Replication

Our coffee shop chain is thriving. We now have 500 locations and 10 million customers, and we process 100,000 orders per day. The problem: our single database server has become a bottleneck.
Here's what's happening: the database is like a single librarian managing a library with 10 million books. Every customer asks this one librarian for help. She's fast, but there's only one of her. The line is getting longer.
We have two options:
- Hire more librarians (replication) with each having a copy of the catalog, so multiple people can answer questions simultaneously
- Split the library into sections (partitioning) with one librarian handling Fiction A-M, another handling Fiction N-Z, and so on
Most large systems do both. Let's understand how.
What You Will Learn
- Why single databases eventually hit limits
- Replication: keeping copies of your data in sync
- Partitioning (sharding): splitting data across servers
- When to use each strategy
- Common patterns and their trade-offs
- How companies like Uber and Instagram handle this
Replication: Multiple Copies of Everything
The Concert Ticket Analogy
Imagine you're selling concert tickets. One ticket booth with one seller. Works fine for 100 customers. But for a Taylor Swift concert with 50,000 fans? That single booth becomes a disaster.
Solution: Open 5 identical ticket booths, each with the same information about available seats. Now 5 customers can be served simultaneously.
That's called replication. Multiple database servers, each with a complete copy of your data.
How Replication Works
Primary (Leader): Handles all writes. The source of truth.
Replicas (Followers): Receive copies of data from primary. Handle reads.
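
In application code, this split usually shows up as a small routing layer: writes go to the primary, reads are spread across the replicas. Here's a minimal sketch; the `primary` and `replica` objects are placeholders for whatever database connections your driver provides:

```python
import itertools

class ReplicatedDatabase:
    """Route writes to the primary and spread reads across replicas."""

    def __init__(self, primary, replicas):
        self.primary = primary
        # Round-robin over replicas to spread read load evenly.
        self._replica_cycle = itertools.cycle(replicas)

    def write(self, query, params=()):
        # All writes go to the single source of truth.
        return self.primary.execute(query, params)

    def read(self, query, params=()):
        # Each read goes to the next replica in the rotation.
        replica = next(self._replica_cycle)
        return replica.execute(query, params)

# Usage (connection objects are placeholders):
# db = ReplicatedDatabase(primary_conn, [replica1_conn, replica2_conn])
# db.write("INSERT INTO orders (customer_id, total) VALUES (?, ?)", (123, 4.50))
# db.read("SELECT * FROM orders WHERE customer_id = ?", (123,))
```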
Types of Replication
Synchronous Replication
The primary waits for replicas to confirm they've received the data before acknowledging the write.
```plaintext
Client: "Save this order"
Primary: "Got it, let me tell the replicas..."
Primary → Replica 1: "Save this" → "Done!"
Primary → Replica 2: "Save this" → "Done!"
Primary → Client: "Order saved!"
```
Pros: Data is guaranteed to exist on multiple servers. If primary dies, replicas have everything.
Cons: Slower writes. If any replica is slow or down, writes slow down or fail.
Asynchronous Replication
The primary acknowledges immediately and sends to replicas in the background.
```plaintext
Client: "Save this order"
Primary: "Order saved!" (immediate response)
Primary → Replicas: "Here's the data" (happens later)
```
Pros: Fast writes. Primary doesn't wait for replicas.
Cons: Replicas might be slightly behind. If primary crashes before replicating, data can be lost.
The Replication Lag Problem
With async replication, replicas can be behind:
```plaintext
Time 0: User updates profile on Primary
Time 1: Primary returns "Success!"
Time 2: User refreshes page, hits Replica (still has old data)
Time 3: User sees OLD profile → "My update didn't save!"
Time 4: Replication catches up
Time 5: User refreshes, sees correct data → Confused
```
Solutions:
- Read-your-writes consistency: Route the user who just wrote back to the primary for their next read (sketched after this list)
- Sticky sessions: Same user always hits the same replica
- Wait for replication: Delay response until replicas catch up (slower but consistent)
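
A rough sketch of read-your-writes routing: remember when each user last wrote, and send their reads to the primary until the replicas have had time to catch up. The one-second lag window here is an assumed constant for illustration; a real system would track actual replication lag:

```python
import time

REPLICATION_LAG_WINDOW = 1.0  # seconds; assumed upper bound on replica lag

class ReadYourWritesRouter:
    """Route a user's reads to the primary right after they write."""

    def __init__(self, primary, replicas):
        self.primary = primary
        self.replicas = replicas
        self._last_write_at = {}  # user_id -> timestamp of that user's last write

    def write(self, user_id, query, params=()):
        self._last_write_at[user_id] = time.monotonic()
        return self.primary.execute(query, params)

    def read(self, user_id, query, params=()):
        last_write = self._last_write_at.get(user_id, 0.0)
        if time.monotonic() - last_write < REPLICATION_LAG_WINDOW:
            # Recent writer: read from the primary so they see their own update.
            return self.primary.execute(query, params)
        # Otherwise any replica will do; picking by user_id also gives
        # sticky-session behavior (the same user keeps hitting the same replica).
        replica = self.replicas[hash(user_id) % len(self.replicas)]
        return replica.execute(query, params)
```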
When to Use Replication
| Scenario | Replication Helps? |
|---|---|
| Read-heavy workload | Yes, spread reads across replicas |
| High availability needed | Yes, failover to replica if primary dies |
| Write-heavy workload | No, all writes still go to one server |
| Data locality (multi-region) | Yes, replica in each region |
Partitioning (Sharding): Dividing the Data
Replication gives you more copies. But what if your data is too big for one server? Or writes are overwhelming a single primary?
That's where partitioning comes in.
The Library Analogy
Our library has 10 million books. Even with multiple librarians, the building is full. Solution: open multiple library branches.
- Branch A: Books with authors A-M
- Branch B: Books with authors N-Z
Now each branch has half the books and half the traffic. You can add more branches as you grow.
How Partitioning Works
Each shard is an independent database holding a portion of the data.
Choosing a Shard Key
The shard key determines which shard holds each piece of data. This is the most important decision in sharding.
Option 1: Range-Based Sharding
Divide by ranges of the shard key.
```plaintext
Shard 1: user_id 1 - 1,000,000
Shard 2: user_id 1,000,001 - 2,000,000
Shard 3: user_id 2,000,001 - 3,000,000
```
Pros: Simple. Range queries are efficient (get users 500-600 → just hit Shard 1).
Cons: Hot spots. New signups all land on Shard 3, and the earliest accounts (often the most active, celebrity-level users) are clustered together on Shard 1.
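
The range lookup itself is simple to express in code. A small sketch using a sorted list of range boundaries that mirrors the example above:

```python
import bisect

# Upper bound (inclusive) of each shard's user_id range, in ascending order.
RANGE_UPPER_BOUNDS = [1_000_000, 2_000_000, 3_000_000]

def range_shard(user_id: int) -> int:
    """Return the index of the shard whose range contains this user_id."""
    return bisect.bisect_left(RANGE_UPPER_BOUNDS, user_id)

print(range_shard(500))        # 0 → Shard 1
print(range_shard(1_500_000))  # 1 → Shard 2
print(range_shard(2_999_999))  # 2 → Shard 3
```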
Option 2: Hash-Based Sharding
Hash the shard key and distribute based on the hash.
```python
shard = hash(user_id) % number_of_shards

# Example with 3 shards (hash values are illustrative):
# hash("alice") % 3 == 1  → Shard 1
# hash("bob")   % 3 == 0  → Shard 0
# hash("carol") % 3 == 2  → Shard 2
```
Pros: Even distribution. No hot spots from sequential IDs.
Cons: Range queries are expensive (must query all shards). Adding shards requires rehashing.
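
The rehashing cost is easy to see with a quick experiment: when the shard count changes from 3 to 4, most keys map to a different shard and their data would have to move. (This sketch uses CRC32 because Python's built-in `hash()` is randomized between runs; real systems likewise use a stable hash.)

```python
import zlib

def shard_for(key: str, num_shards: int) -> int:
    # CRC32 gives the same result across processes, unlike Python's hash().
    return zlib.crc32(key.encode()) % num_shards

keys = [f"user_{i}" for i in range(10_000)]
moved = sum(1 for k in keys if shard_for(k, 3) != shard_for(k, 4))

# Typically around 75% of keys land on a different shard after the change.
print(f"{moved / len(keys):.0%} of keys change shards when going from 3 to 4 shards")
```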
Option 3: Directory-Based Sharding
A lookup table tells you which shard has each piece of data.
```plaintext
Lookup Table:
user_id 12345 → Shard 2
user_id 12346 → Shard 1
user_id 12347 → Shard 3
```
Pros: Flexible. Can move data between shards easily.
Cons: Lookup table is a single point of failure. Extra hop for every query.
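
A minimal sketch of the directory approach, with an in-memory dict standing in for what would really be a replicated lookup service or table:

```python
class ShardDirectory:
    """Directory-based sharding: an explicit map from key to shard."""

    def __init__(self):
        self._directory = {}  # user_id -> shard index

    def assign(self, user_id: int, shard_index: int) -> None:
        self._directory[user_id] = shard_index

    def shard_for(self, user_id: int) -> int:
        # Every query pays this extra lookup before it can reach the data.
        return self._directory[user_id]

    def move(self, user_id: int, new_shard_index: int) -> None:
        # The flexibility is the point: rebalancing a user means updating one
        # entry here (plus copying their rows between shards).
        self._directory[user_id] = new_shard_index
```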
The Shard Key Selection Problem
Bad shard key choices cause pain:
Too much data on one shard (hot partition):
```plaintext
Shard key: country
Result: 80% of users are in USA → Shard "USA" is overwhelmed
```
Related data on different shards:
```plaintext
Shard key: user_id for users, order_id for orders
Query: "Get all orders for user 123"
Problem: A user's orders are scattered across the order shards
Result: Must query ALL order shards to find user 123's orders
```
Good shard key criteria:
- High cardinality (many unique values)
- Even distribution
- Query patterns align (queries usually filter by the shard key)
- Doesn't change frequently
Cross-Shard Operations: The Hard Part
With data split across shards, some operations become much harder:
Cross-Shard Queries
```sql
-- Easy in a single database:
SELECT *
FROM users u
JOIN orders o ON u.id = o.user_id
WHERE o.total > 100;

-- With sharding:
-- Must query all user shards and all order shards, then join the results in the application.
-- Slow, expensive, complex.
```
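
In application code, the sharded version turns into scatter-gather. A rough sketch, assuming each shard connection has an `execute` method that returns rows as dicts (both are placeholders):

```python
def orders_over_100_with_users(user_shards, order_shards):
    """Scatter-gather: what a single JOIN becomes once the data is sharded."""
    # 1. Gather matching orders from every order shard.
    orders = []
    for shard in order_shards:
        orders.extend(shard.execute("SELECT * FROM orders WHERE total > 100"))
    if not orders:
        return []

    # 2. Gather the referenced users; they may live on any user shard.
    user_ids = {o["user_id"] for o in orders}
    placeholders = ",".join("?" * len(user_ids))
    users_by_id = {}
    for shard in user_shards:
        rows = shard.execute(
            f"SELECT * FROM users WHERE id IN ({placeholders})", tuple(user_ids)
        )
        for user in rows:
            users_by_id[user["id"]] = user

    # 3. Join in the application.
    return [(users_by_id[o["user_id"]], o) for o in orders]
```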
Cross-Shard Transactions
```sql
-- Easy in a single database:
BEGIN;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
UPDATE accounts SET balance = balance + 100 WHERE id = 2;
COMMIT;

-- With sharding, if accounts 1 and 2 are on different shards:
-- There is no simple way to make this atomic.
-- You need distributed transactions (2PC) or saga patterns.
```
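
As a hedged sketch of the saga idea for the transfer above: run the debit and credit as separate local transactions, and apply a compensating update if the second step fails. This glosses over a lot (real sagas persist their progress so they can recover from crashes between steps), and the shard connections are placeholders:

```python
def transfer(shard_a, shard_b, from_id, to_id, amount):
    """Saga-style transfer across two shards: local steps plus a compensation."""
    # Step 1: debit on the shard that owns the source account.
    shard_a.execute(
        "UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, from_id)
    )
    try:
        # Step 2: credit on the shard that owns the destination account.
        shard_b.execute(
            "UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, to_id)
        )
    except Exception:
        # Compensating action: undo the debit so no half-finished transfer remains.
        shard_a.execute(
            "UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, from_id)
        )
        raise
```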
Combining Replication and Partitioning
Production systems typically use both:
Each shard has its own primary and replicas, as sketched after this list. This gives you:
- Horizontal scale: More shards = more capacity
- High availability: Replica takes over if primary fails
- Read scale: Replicas handle read traffic within each shard
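
Putting the two together, the routing logic looks roughly like this: pick the shard from the shard key, then pick the primary or a replica depending on whether the operation is a write or a read. The per-shard `primary` and `replicas` objects are placeholders:

```python
import zlib

class ShardedCluster:
    """Each shard is a replicated group: one primary plus read replicas."""

    def __init__(self, shards):
        # shards: list of objects with .primary and .replicas attributes
        self.shards = shards

    def _shard_for(self, shard_key: str):
        return self.shards[zlib.crc32(shard_key.encode()) % len(self.shards)]

    def write(self, shard_key, query, params=()):
        # Writes go to the primary of the shard that owns this key.
        return self._shard_for(shard_key).primary.execute(query, params)

    def read(self, shard_key, query, params=()):
        # Reads can go to any replica of the owning shard.
        shard = self._shard_for(shard_key)
        replica = shard.replicas[zlib.crc32(shard_key.encode()) % len(shard.replicas)]
        return replica.execute(query, params)
```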
Real-World Patterns
Instagram: Sharding by User ID
Instagram shards data by user ID. All of a user's photos, followers, and likes are on the same shard.
Why it works: Most queries are user-centric. "Show me Alice's photos" hits one shard.
The challenge: "Show me photos from everyone I follow" requires querying multiple shards (the accounts you follow live on different shards).
The solution: Fan-out on write. When Alice posts, write to her followers' timeline shards.
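
A sketch of what fan-out on write might look like with user-ID sharding. The `cluster` object and its `shard_for` / `insert` methods are hypothetical placeholders, not Instagram's actual API:

```python
def publish_photo(cluster, author_id, follower_ids, photo):
    """Fan-out on write: push a new photo into each follower's timeline shard."""
    # 1. Store the photo on the author's shard (all of the author's data lives there).
    author_shard = cluster.shard_for(author_id)
    author_shard.insert("photos", photo)

    # 2. Write a timeline entry for every follower. Followers are spread across
    #    many shards, so one post touches many shards at write time; that is the
    #    price paid so that reading a timeline later hits only one shard.
    for follower_id in follower_ids:
        follower_shard = cluster.shard_for(follower_id)
        follower_shard.insert(
            "timelines", {"user_id": follower_id, "photo_id": photo["id"]}
        )
```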
Uber: Sharding by Geography
Uber shards ride data by geographic region.
Why it works: A ride in New York doesn't need to know about rides in Tokyo. Queries are regional.
The challenge: Users who travel frequently appear on multiple shards.
Slack: Sharding by Workspace
Slack shards by workspace (team). All messages, channels, and users for a workspace are together.
Why it works: 99% of queries are within a workspace. Cross-workspace queries are rare.
When to Shard (And When Not To)
Don't Shard Prematurely
Sharding adds significant complexity:
- Application must know about shards
- Some queries become impossible or expensive
- Operations (backups, migrations) get harder
- Debugging across shards is painful
Rule of thumb: A well-tuned PostgreSQL server handles millions of rows and thousands of queries per second. You probably don't need sharding until you've exhausted:
- Vertical scaling (bigger server)
- Read replicas
- Caching
- Query optimization
Signs You Actually Need Sharding
- Database server at 80%+ CPU consistently
- Disk is full (can't fit all data on one machine)
- Write throughput hitting limits (can't be solved by replicas)
- Compliance requires data in specific regions
The Scaling Progression
```plaintext
Stage 1: Single Server
  Everything on one machine
    ↓
Stage 2: Read Replicas
  Primary + replicas for read scaling
    ↓
Stage 3: Caching Layer
  Redis in front to reduce DB load
    ↓
Stage 4: Vertical Scaling
  Bigger server, more RAM, faster disks
    ↓
Stage 5: Sharding
  Split data across multiple primaries
```
Most applications never reach Stage 5. And that's fine. Simpler is better.
Key Takeaways
Replication creates copies of your data:
- Improves read throughput (spread reads across replicas)
- Improves availability (failover to replica)
- Doesn't help with write scaling or storage limits
Partitioning (Sharding) splits your data:
- Each shard holds a portion of data
- Improves write throughput and storage capacity
- Makes cross-shard queries and transactions hard
Shard key selection is critical:
- Choose based on access patterns
- Aim for even distribution
- Keep related data together when possible
Don't shard prematurely:
- Exhaust simpler options first
- Sharding adds operational complexity
- Many successful systems never need it
What's Next
When you split data across servers, you face a fundamental question: what happens when servers disagree? If a network partition separates your shards, should the system keep running (risking inconsistency) or stop to preserve correctness?
This is the CAP Theorem, and it governs every distributed system decision. Let's explore it in the next lesson.