Database Fundamentals

Our coffee shop has been running smoothly. Customers find us (DNS), traffic is distributed (load balancing), and popular drinks are pre-made (caching). But there's one thing we haven't talked about: where do we store the actual business data?
There's quite a lot of data to store: customer names, order history, and the inventory itself. This data shouldn't just disappear when you restart your server.
This is where databases come into the picture. And honestly, picking the wrong database is one of the easiest ways to screw up your architecture. I've seen it happen more times than I'd like to admit.
Get it right early, and you're fine. Get it wrong, and you're looking at a painful migration 18 months later when you're already at scale. Ask me how I know.
This lesson isn't about SQL syntax. It's about understanding what databases are actually good at, where they fall apart, and how to pick one without overthinking it.
What You Will Learn
- Why relational databases (MySQL, PostgreSQL) are still the default choice
- When NoSQL databases make sense (and when they don't)
- How indexes work and why they matter
- ACID transactions and why they're important
- Database scaling patterns: replicas, sharding, and when to use each
- How to choose the right database for your use case
Relational Databases: The Default Choice
Relational databases are like spreadsheets on steroids. Tables with rows and columns. Everything has a place. Everything follows rules.
MySQL and PostgreSQL have been around forever. They're boring but that's actually a good thing.
```sql
CREATE TABLE users (
    id SERIAL PRIMARY KEY,
    email VARCHAR(255) UNIQUE NOT NULL,
    name VARCHAR(100)
);

CREATE TABLE orders (
    id SERIAL PRIMARY KEY,
    user_id INTEGER REFERENCES users(id),
    total DECIMAL(10, 2)
);
```
Why They Win
Data integrity. The database won't let you insert garbage:
```sql
INSERT INTO orders (user_id) VALUES (99999);
-- ERROR: foreign key violation (user doesn't exist)
```
This saved me from so many bugs when I was starting out.
JOINs just work:
```sql
SELECT users.name, orders.total
FROM users
JOIN orders ON orders.user_id = users.id;
```
Transactions. Either everything happens or nothing happens:
```sql
BEGIN;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
UPDATE accounts SET balance = balance + 100 WHERE id = 2;
COMMIT;
```
If the second UPDATE fails, the first one gets rolled back. Money doesn't vanish.
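The all-or-nothing behavior is easy to see with SQLite from Python. This is just a sketch: the table and the CHECK constraint that forces the failure are illustrative, not from the lesson's schema.

```python
import sqlite3

# In-memory database for illustration; a file-backed one behaves the same.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "  id INTEGER PRIMARY KEY,"
    "  balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.execute("INSERT INTO accounts VALUES (1, 500), (2, 100)")
conn.commit()

try:
    with conn:  # opens a transaction; commits on success, rolls back on exception
        conn.execute("UPDATE accounts SET balance = balance - 100 WHERE id = 1")
        # This violates the CHECK constraint (balance would go negative)...
        conn.execute("UPDATE accounts SET balance = balance - 999 WHERE id = 2")
except sqlite3.IntegrityError:
    pass

# ...so the first UPDATE was rolled back too: account 1 still has 500.
print(conn.execute("SELECT balance FROM accounts ORDER BY id").fetchall())
# [(500,), (100,)]
```

The `with conn:` block is what gives you atomicity here: either both UPDATEs commit, or neither does.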
You can ask questions you didn't plan for. Need a report on users who ordered in the last 30 days? Just write a query. With NoSQL, you better hope you designed for that query upfront.
Where They Struggle
Scaling horizontally sucks. These databases were built for one big server. Try to split data across multiple servers and good luck: JOINs break, transactions break, and everything gets complicated.
Schema changes are scary. An ALTER TABLE on a large table can take a long time and may block writes while it runs.
Rigid schemas. If your data structure keeps changing, you're fighting the database.
NoSQL: When Relational Doesn't Fit
NoSQL isn't better or worse than relational. It's different: it trades some guarantees for other benefits.
Here's the thing: relational databases are great when your data is structured and you need to query it in flexible ways. But what if you're storing user activity logs? Or sensor data from IoT devices? Or a product catalog where every product has wildly different attributes?
That's where NoSQL makes sense. Sometimes.
Document Databases (MongoDB, DynamoDB)
Instead of rigid tables, you store JSON documents. Each document can have different fields.
```json
{
  "_id": "user_123",
  "email": "alice@example.com",
  "preferences": {
    "theme": "dark",
    "notifications": true
  },
  "orders": [
    {"id": "order_1", "total": 99.99, "items": 3},
    {"id": "order_2", "total": 49.99, "items": 1}
  ]
}
```
Use it for:
- Content management systems where every article has different fields
- Product catalogs (a shirt has size/color, a laptop has CPU/RAM/storage)
- User profiles that evolve over time
Don't use it for:
- Anything that needs JOINs across collections
- Financial transactions
- Reports that aggregate across multiple document types
MongoDB is popular because it's easy to get started with.
Key-Value Stores (Redis, DynamoDB)
It's literally just key-value pairs. Like a hash map that's persistent and distributed.
```plaintext
SET user:123:session "abc123xyz"
GET user:123:session → "abc123xyz"

SET user:123:cart '{"items": [1, 2, 3]}'
GET user:123:cart → '{"items": [1, 2, 3]}'
```
Use it for:
- Session storage
- Caching (we covered this in the previous lesson)
- Leaderboards
- Rate limiting
Don't use it for:
- Anything where you need to search by value
- Relationships
Redis is the most common one. It's really fast with sub-millisecond latency.
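To make the semantics concrete, here's a toy in-memory key-value store with Redis-style lazy TTL expiry. The `TinyKV` class is a made-up illustration of the model, not how Redis is actually implemented.

```python
import time

class TinyKV:
    """A toy key-value store: SET/GET with optional TTL (illustration only)."""
    def __init__(self):
        self._data = {}  # key -> (value, expires_at or None)

    def set(self, key, value, ttl=None):
        expires = time.monotonic() + ttl if ttl is not None else None
        self._data[key] = (value, expires)

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires = entry
        if expires is not None and time.monotonic() > expires:
            del self._data[key]  # lazy expiry on read, like Redis does
            return None
        return value

kv = TinyKV()
kv.set("user:123:session", "abc123xyz", ttl=1800)  # session expires in 30 min
print(kv.get("user:123:session"))  # abc123xyz
print(kv.get("user:999:session"))  # None -- lookups are by exact key only
```

Notice there's no way to ask "which users have a cart containing item 2?" without scanning every key. That's the "don't search by value" limitation in miniature.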
Wide-Column (Cassandra, ScyllaDB)
Wide-column stores like Cassandra are built for massive write throughput. We're talking hundreds of thousands of writes per second.
Good for:
- Time-series data (metrics, logs, IoT sensor data)
- Event logging at massive scale
- Write-heavy workloads (100K+ writes/second)
Bad for:
- Complex queries and aggregations
- JOINs
- Transactions across rows
When to use it: anything where you're writing constantly and querying by time ranges.
I worked on a system that ingested around 200K events per second. Cassandra handled it seamlessly.
Graph Databases (Neo4j)
The analogy: A web of sticky notes connected by strings. Great for following connections.
```plaintext
(Alice)-[:FRIEND]->(Bob)-[:WORKS_AT]->(Google)
(Alice)-[:FRIEND]->(Carol)-[:WORKS_AT]->(Google)

Query: "Find friends of Alice who work at Google"
```
Good for:
- Social networks (friends of friends)
- Recommendation engines
- Fraud detection (finding suspicious patterns)
Bad for:
- Simple CRUD operations
- Large-scale data that doesn't have relationships
- Horizontal scaling
When to use it: if you're constantly traversing relationships ("friends of friends who like X"), graph databases are way faster than doing recursive JOINs in SQL.
But honestly? Most apps don't need this.
Quick Comparison
| Type | Best For | Don't Use For |
|---|---|---|
| Relational (MySQL/Postgres) | Most applications | Massive write throughput |
| Document (MongoDB) | Flexible schemas | Complex queries across collections |
| Key-Value (Redis) | Cache, sessions | Searching by value |
| Wide-Column (Cassandra) | Time-series at scale | Transactions |
| Graph (Neo4j) | Social networks | Simple CRUD apps |
Indexing: The Library Card Catalog
Imagine a library with 1 million books, but no catalog system. To find "Database Design Patterns," you'd have to walk through every shelf, checking every book. That's a full table scan and it's painfully slow.
Now imagine the library has a card catalog organized alphabetically by title. You look up "Database," find the card, and it tells you "Shelf 47, Position 12." You walk directly there. That's an index.
How Database Indexes Work
Without an index, the database checks every row:
```plaintext
Query: Find user with email 'alice@example.com'
Check row 1... no
Check row 2... no
Check row 3... no
... (1 million rows later)
Check row 847,293... yes! Found it.
```
With an index on the email column:
```plaintext
Query: Find user with email 'alice@example.com'
Look up 'alice@example.com' in index → Row 847,293
Jump directly to row 847,293 → Found it!
```
The index is a sorted data structure (usually a B-tree) that maps values to row locations. Lookups go from O(n) to O(log n).
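You can feel the O(n) vs O(log n) difference with a sorted list standing in for the index. This is a sketch: a real B-tree is a disk-friendly tree, not a Python list, but the "binary search instead of scan" idea is the same.

```python
import bisect

# 100,000 "rows": (row_id, email). The table itself is in no useful order.
rows = [(i, f"user{i}@example.com") for i in range(100_000)]

# The "index": (email, row_id) pairs sorted by email, like a B-tree's leaves.
index = sorted((email, row_id) for row_id, email in rows)
emails = [e for e, _ in index]  # parallel list for bisect

def index_lookup(email):
    """O(log n): binary search the sorted index, like a B-tree probe."""
    i = bisect.bisect_left(emails, email)
    if i < len(emails) and emails[i] == email:
        return index[i][1]
    return None

def full_scan(email):
    """O(n): check every row, like a Seq Scan."""
    for row_id, e in rows:
        if e == email:
            return row_id
    return None

target = "user84729@example.com"
assert index_lookup(target) == full_scan(target) == 84729
```

Both functions return the same row; the index version just gets there in ~17 comparisons instead of ~85,000.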
Types of Indexes
Primary key index (automatic): Every table should have a primary key. The database automatically creates an index on it.
Secondary index (you create these):
```sql
CREATE INDEX idx_users_email ON users(email);
```
Composite index (multiple columns):
```sql
CREATE INDEX idx_orders_user_status ON orders(user_id, status);

-- This uses the index:
SELECT * FROM orders WHERE user_id = 123 AND status = 'pending';

-- This also uses the index:
SELECT * FROM orders WHERE user_id = 123;

-- This CANNOT use the index:
SELECT * FROM orders WHERE status = 'pending';
```
Composite indexes work left-to-right. An index on (user_id, status) helps queries on user_id or user_id + status. But NOT queries on just status.
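The left-to-right rule falls out of how the index is sorted. A Python sketch, with sorted tuples standing in for the composite index (the rows here are made up for illustration):

```python
import bisect

# A composite index on (user_id, status) is sorted by user_id first.
idx = sorted([(123, "pending"), (123, "shipped"),
              (456, "pending"), (789, "shipped")])

# Leftmost prefix works: all rows for user_id=123 are contiguous in the
# sort order, so one binary search finds the whole range.
lo = bisect.bisect_left(idx, (123,))
hi = bisect.bisect_left(idx, (124,))
print(idx[lo:hi])  # [(123, 'pending'), (123, 'shipped')]

# But rows with status='pending' are scattered across the sort order,
# so a status-only query has to scan everything -- no binary search helps.
pending = [row for row in idx if row[1] == "pending"]  # full scan
```

Same data structure, two very different access costs, purely because of column order.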
I've debugged this exact issue multiple times. The query is slow, you check, and there's an index on the column but it's the second column in a composite index.
The Index Trade-off
Indexes aren't free:
- They take disk space (the index itself is data)
- They slow down writes (every INSERT/UPDATE has to update the indexes too)
Index columns you actually query. If you have WHERE email = ... in your queries, index email.
Don't index everything. I've seen tables with 15 indexes. Half of them were never used. Just slowing down writes for no reason. Don't index columns you never query.
Always Use EXPLAIN
Use EXPLAIN to check if your index is actually being used:
```sql
EXPLAIN ANALYZE SELECT * FROM users WHERE email = 'alice@example.com';
```
If you see "Seq Scan" on a big table, you need an index. If you see "Index Scan," you're good.
Transactions and ACID
ACID is an acronym but honestly I just remember:
- Atomicity: All or nothing. No half-done transactions.
- Consistency: No invalid states.
- Isolation: Transactions don't step on each other.
- Durability: Once committed, it's permanent. Even if the server crashes.
Isolation Levels
| Level | What You Get | Performance |
|---|---|---|
| Read Uncommitted | Can see uncommitted changes | Fastest |
| Read Committed | Only see committed changes | Good balance |
| Repeatable Read | Same query = same results | Slower |
| Serializable | Complete isolation | Slowest |
Most databases default to "Read Committed" and that's fine for 99% of use cases.
The only time I've changed isolation levels was for a financial system that needed Serializable. Everything else? Default is fine.
Distributed Transactions
Distributed transactions (across multiple databases) are a pain. Most systems just accept eventual consistency.
2PC (Two-Phase Commit): Coordinator ensures all commit or all abort. Slow, rarely used at scale.
Saga pattern: Local transactions with compensating actions. If step 2 fails, undo step 1.
Eventual consistency: Accept temporary inconsistency, resolve later.
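The saga idea fits in a few lines. A minimal sketch (the step names are hypothetical, and a production saga also needs durable logging and retries for the compensations themselves):

```python
def run_saga(steps):
    """steps: list of (action, compensation) pairs. Returns True on success."""
    done = []
    for action, compensate in steps:
        try:
            action()
        except Exception:
            # Undo already-completed steps in reverse order (best effort).
            for undo in reversed(done):
                undo()
            return False
        done.append(compensate)
    return True

log = []

def charge_card():       log.append("charged")
def refund_card():       log.append("refunded")
def reserve_inventory(): raise RuntimeError("out of stock")  # simulated failure
def release_inventory(): log.append("released")

ok = run_saga([(charge_card, refund_card),
               (reserve_inventory, release_inventory)])
print(ok, log)  # False ['charged', 'refunded']
```

Step 2 fails, so step 1's compensation (the refund) runs. Note what you give up versus a real transaction: there's a window where the card was charged but nothing was reserved.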
Scaling Patterns
Vertical Scaling (Bigger Server)
Just get a bigger server. More RAM, faster CPU, SSDs.
A good PostgreSQL server can handle 10-15K queries per second. That's enough for most apps. Don't overthink scaling.
Read Replicas
One primary database for writes. Multiple read replicas for reads.
This is the most common scaling pattern. Works great for read-heavy workloads (which is most web apps).
Catch: Replication lag. Replicas are usually a few milliseconds behind. Sometimes seconds if there's a spike. So if a user updates their profile and immediately refreshes, they might see old data.
You can handle this by routing reads for recently-written data to the primary. Or just accept it. Most users don't notice a 100ms delay.
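That routing trick is often called "read-your-writes". A sketch of the idea (the `Router` class and the 5-second stickiness window are illustrative choices, and a real system would track this per session, not in one process):

```python
import time

class Router:
    """Send reads to a replica, unless this user wrote recently --
    then read from the primary to avoid stale replica data."""
    def __init__(self, stickiness_seconds=5.0):
        self.stickiness = stickiness_seconds
        self.last_write = {}  # user_id -> timestamp of their last write

    def record_write(self, user_id):
        self.last_write[user_id] = time.monotonic()

    def pick(self, user_id):
        last = self.last_write.get(user_id)
        if last is not None and time.monotonic() - last < self.stickiness:
            return "primary"  # replica may still be lagging behind this write
        return "replica"

r = Router()
print(r.pick("u1"))   # replica -- no recent write
r.record_write("u1")  # user updates their profile
print(r.pick("u1"))   # primary -- inside the stickiness window
```

The window just needs to comfortably exceed your typical replication lag.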
Sharding (Last Resort)
Sharding means splitting your data across multiple databases.
Example: Users 1-1M go to Database 1. Users 1M-2M go to Database 2.
Why it sucks:
- Can't JOIN across shards
- Can't do transactions across shards
- Rebalancing is a nightmare
- Schema changes hit every shard
Don't shard until you absolutely have to. Try everything else first: bigger servers, read replicas, caching, optimizing queries.
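The routing itself is the easy part; the pain is everything around it. A hash-based routing sketch in Python (the shard count is a made-up example):

```python
import hashlib

N_SHARDS = 4  # hypothetical shard count

def shard_for(user_id: str) -> int:
    """Pick a shard from a stable hash of the key, modulo shard count."""
    digest = hashlib.md5(user_id.encode()).digest()
    return int.from_bytes(digest[:8], "big") % N_SHARDS

# Every key deterministically maps to exactly one shard...
assert shard_for("user_123") == shard_for("user_123")

# ...but bump N_SHARDS from 4 to 5 and most keys suddenly map to a
# *different* shard. That's why naive rebalancing is a nightmare
# (consistent hashing reduces, but doesn't eliminate, the data movement).
```

Using `hashlib` rather than Python's built-in `hash()` matters here: the routing must be stable across processes and restarts.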
Choosing a Database
The Pragmatic Approach
Start with MySQL. Seriously.
MySQL is the world's most popular open-source database for a reason. It's simple to set up, well-documented, runs everywhere, and handles the needs of 90% of applications. Facebook, Twitter, and Airbnb all started with MySQL.
When you outgrow MySQL or need more advanced features, PostgreSQL is the natural next step. PostgreSQL offers:
- Better support for complex queries
- JSON documents (JSONB column type)
- Full-text search built-in
- Geographic data (PostGIS extension)
- More advanced data types (arrays, hstore, etc.)
But don't start with PostgreSQL just because it's "more powerful." MySQL's simplicity is a feature, not a bug. You can always migrate later when you actually need those advanced capabilities.
Decision Guide
| Need | Choice |
|---|---|
| Starting a new web app | MySQL |
| Need advanced features | PostgreSQL |
| Session/cache | Redis |
| Write-heavy at scale | Cassandra, ScyllaDB |
| Full-text search | Elasticsearch |
| Analytics | ClickHouse, BigQuery |
| Social graph | Neo4j + MySQL/PostgreSQL |
Questions to Ask
- Access patterns: How will you query? Complex JOINs or simple lookups?
- Data structure: Structured tables or flexible documents?
- Scale: Single server sufficient? Write-heavy?
- Consistency: Strong required or eventual acceptable?
Performance Tips
Profile first. EXPLAIN ANALYZE before optimizing.
Index strategically. Columns you query. Not everything.
Connection pooling. Opening connections is expensive. Reuse them.
Batch operations:
```sql
-- Slow: 1000 separate inserts
INSERT INTO logs (msg) VALUES ('log 1');
INSERT INTO logs (msg) VALUES ('log 2');
-- ...

-- Fast: one batched insert
INSERT INTO logs (msg) VALUES ('log 1'), ('log 2'), ...;
```
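The same idea from application code, using SQLite's `executemany` as a stand-in. This is just an illustration: with an in-memory SQLite database the gap is modest, but against a networked database the batched version also saves 999 round trips.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE logs (msg TEXT)")

# Slow pattern: one statement per row.
for i in range(1000):
    conn.execute("INSERT INTO logs (msg) VALUES (?)", (f"log {i}",))

# Batched: one call, one prepared statement, reused for every row.
conn.executemany("INSERT INTO logs (msg) VALUES (?)",
                 [(f"log {i}",) for i in range(1000)])
conn.commit()

print(conn.execute("SELECT COUNT(*) FROM logs").fetchone()[0])  # 2000
```

Committing once per batch, rather than once per row, is usually the bigger win in practice.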
Denormalize carefully. Sometimes duplicating data avoids expensive JOINs. Trade-off: data can become stale.
Some Real World Examples
Discord's database migration was painful
Discord stored billions of chat messages in MongoDB. It worked fine initially, but as they scaled, MongoDB couldn't keep up. Reading the last 50 messages in a channel was taking 40-125ms. That's noticeable lag.
They spent months migrating to Cassandra. Why Cassandra? Because chat messages are basically time-series data. You're always appending new messages and reading recent ones. Cassandra is built for exactly this pattern.
After the migration, reads dropped to 5ms. Writes became sub-millisecond. But the migration itself? Months of work, careful data movement, running both databases in parallel.
The lesson here isn't use Cassandra. It's that your access pattern matters more than anything else. MongoDB is fine for lots of things. Just not billions of time-series records.
Source: Discord Engineering Blog
Key Takeaways
Relational is the default. Start with MySQL for simplicity. Graduate to PostgreSQL when you need advanced features.
NoSQL solves specific problems. Use it when relational genuinely can't meet your needs, not because it's trendy.
Indexes make reads fast, writes slow. Index columns you query frequently. Use EXPLAIN to verify.
Scaling is a progression. Vertical → Read replicas → Caching → Sharding. Don't skip steps.
Choose based on access patterns. How you query matters more than how much data you have.
What's Next
You understand the fundamentals of databases: what they're good at, how indexes and transactions work, and how scaling progresses. But what happens when one server genuinely isn't enough?
Next up: Data Partitioning & Replication — how to split data across multiple databases and keep copies in sync when one server isn't enough.