Database Replication

A single database server is a single point of failure. If it goes down, your application goes down. Database replication solves this by copying data across multiple servers, giving you redundancy, better read performance, and the ability to serve users closer to where they are.

Why Replicate?

Replication serves several purposes, and most production systems use it for at least one:

High availability means if your primary database dies, a replica can take over. Your application keeps running while you fix the problem.

Read scaling distributes read queries across multiple servers. If your application reads far more than it writes (most do), replicas handle that load.

Geographic distribution puts data closer to users. A replica in Europe serves European users faster than a primary in the US.

Backups become easier when you can take snapshots from a replica without impacting your primary database's performance.

Replication Types

The fundamental question is: when does a write count as "done"?

Synchronous replication waits for every replica to confirm it received the data before acknowledging the write. An acknowledged write therefore survives the loss of the primary, but each write pays for a network round trip to the replicas, so writes are slower.

Asynchronous replication acknowledges writes immediately, then sends data to replicas in the background. Faster writes, but if the primary fails before replication completes, you might lose recent data.

Semi-synchronous replication is the middle ground: the primary waits for at least one replica to confirm, but not all of them.
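
One way to make the difference concrete is to count how many replica confirmations the primary needs before it reports a write as done. The sketch below is illustrative only: Mode, required_acks and write_is_done are hypothetical names, not any database's API, and it assumes the strict reading in which fully synchronous means every replica confirms.

```python
from enum import Enum


class Mode(Enum):
    ASYNC = "async"
    SEMI_SYNC = "semi-sync"
    SYNC = "sync"


def required_acks(mode: Mode, replica_count: int) -> int:
    """Replica confirmations needed before the primary acknowledges a write."""
    if mode is Mode.ASYNC:
        return 0                      # acknowledge immediately, replicate later
    if mode is Mode.SEMI_SYNC:
        return min(1, replica_count)  # one confirmation is enough
    return replica_count              # assumption: sync waits for every replica


def write_is_done(mode: Mode, replica_count: int, acks_received: int) -> bool:
    """True once enough replicas have confirmed for this mode."""
    return acks_received >= required_acks(mode, replica_count)
```

With three replicas and one confirmation received, write_is_done returns True for async and semi-sync but False for sync, which is exactly the latency versus durability trade-off described above.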

Common Patterns

The most common setup is primary-replica (sometimes called master-slave):

Primary-Replica Pattern:
  Primary: handles all writes
  Replicas: handle reads, provide failover
  
  Application → writes → Primary
  Application → reads  → Replicas
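
In application code, this split usually lives in a small routing layer. Here is a minimal sketch of one, assuming reads can be recognized by a leading SELECT and that servers are identified by plain strings. ReplicatedRouter and its route method are hypothetical names, and a real driver or proxy would hand back connections rather than names.

```python
import itertools


class ReplicatedRouter:
    """Routes writes to the primary and spreads reads across replicas."""

    def __init__(self, primary: str, replicas: list[str]):
        self.primary = primary
        # Fall back to the primary if no replicas are configured.
        self._read_targets = itertools.cycle(replicas or [primary])

    def route(self, sql: str) -> str:
        """Return the name of the server that should run this statement."""
        parts = sql.split(None, 1)
        verb = parts[0].upper() if parts else ""
        if verb == "SELECT":
            return next(self._read_targets)  # round-robin over replicas
        return self.primary                  # INSERT, UPDATE, DELETE, DDL, ...


router = ReplicatedRouter("primary", ["replica-1", "replica-2"])
print(router.route("SELECT * FROM users"))        # goes to a replica
print(router.route("UPDATE users SET name = ?"))  # goes to the primary
```

Classifying by the first keyword is deliberately naive. Statements such as SELECT ... FOR UPDATE, and any read inside a write transaction, would still need to go to the primary.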

Multi-primary setups accept writes on multiple nodes. That requires conflict resolution when two nodes modify the same data at the same time, which is where most of the added complexity comes from. The pattern is useful for geographic distribution, because users can write to a nearby node.
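
To show what conflict resolution can look like, here is a sketch of last-write-wins, one of the simplest strategies. It is an example chosen for illustration, not the only option, and not something the patterns above prescribe. Version and resolve are hypothetical names, and the approach assumes node clocks are reasonably well synchronized.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Version:
    value: str
    timestamp: float  # time of the write, per the writing node's clock
    node_id: str      # used only to break timestamp ties deterministically


def resolve(a: Version, b: Version) -> Version:
    """Last-write-wins: keep the version with the later timestamp."""
    return max(a, b, key=lambda v: (v.timestamp, v.node_id))


us = Version("alice@old.example", timestamp=100.0, node_id="us")
eu = Version("alice@new.example", timestamp=101.5, node_id="eu")
print(resolve(us, eu).value)  # the later write, from the "eu" node
```

The cost of this simplicity is that the losing write is silently discarded, and clock skew between nodes can make the wrong write win. Systems that cannot accept that typically merge the conflicting versions or surface the conflict to the application.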

Replication Lag

With asynchronous replication, replicas run slightly behind the primary. This replication lag causes stale reads: a user writes data, then immediately reads from a replica that hasn't received the write yet, and sees the old value.

Solutions include reading from the primary after writes, or using synchronous replication for critical data.
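
The first option, reading from the primary after writes, can be sketched as a router that pins a user to the primary for a short window after each of that user's writes. ReadAfterWriteRouter, pin_seconds and the one-second default are all hypothetical. In practice the window would be chosen from the replication lag you actually observe.

```python
import time


class ReadAfterWriteRouter:
    """Sends a user's reads to the primary shortly after that user writes."""

    def __init__(self, pin_seconds: float = 1.0, clock=time.monotonic):
        self.pin_seconds = pin_seconds  # assumed to exceed typical lag
        self._clock = clock             # injectable so tests can fake time
        self._last_write: dict[str, float] = {}  # user_id -> last write time

    def record_write(self, user_id: str) -> None:
        """Call this whenever the user performs a write."""
        self._last_write[user_id] = self._clock()

    def read_target(self, user_id: str) -> str:
        """Return "primary" while the user is pinned, otherwise "replica"."""
        last = self._last_write.get(user_id)
        if last is not None and self._clock() - last < self.pin_seconds:
            return "primary"
        return "replica"
```

Only the user who just wrote is routed to the primary, so replicas keep serving everyone else's reads. A fixed window is a heuristic: if lag ever exceeds pin_seconds, stale reads are still possible.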

Last updated December 26, 2025
