Replication

Replication means keeping a copy of the same data on multiple machines that are connected via a network.

Replication helps:

to keep data geographically close to users (reducing latency)
to allow the system to continue working even if some of its parts have failed (high availability, HA)
to scale out the number of machines that serve read queries (increasing read throughput)

If the data that you are replicating does not change over time, then replication is easy. You just need to copy the data to every node once and you are done.

The difficulty in replication lies in handling changes to replicated data.

Three popular algorithms for replicating changes between nodes:

Single leader replication
Multi-leader replication
Leaderless replication

There are various tradeoffs to consider with replication:

Whether to use synchronous or asynchronous replication
How to handle failed replicas
Whether to use a single leader, multiple leaders, or no leader

Leader based replication

One of the replicas is designated the leader (also known as master or primary). When clients want to write to the database, they must send their requests to the leader, which first writes the new data to its local storage.
The other replicas are known as followers (read replicas, slaves, secondaries, or hot standbys). Whenever the leader writes new data to its local storage, it also sends the data change to all of its followers as part of a replication log or change stream (explained later).
Each follower updates its local copy based on the replication log, applying all writes in the same order as they were processed on the leader.
For reads, a client can query either the leader or any of the followers.
Leader-based replication is not restricted to databases; distributed message brokers such as Kafka and RabbitMQ also use it.
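
The write path described above can be sketched in a few lines of Python. This is only an illustrative in-memory model, not how any particular database implements it: the class names Leader and Follower, the key/value data model, and the direct method call standing in for the network are all assumptions made for the example.

```python
class Follower:
    def __init__(self):
        self.data = {}
        self.applied = 0  # number of log entries applied so far

    def apply(self, index, key, value):
        # Entries must be applied in the leader's order, with no gaps.
        if index != self.applied:
            raise ValueError(f"expected entry {self.applied}, got {index}")
        self.data[key] = value
        self.applied += 1


class Leader:
    def __init__(self, followers):
        self.data = {}
        self.log = []  # replication log: ordered list of (key, value) changes
        self.followers = followers

    def write(self, key, value):
        # 1. Apply the write to local storage and append it to the log.
        self.data[key] = value
        index = len(self.log)
        self.log.append((key, value))
        # 2. Send the change to every follower (a direct call stands in
        #    for the network in this sketch).
        for follower in self.followers:
            follower.apply(index, key, value)


followers = [Follower(), Follower()]
leader = Leader(followers)
leader.write("x", 1)
leader.write("x", 2)
```

Because every follower applies the same entries in the same order, each one ends up with the same value for "x" as the leader. Reads could then be served from leader.data or from any follower's data.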

Setting up new Follower:

A new follower is usually expected to be set up without any downtime.

Conceptually the process looks like this:

Take a consistent snapshot of the leader's database at some point in time. Most databases have this feature.
Copy the snapshot to the new follower node.
The follower connects to the leader and requests all the data changes that have happened since the snapshot was taken. This requires the snapshot to be associated with an exact position in the leader's replication log. (PostgreSQL calls this position the log sequence number; MySQL calls it the binlog coordinates.)
When the follower has processed the backlog of data changes since the snapshot, we say it has caught up. It can now continue to process data changes from the leader as they happen.
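
The steps above can be sketched as follows. This is a simplified in-memory model under stated assumptions: the names Leader, Follower, snapshot, changes_since and catch_up are hypothetical, and a list index plays the role of the log position (what PostgreSQL would call the log sequence number).

```python
import copy


class Leader:
    def __init__(self):
        self.data = {}
        self.log = []  # ordered list of (key, value) changes

    def write(self, key, value):
        self.data[key] = value
        self.log.append((key, value))

    def snapshot(self):
        # A consistent copy of the data plus the log position it corresponds to.
        return copy.deepcopy(self.data), len(self.log)

    def changes_since(self, position):
        return self.log[position:]


class Follower:
    def __init__(self, snapshot, position):
        self.data = snapshot
        self.position = position  # next log entry this follower needs

    def catch_up(self, leader):
        # Request and apply, in order, everything written after the snapshot.
        for key, value in leader.changes_since(self.position):
            self.data[key] = value
            self.position += 1


leader = Leader()
leader.write("a", 1)
leader.write("b", 2)

snap, pos = leader.snapshot()    # step 1: snapshot at a known log position
leader.write("a", 3)             # the leader keeps accepting writes meanwhile
leader.write("c", 4)

follower = Follower(snap, pos)   # step 2: copy the snapshot to the new node
follower.catch_up(leader)        # steps 3 and 4: apply the backlog
```

The leader never stops accepting writes during the process, which is what makes the setup possible without downtime. Calling catch_up again later would apply only the changes made since the previous call.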

Synchronous vs Asynchronous replication :

In most relational databases, this is a configurable option.
In synchronous replication, the leader waits until the follower has confirmed that it received the write before reporting success to the user.
In asynchronous replication, the leader does not wait for a response from the followers; it reports success to the user as soon as it has updated its local database.

Advantage of Synchronous Replication:

The follower is guaranteed to have an up-to-date copy of the data that is consistent with the leader.
If the leader suddenly fails, we can be sure that the data is still available on the follower.

Disadvantage of Synchronous Replication:

If the follower does not respond (because it has crashed, there is a network fault, or for any other reason), the write cannot be processed.
The leader must block all writes and wait until the synchronous replica responds.

Asynchronous replication gives weaker durability: writes that have not yet reached a follower can be lost if the leader fails.
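
The difference between the two modes can be illustrated with a small sketch. It is a toy model, not a real replication protocol: the synchronous flag, the pending queue and the flush method are assumptions made for the example, and a real leader would block and retry rather than raise an error when the follower is unreachable.

```python
class Follower:
    def __init__(self):
        self.data = {}
        self.reachable = True

    def receive(self, key, value):
        if not self.reachable:
            raise ConnectionError("follower did not respond")
        self.data[key] = value


class Leader:
    def __init__(self, follower, synchronous):
        self.data = {}
        self.follower = follower
        self.synchronous = synchronous
        self.pending = []  # changes not yet delivered (asynchronous mode)

    def write(self, key, value):
        self.data[key] = value
        if self.synchronous:
            # Wait for the follower's confirmation before reporting success.
            # If it fails, the error reaches the client and the write is
            # not acknowledged.
            self.follower.receive(key, value)
        else:
            # Report success right away; deliver the change later.
            self.pending.append((key, value))
        return "ok"

    def flush(self):
        # Deliver queued changes in order (stands in for background shipping).
        while self.pending:
            key, value = self.pending[0]
            self.follower.receive(key, value)
            self.pending.pop(0)


sync_follower = Follower()
sync_leader = Leader(sync_follower, synchronous=True)
sync_leader.write("x", 1)     # follower already has x when this returns

async_follower = Follower()
async_leader = Leader(async_follower, synchronous=False)
async_leader.write("x", 1)    # acknowledged, but the follower is still empty
```

If async_leader were lost before flush ran, the acknowledged write to "x" would exist nowhere else, which is the durability gap described above.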

Semi-synchronous:

It is impractical for all followers to be synchronous: any one node outage would cause the whole system to grind to a halt.
In practice, if you enable synchronous replication on a database, it usually means that one of the followers is synchronous and the others are asynchronous.
If the synchronous follower becomes unavailable or slow, one of the asynchronous followers is made synchronous.
This guarantees that you have an up-to-date copy of the data on at least two nodes: the leader and one synchronous follower.
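
A minimal sketch of this promotion rule is shown below, assuming a hypothetical SemiSyncLeader class and followers that pull missing log entries when asked. The policy of trying the asynchronous followers in list order is an assumption made for the example; real databases have their own rules for choosing the next synchronous follower, and background shipping to the asynchronous followers is not modeled.

```python
class Follower:
    def __init__(self, name):
        self.name = name
        self.log = []
        self.available = True

    def replicate(self, leader_log):
        if not self.available:
            raise ConnectionError(f"{self.name} did not respond")
        # Copy any entries this follower is missing, in order.
        self.log.extend(leader_log[len(self.log):])


class SemiSyncLeader:
    def __init__(self, followers):
        self.log = []
        self.sync = followers[0]           # the one synchronous follower
        self.asyncs = list(followers[1:])  # the rest are asynchronous

    def write(self, key, value):
        self.log.append((key, value))
        # Try the synchronous follower first, then the asynchronous ones.
        for candidate in [self.sync] + self.asyncs:
            try:
                candidate.replicate(self.log)
            except ConnectionError:
                continue
            if candidate is not self.sync:
                # Promote the follower that confirmed; demote the old one.
                self.asyncs.remove(candidate)
                self.asyncs.append(self.sync)
                self.sync = candidate
            return "ok"
        raise ConnectionError("no follower confirmed the write")


a, b, c = Follower("a"), Follower("b"), Follower("c")
leader = SemiSyncLeader([a, b, c])
leader.write("x", 1)     # confirmed by a
a.available = False
leader.write("y", 2)     # a fails; b catches up and becomes synchronous
```

After the second write, the leader and b both hold the full log, so the "at least two nodes" guarantee still holds even though a is down.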

Leader-based replication is often configured to be completely asynchronous, because this configuration has the advantage that the leader can continue processing writes even if all of its followers have fallen behind.

Tradeoff: Weaker durability.

If the leader fails and is not recoverable, then any writes that have not yet been replicated to followers are lost.

That means a write is not guaranteed to be durable, even if it has been confirmed to the client.
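
To make the tradeoff concrete, here is a small sketch that computes which writes would be lost if the leader failed at a given moment. The function name unreplicated_writes and the list-based logs are assumptions made for illustration.

```python
def unreplicated_writes(leader_log, follower_log):
    """Return the leader's writes that the follower has not received yet."""
    return leader_log[len(follower_log):]


# All three writes were acknowledged to clients by the leader.
leader_log = [("x", 1), ("y", 2), ("z", 3)]
# The follower is lagging and has received only the first one.
follower_log = [("x", 1)]

lost = unreplicated_writes(leader_log, follower_log)
```

Here lost contains the writes to "y" and "z": both were reported as successful, yet neither would survive an unrecoverable leader failure.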

Next : Single Leader replication. Most of the concepts for single leader replication are applicable for Multi-leader replication as well.
