Implementation of Replication Logs

Several different replication methods are used in practice.

Statement based replication:

Simplest one, the leader logs every write request (statement) that it executes and sends that statement log to its followers. Each follower parses and execute the SQL statement as if it has been received from a client.

Issue

Any statement that calls a non-deterministic function such as NOW() to get current date and time or RAND() to get a random numbers, is likely to generate a different value on each replica.
If statement use an auto incrementing column, or if they depend on the existing data in the database, they must be executed in exactly the same order on each replica or they may have different effect. This can be limiting when there are multiple concurrently executing transaction.

It is possible to work around those issues for example the leader can replace any nondeterministic function call with a fixed return value when the statement is logged so that the followers all get the same value. There are so many different use cases, where this method can cause problem is followers.

MSSQL moved out from statement based to row based replication from version 5.1 onwards.

Write Ahead Log (WAL) shipping replication:

In case of log structured storage engine (SSTable and LSM Tree), write ahead log is the main place of storage. Log segments are compacted and garbage-collected in the background.
In case of B-Tree, which overwrite individual disk blocks, every modification is first written to write ahead log so that the index can be restored to a consistent state after a crash.

In either case, log is append-only sequence of bytes containing all writes to the database. We can use the exact same log to build a replica on another node: Leader will write the log to disk also sends it across the network to its followers. Ex: Postgresql, OracleDB

Issue:

WAL contains details of which bytes were changed in which disk blocks. If the database changes its storage format from one version to another, it is typically not possible to run different versions of the database software on the leader and followers.

Logical (row based) replication:

Replication log should be de-coupled from the storage engine. This kind of replication log is called a logical log.

Logical log replication keeps granularity of update or new row.
For an inserted row, the log contains the new values of all columns.
For a deleted row, the log contains enough information of uniquely identify the row that was deleted. Typically this would be the primary key, but if there is no primary key in the table, the old values of all columns need to be logged.
For updated row, the log contains enough information to uniquely identify the updated row, and the new values of all columns.
A transaction that modifies several rows generates several such log records, followed by a record indicating that the transaction was committed.
A logical log format is also easier for external applications to parse. This aspect is useful if you want to send the contents of a database to an external system, such as data warehouse for offline analysis. This technique is called "Change data capture".

Trigger:

A trigger lets you register custom application code that is automatically executed when a data change occurs in a database system.
The trigger has the opportunity to log this change into a separate table, from which it can be read by an external process.
That external process can then apply necessary application logic and replicate the data changes to another system.

Next, we will discuss about Consistency problem and possible solutions.

Search This Blog

System Design