Distributed Transaction

A distributed transaction is a transaction that spans multiple database services, often running on nodes that are geographically distributed across different locations, and it must be committed or aborted as a single unit. 

Here are some important aspects to consider when designing a system for distributed transactions across multiple database services:  

  • Transaction Protocol   
  • Transactional consistency   
  • Concurrency control   
  • Reliability and Fault-tolerance   
  • Performance 

Transactional Protocol

  • A distributed transaction protocol is a set of rules and procedures that defines how a distributed transaction should be executed across multiple databases. 
  • It is used to achieve an atomic commit across databases.  
  • There are several distributed transaction protocols to choose from, each with its own advantages and disadvantages.  
    • Two-Phase Commit (2PC): 
      • A widely used protocol that ensures atomicity and durability. 
      • However, it is slow (extra network round trips) and blocking: if the coordinator fails at the wrong moment, participants may be left waiting. 
    • Three-Phase Commit (3PC): 
      • A protocol that can handle coordinator failures more gracefully, with bounded delay.  
      • However, it can be more complex to implement. 
    • Paxos: 
      • A consensus algorithm used to ensure agreement among a group of distributed nodes on a single value or decision. 
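As a concrete illustration of the first protocol above, here is a minimal sketch of a two-phase commit coordinator. The `Participant` class and its `prepare`/`commit`/`abort` methods are hypothetical stand-ins; a real implementation would also need durable coordinator logging, timeouts, and crash recovery.

```python
class Participant:
    """Stand-in for one database node taking part in the transaction."""

    def __init__(self, name):
        self.name = name
        self.state = "init"

    def prepare(self):
        # Phase 1: the participant makes its writes durable and votes.
        # A real node could return False (vote "no") if it cannot commit.
        self.state = "prepared"
        return True

    def commit(self):
        self.state = "committed"

    def abort(self):
        self.state = "aborted"


def two_phase_commit(participants):
    # Phase 1: collect a vote from every participant.
    votes = [p.prepare() for p in participants]
    # Phase 2: commit only if every vote was "yes"; otherwise abort all.
    if all(votes):
        for p in participants:
            p.commit()
        return "committed"
    for p in participants:
        p.abort()
    return "aborted"
```

The key property is that the decision is made once, by the coordinator, after all votes are in; no participant commits unilaterally.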

Transactional Consistency 


Transactional consistency refers to the degree to which a transaction appears to have been executed atomically, consistently, and in isolation from other transactions. 

Atomic Commit:  

Atomic commit is a very important aspect of committing a transaction. 

  • On Single Node: 
    • When the client asks the database node to commit the transaction, the database makes the transaction’s writes durable by writing them to a write-ahead log, and then appends a commit record to the log on disk. 
    • If the database crashes in the middle of this process, the transaction is recovered from the log when the node restarts. 
    • If the commit record was written successfully to disk before the crash, the transaction is considered committed. 
    • If not, any record writes from that transaction are rolled back. 
  • Thus, on a single node, transaction commitment crucially depends on the order in which data is durably written to disk: first the data, then the commit record. 
  • Multi-Object transactions On Partitioned Database: 
    • In a partitioned database, a multi-object transaction touches multiple nodes. It is not sufficient to simply send a commit request to all the nodes and independently commit the transaction on each one. 
    • If some nodes commit the transaction and others abort it, the nodes become inconsistent with each other. 
    • For this reason, a node must only commit once it is certain that all other nodes in the transaction are also going to commit. 
    • A transaction commit must be irrevocable; the reason for this rule is that once data has been committed, it becomes visible to other transactions, and thus other clients may start relying on that data.
  • If we want to maintain transaction atomicity, we have to get all nodes to agree on the outcome of the transaction: either they all abort/roll back (if anything goes wrong) or they all commit.  
  • Use the right transaction protocol to achieve atomic commit. 
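The single-node write-ahead-log ordering described above can be sketched in a few lines. The log format here is entirely hypothetical; the point being illustrated is the ordering rule: data records reach the log before the commit record, so recovery keeps only fully committed transactions.

```python
class WriteAheadLog:
    """In-memory stand-in for an on-disk, append-only log."""

    def __init__(self):
        self.records = []

    def append(self, record):
        # A real log would fsync to disk here before returning.
        self.records.append(record)


def commit_transaction(log, txid, writes):
    for key, value in writes:
        log.append(("data", txid, key, value))  # 1. data records first
    log.append(("commit", txid))                # 2. commit record last


def recover(log):
    # Only transactions whose commit record made it to the log survive;
    # data records from unfinished transactions are ignored (rolled back).
    committed = {r[1] for r in log.records if r[0] == "commit"}
    state = {}
    for r in log.records:
        if r[0] == "data" and r[1] in committed:
            state[r[2]] = r[3]
    return state
```

If the node crashes after a transaction's data records but before its commit record, recovery simply drops those writes, which is exactly the rollback behavior described above.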

Consistency:  

  • If you look at two database nodes at the same moment in time, you are likely to see different data on the two nodes, because write requests arrive at different nodes at different times. These inconsistencies occur no matter what replication method the database uses (single-leader, multi-leader, or leaderless replication).  
  • Eventual Consistency:  
    • Eventual consistency means that all replicated nodes will converge to the same data eventually.  
    • So the inconsistency is temporary, and it eventually resolves itself.  
  • Linearizability (Strong Consistency):  
    • In an eventually consistent database, if you ask two different replicas the same question at the same time, you may get two different answers.  
    • The idea of linearizability is to make a system appear as if there were only one copy of the data, and all operations on it are atomic.  
    • Making a system linearizable can harm its performance and availability, especially if the system has significant network delays (geographically distributed)  
  • Causality and ordering:  
    • Causality represents a “happened before” relationship. For example: a row must be created before it can be updated; a question comes before its answer.  
    • If event A happened before event B, that means B might have known about A, built upon A, or depended upon A. However, if A and B are concurrent, there is no causal link between them.  
    • Causality imposes an ordering on events. If a system obeys the ordering imposed by causality, we say that it is causally consistent.  
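One classic way to produce an ordering that respects the “happened before” relation is a Lamport clock. The sketch below is a minimal illustration, not tied to any particular database: each node keeps a counter, stamps outgoing messages, and merges on receive, so a receive event is always ordered after the corresponding send.

```python
class LamportClock:
    """Logical clock producing timestamps consistent with causality."""

    def __init__(self):
        self.time = 0

    def tick(self):
        # Local event: advance the clock.
        self.time += 1
        return self.time

    def send(self):
        # Attach the current timestamp to an outgoing message.
        return self.tick()

    def receive(self, msg_time):
        # Merge rule: the receive event must come after both the
        # sender's timestamp and this node's own latest event.
        self.time = max(self.time, msg_time) + 1
        return self.time
```

Note the limitation: Lamport timestamps order causally related events correctly, but they cannot tell you whether two events were actually concurrent; that requires version vectors.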


Concurrency Control:  

  • Concurrency control is the process of ensuring that multiple users can access and update the same data concurrently, without leading to conflicts and inconsistencies.  
  • There are several concurrency control mechanisms to choose from, including locking and optimistic concurrency control.   
  • Locking involves acquiring locks on data objects to prevent other users from accessing them.  
  • Optimistic concurrency control involves allowing multiple users to access and update data concurrently, but detecting and resolving conflicts when they occur.  
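The optimistic approach in the last bullet can be sketched with per-key version numbers (a compare-and-set). The `VersionedStore` API below is hypothetical: writers read a value's version, and a write succeeds only if the version has not changed in the meantime; otherwise the conflict is detected and the writer must retry.

```python
class VersionedStore:
    """Toy key-value store with optimistic, version-checked writes."""

    def __init__(self):
        self.data = {}  # key -> (value, version)

    def read(self, key):
        # Returns (value, version); missing keys start at version 0.
        return self.data.get(key, (None, 0))

    def write(self, key, value, expected_version):
        _, current = self.read(key)
        if current != expected_version:
            return False  # conflict: another writer got there first
        self.data[key] = (value, current + 1)
        return True
```

If two clients read version 0 and both try to write, the second write fails its version check and must re-read and retry, which is how the conflict is resolved rather than prevented.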

Reliability and Fault-tolerance 

  • Reliability and fault-tolerance are critical aspects of any distributed system  
  • This involves designing the system to handle failures, such as network failures, database failures, and node failures  
  • There are several techniques that can be used to ensure reliability and fault-tolerance, including   
    • Replication: It involves maintaining multiple copies of the same data across different nodes.  
    • Sharding: It involves partitioning data across different nodes  
    • Load balancing: It involves distributing incoming requests across multiple nodes to prevent any one node from becoming overloaded.  
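Of the techniques above, sharding is the easiest to show in a few lines. This sketch uses simple hash-based partitioning to map each key deterministically to one of N shards; real systems often prefer consistent hashing so that adding or removing a shard moves fewer keys.

```python
import hashlib


def shard_for(key, num_shards):
    """Map a string key deterministically to a shard in [0, num_shards)."""
    # Hash the key so similar keys spread evenly across shards.
    h = int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)
    return h % num_shards
```

Because the mapping is deterministic, every node in the system agrees on which shard owns a given key without any coordination.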

Performance Optimization 

  • It is important to optimize the performance of the system to ensure that transactions can be processed quickly and efficiently  
  • This may involve techniques such as   
    • Caching, which involves storing frequently accessed data in memory to reduce database reads.  
    • Query optimization, which involves optimizing database queries to reduce their execution time.  
    • Load balancing and sharding, which can also improve performance by distributing the workload across multiple nodes.  
  • In addition, it's important to monitor the performance of the system and make adjustments as needed to ensure that it continues to meet the needs of the application. 
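The caching technique mentioned above can be sketched as a simple read-through cache. Here `db_read` is a hypothetical stand-in for a real database query; the cache is consulted first, and the database is hit only on a miss.

```python
calls = {"db": 0}


def db_read(key):
    # Stand-in for an expensive database query.
    calls["db"] += 1
    return f"value-for-{key}"


cache = {}


def cached_read(key):
    # Read-through: on a miss, fetch from the database and fill the cache.
    if key not in cache:
        cache[key] = db_read(key)
    return cache[key]
```

A production cache would also need an eviction policy (e.g. LRU) and an invalidation strategy when the underlying data changes, which this sketch deliberately omits.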
