Storage Engine

Database Storage Engine

Two kinds of database storage engines are the most popular:
  1. Log-Structured Storage Engine (SSTables, LSM-Tree)
  2. Page-Oriented Storage Engine (B-Tree)

Log-Structured Storage Engine

Most databases internally use a log, which is an append-only data file.
  • A log file is an append-only sequence of records
  • It is a binary file, not meant to be human readable
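The idea can be sketched in a few lines of Python. This is a minimal illustration, not a real engine; the file name and the comma-separated record format are assumptions made for the example.

```python
DB_FILE = "database.log"  # hypothetical file name for this sketch
open(DB_FILE, "w").close()  # start with an empty log for this example

def db_set(key, value):
    # Appending is the only write operation: old values are never overwritten.
    with open(DB_FILE, "a") as f:
        f.write(f"{key},{value}\n")

def db_get(key):
    # Naive read: scan the whole log and keep the last value seen for the key.
    result = None
    with open(DB_FILE) as f:
        for line in f:
            k, _, v = line.rstrip("\n").partition(",")
            if k == key:
                result = v
    return result

db_set("42", "san_francisco")
db_set("42", "cali")
print(db_get("42"))  # prints "cali": the latest appended value wins
```

Note that `db_get` scans the entire file on every call, which is exactly the performance problem described in the next section.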

Issues

Since the log file is append-only, it keeps growing, which causes performance issues.
  • While reading data, every lookup scans the file from the beginning
  • Performance degrades as the number of records in the database grows
To mitigate these issues, databases use a concept called indexing.

Index

  • In order to efficiently find the value for a particular key in the database, we need a different data structure: an index.
  • In general, the idea behind an index is to keep some additional metadata on the side, which acts as a signpost and helps you locate the data you want.
  • If you want to search the same data in several different ways, you may need several different indexes on different parts of the data.
  • An index is an additional structure derived from the primary data. Many databases allow you to add and remove indexes without affecting the contents of the database; it only affects the performance of queries.

Index Trade-Off

    • Any kind of index usually slows down writes, because the index also needs to be updated every time data is written.
    • This is an important trade-off in storage systems: well-chosen indexes speed up read queries, but every index slows down writes.

Hash Index

  • An index for key-value data.
  • The simplest possible indexing strategy:
    • Keep an in-memory hash map where every key is mapped to a byte offset in the data file, the location at which the value can be found.
    • Whenever you append a new key-value pair to the data file, you also update the hash map to record the offset of the data you just wrote.
    • When you want to look up a value, use the hash map to find the offset in the data file, seek to that location, and read the value.
  • This index is suitable when the number of keys is small enough that it is feasible to keep all the keys in RAM.
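The three steps above can be sketched as follows, keeping the one-record-per-line format from earlier. `HashIndexStore` and the segment file name are illustrative, not a real API.

```python
class HashIndexStore:
    """Sketch: in-memory hash map of key -> byte offset into an append-only data file."""

    def __init__(self, path):
        self.path = path
        self.index = {}          # key -> byte offset of the value's record
        open(path, "w").close()  # start with an empty data file for this example

    def put(self, key, value):
        with open(self.path, "ab") as f:
            offset = f.tell()                 # byte offset where this record starts
            f.write(f"{key},{value}\n".encode())
        self.index[key] = offset              # update the hash map on every write

    def get(self, key):
        offset = self.index.get(key)
        if offset is None:
            return None
        with open(self.path, "rb") as f:
            f.seek(offset)                    # jump straight to the record, no scan
            line = f.readline().decode().rstrip("\n")
        _, _, value = line.partition(",")
        return value

store = HashIndexStore("segment0.data")  # hypothetical segment file name
store.put("user:1", "alice")
store.put("user:1", "bob")   # append-only: the index now points at the newer record
print(store.get("user:1"))   # prints "bob", reading exactly one record
```

Reads cost one seek plus one record read instead of a full file scan, at the price of keeping every key in memory.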

Data File:

Since writing to the data file only ever appends, we need to avoid running out of disk space.
  • Break the data file into segments of a certain size. Once a segment file reaches that size, close it and write subsequent data to a new segment file.
  • Then we can perform compaction on the segments.
    • Compaction means throwing away duplicate keys in the log and keeping only the most recent update for each key.

  • Data file segment: (figure not included)

  • Compacted segment: (figure not included)

  • Moreover, since compaction often makes segments much smaller, we can also merge several segments together at the same time as performing the compaction.
  • Segments (data files) are never modified after they have been written, so the merged segment is written to a new file.
  • The merging and compaction of frozen segments can be done in a background thread; while it is going on, we can continue to serve read and write requests as normal, using the old segment files.
  • After the merging process is complete, we switch read requests to the new merged segment. The old segment files can then be deleted.
  • Each segment now has its own in-memory hash table, mapping keys to file offsets.
  • To find the value for a key, we first check the most recent segment's hash map; if the key is not present, we check the second-most-recent segment, and so on.
  • The merging process keeps the number of segments small, so lookups do not need to check many hash maps.
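The merge-plus-compaction step can be sketched as below. For simplicity each frozen segment is represented as an in-memory list of (key, value) records in append order, oldest segment first; a real engine would stream records from segment files instead.

```python
def compact_and_merge(segments):
    """Merge several frozen segments into one compacted result (sketch).

    `segments` is ordered oldest -> newest; each segment is a list of
    (key, value) records in append order.
    """
    merged = {}
    for segment in segments:          # walk segments oldest first...
        for key, value in segment:    # ...so later records overwrite earlier ones
            merged[key] = value       # keeping only the most recent update = compaction
    return merged

old_segment = [("mew", 1078), ("purr", 2103), ("purr", 2104)]
new_segment = [("mew", 1079), ("scratch", 252)]
print(compact_and_merge([old_segment, new_segment]))
# one entry per key, each holding the most recent update
```

Because the inputs are frozen, this can run in a background thread while reads and writes continue against the old segments.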

  • Deleting a record
    • If you want to delete a key and its associated value, you append a special deletion record to the data file, called a tombstone.
    • When log segments are merged, the tombstone tells the merging process to discard any previous values for the deleted key.
  • Crash recovery
    • If the database is restarted after a crash, the in-memory hash maps are lost.
    • They can be rebuilt by rescanning the segments, but a faster solution is to store a snapshot of each segment's hash map on disk, which can be loaded into memory quickly.
  • Partially written records
    • The database may crash at any time, including halfway through appending a record to the log.
    • Some database engines include checksums, allowing such corrupt parts of the log to be detected and ignored.
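Tombstone handling fits naturally into the merge sketch from earlier. The `TOMBSTONE` marker value is an assumption of this example; real engines use a flag in the record format instead of a magic value.

```python
TOMBSTONE = object()  # hypothetical sentinel marking "this key was deleted"

def delete(segment, key):
    # Deletion is just another append: a special tombstone record.
    segment.append((key, TOMBSTONE))

def merge_with_tombstones(segments):
    # Same merge as before, but a surviving tombstone means the key
    # (and all its previous values) is discarded from the output.
    merged = {}
    for segment in segments:               # oldest -> newest
        for key, value in segment:
            merged[key] = value
    return {k: v for k, v in merged.items() if v is not TOMBSTONE}

seg = [("a", "1"), ("b", "2")]
delete(seg, "a")
print(merge_with_tombstones([seg]))  # prints {'b': '2'}: 'a' is gone
```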

Limitations of the Hash Table Index

  • The hash table must fit in memory
  • If you have a very large number of keys, you are out of luck
  • Range queries are not efficient.
    • Ex: you cannot scan over all keys between mter0000 and mter9999; you would have to look up each key individually in the hash map.
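To see why range queries are inefficient, here is a sketch using a plain dict as the index: a hash map keeps no key ordering, so the scan degenerates into one probe per possible key in the range, whether or not that key exists.

```python
# Hypothetical index: every 7th key in the mter0000..mter9999 key space,
# mapped to made-up file offsets.
index = {f"mter{i:04d}": i * 100 for i in range(0, 10000, 7)}

def range_query(index, start, end):
    # With no key ordering, the only option is to probe every candidate
    # key in the range individually; 10,000 hash lookups for this scan.
    results = []
    for i in range(10000):
        key = f"mter{i:04d}"
        if start <= key <= end and key in index:
            results.append((key, index[key]))
    return results

print(len(range_query(index, "mter0000", "mter9999")))
```

A sorted structure would instead seek to `mter0000` once and read keys sequentially until `mter9999`, which is the motivation for SSTables.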
What is next: SSTables and LSM-Trees, which address the limitations of the hash index.
