B-Tree
B-Tree
- The Most widely used indexing structure is B-Tree - Mostly in relational databases.
- B-Tree keeps key-value pairs sorted by key.
- B-Tree breaks the database down into fixed size block or pages; 4 KB in size and read and write one page at a time.
- Each page can be identified using an address or location, which allows one page to refer to another - similar to pointer but on disk.
- We can use these page reference to construct a tree pf pages.
Example:
One page is designated as the root of the B-Tree.
The page contains several keys and references to child pages. Each child is responsible for a continuous range of keys and the keys between the references indicate where the boundaries between those ranges lies.
Eventually we get down to a page containing keys (a leaf page), which either contains the value for each key inline or contains references to the pages where the values can be found.
Branching Factor :
The number of references to the child pages in one page of the B-tree is called the branching factor.
Branching factor depends on the amount of space required to store the page references and the range boundaries.
Typically it is several hundreds.
B-Tree Algorithm :
- The algorithm of B-Tree ensures that the tree remains balanced.
- A B-Tree with N keys always has a depth of O(logN). Most databases can fit into B-Tree that is three or four levels deep, so you do not need to follow many pages to find the page you are looking for.
- A four level tree of 4 KB page with branching factor of 500 can store up to 256 TB data.
- Algorithm follows this:
- If you want to add a new key, you need to find the page whose range encompasses the new key and add it to that page.
- If there is not enough space in the page to accomodate the new key, it is split into two half-full pages, and the parent page is updated to account for the new subdivision of key ranges.
The basic underlying write operation of a B-Tree is to overwrite a page on disk with new data.
while in LSM-Tree, write operation only append to log file.
Also, sometime, you split page and update references in parent page. All this happens on disk.
To make database resilient, it includes an additional concept: a write ahead log.
Write Ahead Log (WAL) :
This is an append only file, to which every B-Tree modification must be written before it can be applied to the pages of tree itself.
If the database crash and comes back, this log is used to restore the B-Tree back to a consistent state.
Concurrency :
Concurrency is done by protecting the tree's data structure with latches (light weight locks).
Log structured approaches are simpler in this regards, because they do all the merging in the background without interfering with incoming queries.
What is next: Comparing B-Trees and LSM Trees

Comments
Post a Comment