Comparison of B-Tree and LSM Tree
- LSM Trees are typically faster for writes, whereas B-Trees are thought to be faster for read.
- Reads are typically slower in LSM Tree because they have to check several different data structures and SSTables at different stages of compaction.
Advantage of LSM Tree :
- A B-Tree index must write every piece of data at least twice: One to "write ahead log", and once to the tree page (perhaps again on page split).
- There is also overhead from having to write an entire page at a time, even if only few bytes in that page changed.
- Log structured indexes also rewrite data multiple times due to repeated compaction and merging of SSTables, but that mostly carries by background threads.
Write Amplification and Disk performance :
- One write to the database resulting in multiple writes to the disk over the course of database lifetime, is known as write amplification.
- It is a particular concern on SSDs, which can only overwrite blocks a limited number of times before wearing out.
- In write heavy application, write amplification has a direct performance cost.
- Moreover, LSM-Trees are typically able to sustain higher write throughput than B-Tree.
- LSM tree can be compressed better, and thus often produce smaller files on disk than B-Tree.
- B-Tree storage engine leave disk space unused due to fragmentation.
Downside of LSM Tree :
- The compaction process sometimes interfere with the performance of ongoing reads and write.
- The response time of queries to log structured storage engine can sometimes be quite high, B-Tree can be more predictable.
- Another issue with compaction arises at high write throughput: Disk's write bandwidth needs to be shared between write (logging and flushing memtable to disk) and the compaction threads running in the background.
- If compaction is not configured carefully, it can happen that compaction can not keep up with the rate of incoming writes. Hence,
- number of unmanaged segments on disk keep growing until you run out of disk space.
- read will also slow down because they need to check more segment files.
- So, you need explicit monitoring to detect this situation.
Comments
Post a Comment