Friday 23 January 2015

Read and Write Mechanism in Cassandra

1. Write


When a write occurs, the data is immediately appended to the commit log on disk to ensure durability. Cassandra then writes the data to the memtable, an in-memory structure that holds recent (hot) data. When the memtable is full, its contents are flushed to a disk file called an SSTable using sequential I/O, so random I/O is avoided. Because every disk write on this path is either a sequential append or a sequential flush, write performance is very high. Once the flush completes, the commit log entries covering the flushed data are no longer needed and are purged.
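The write path above can be sketched as a toy model. This is not Cassandra's code: the class `ToyNode`, its methods, the `memtable_limit` threshold and the JSON-lines file format are all hypothetical, chosen only to show the order of operations (commit log append, memtable update, sorted sequential flush, commit log purge).

```python
import json
import os
import tempfile


class ToyNode:
    """Toy model of the write path (hypothetical, not Cassandra's code).

    Assumptions: rows are key -> value pairs, the memtable counts as "full"
    after memtable_limit rows, and the commit log and SSTables are
    JSON-lines files in one directory.
    """

    def __init__(self, directory, memtable_limit=3):
        self.directory = directory
        self.memtable_limit = memtable_limit
        self.commitlog_path = os.path.join(directory, "commitlog.log")
        self.memtable = {}   # row key -> (timestamp, value), held in memory
        self.sstables = []   # paths of flushed, never-modified SSTable files

    def write(self, key, value, timestamp):
        # 1. Append to the commit log and fsync it so the write survives a crash.
        with open(self.commitlog_path, "a") as log:
            log.write(json.dumps([key, value, timestamp]) + "\n")
            log.flush()
            os.fsync(log.fileno())
        # 2. Apply the write to the memtable (this sketch keeps the latest write).
        self.memtable[key] = (timestamp, value)
        # 3. Flush once the memtable is "full".
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        path = os.path.join(self.directory, f"sstable-{len(self.sstables)}.jsonl")
        # Write all rows sorted by key in a single sequential pass.
        with open(path, "w") as f:
            for key in sorted(self.memtable):
                timestamp, value = self.memtable[key]
                f.write(json.dumps([key, value, timestamp]) + "\n")
        self.sstables.append(path)
        self.memtable = {}
        # The flushed rows are now durable in the SSTable, so the commit log
        # covering them can be purged.
        if os.path.exists(self.commitlog_path):
            os.remove(self.commitlog_path)


# Usage: with memtable_limit=2, the second write triggers a flush.
with tempfile.TemporaryDirectory() as workdir:
    node = ToyNode(workdir, memtable_limit=2)
    node.write("user1", "alice", timestamp=1)
    node.write("user2", "bob", timestamp=2)
    print(len(node.sstables), node.memtable)  # prints: 1 {}
```

The flush never rewrites an existing file; it always creates a new one, which is what keeps the disk access sequential.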
Because SSTables are written sequentially and never updated in place, different versions or columns of the same row can end up spread across many SSTable files. In addition to its data file, each SSTable has a primary index and a Bloom filter. The primary index is a list of row keys together with the start position of each row in the data file.
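A primary index of the kind described above can be illustrated with an in-memory byte buffer. The record format (`key<TAB>value` per line) and the helper names `write_sstable` and `lookup` are hypothetical; Cassandra's real on-disk format is different. The sketch only shows the idea: rows are written in key order, each key's start offset is recorded, and a lookup binary-searches the index and seeks straight to the row.

```python
import bisect
import io


def write_sstable(rows):
    """Write rows sorted by key; return (data bytes, primary index).

    Hypothetical record format: one "key<TAB>value\n" line per row.
    The primary index is a list of (row key, start offset) pairs in key order.
    """
    data = io.BytesIO()
    primary_index = []
    for key in sorted(rows):
        primary_index.append((key, data.tell()))  # where this row starts
        data.write(f"{key}\t{rows[key]}\n".encode("utf-8"))
    return data.getvalue(), primary_index


def lookup(data, primary_index, key):
    """Binary-search the sorted index, then seek directly to the row."""
    keys = [k for k, _ in primary_index]
    i = bisect.bisect_left(keys, key)
    if i == len(keys) or keys[i] != key:
        return None  # key is not in this SSTable
    stream = io.BytesIO(data)
    stream.seek(primary_index[i][1])
    line = stream.readline().decode("utf-8").rstrip("\n")
    _, value = line.split("\t", 1)
    return value


data, index = write_sstable({"carol": "c@x", "alice": "a@x", "bob": "b@x"})
print(index)                       # prints: [('alice', 0), ('bob', 10), ('carol', 18)]
print(lookup(data, index, "bob"))  # prints: b@x
```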
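A Bloom filter is a compact, probabilistic data structure that quickly tests whether a row key may be present in an SSTable. It can return false positives but never false negatives, so Cassandra can skip any SSTable whose Bloom filter says the key is absent and avoid an unnecessary disk read. It is not a sample of the primary index; Cassandra keeps a separate in-memory sample of the index for that purpose.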
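A minimal Bloom filter makes the "may be present / definitely absent" behaviour concrete. This is a generic sketch, not Cassandra's implementation: the bit-array size, the number of hash functions, and the use of salted SHA-256 to derive bit positions are illustrative assumptions.

```python
import hashlib


class BloomFilter:
    """Generic Bloom filter sketch (not Cassandra's implementation).

    num_bits and num_hashes are illustrative; the k bit positions come from
    SHA-256 with a per-hash salt, chosen for simplicity.
    """

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for readability

    def _positions(self, key):
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key):
        # False -> key is definitely absent; True -> key may be present.
        return all(self.bits[pos] for pos in self._positions(key))


bf = BloomFilter()
for key in ["alice", "bob"]:
    bf.add(key)
print(bf.might_contain("alice"))  # prints: True
print(bf.might_contain("zed"))    # most likely False; True would be a false positive
```

A key that was added always sets all of its bits, so `might_contain` can never wrongly answer False for it; a False answer is what lets a read skip the SSTable entirely.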

2. Read

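When a read request arrives at a node, the data to be returned is merged from any unflushed memtables and from every SSTable that may contain the row (SSTables whose Bloom filter rules out the key are skipped). Timestamps determine which value is up to date: for each column, the value with the newest timestamp wins. If the row cache is enabled, the merged row is also stored in this write-through cache to improve future read performance.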
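The timestamp-based merge can be sketched as follows. Each source (memtable or SSTable) is modeled as a plain dict mapping row key to `{column: (timestamp, value)}`; this representation and the function name `merge_row` are hypothetical, and the sketch omits deletions (tombstones), TTLs and the index and Bloom filter lookups a real read performs.

```python
def merge_row(key, memtable, sstables):
    """Merge one row from a memtable and SSTables; newest timestamp wins per column.

    Hypothetical model: each source is a dict of
    row key -> {column: (timestamp, value)}.
    On equal timestamps this sketch keeps the value seen first;
    Cassandra applies its own tie-break rule.
    """
    merged = {}
    for source in [memtable, *sstables]:
        for column, (ts, value) in source.get(key, {}).items():
            if column not in merged or ts > merged[column][0]:
                merged[column] = (ts, value)
    return {column: value for column, (ts, value) in merged.items()}


memtable = {"user1": {"email": (30, "new@example.com")}}
sstable_a = {"user1": {"email": (10, "old@example.com"), "name": (10, "Ann")}}
sstable_b = {"user1": {"name": (20, "Anne")}}
print(merge_row("user1", memtable, [sstable_a, sstable_b]))
# prints: {'email': 'new@example.com', 'name': 'Anne'}
```

The row for `user1` is spread across three sources, and no single source holds the current version of every column, which is why the read has to consult all of them.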
