In a regular Internet service, the read:write ratio is about 100:1 to 1000:1. However, when reading from a hard disk, a database join operation is time-consuming, and 99% of the time is spent on disk seek. Not to mention a distributed join operation across networks.
To optimize the read performance, denormalization is introduced by adding redundant data or by grouping data. These four categories of NoSQL are here to help.
Key-value Store
The abstraction of a KV store is a giant hashtable/hashmap/dictionary.
The main reason we want to use a key-value cache is to reduce latency for accessing active data. Achieve an O(1) read/write performance on a fast and expensive media (like memory or SSD), instead of a traditional O(logn) read/write on a slow and cheap media (typically hard drive).
There are three major factors to consider when we design the cache.
- Pattern: How to cache? is it read-through/write-through/write-around/write-back/cache-aside?
- Placement: Where to place the cache? client-side/distinct layer/server side?
- Replacement: When to expire/replace the data? LRU/LFU/ARC?
Out-of-box choices: Redis/Memcache? Redis supports data persistence while Memcache does not. Riak, Berkeley DB, HamsterDB, Amazon Dynamo, Project Voldemort, etc.
Document Store
The abstraction of a document store is like a KV store, but documents, like XML, JSON, BSON, and so on, are stored in the value part of the pair.
The main reason we want to use a document store is for flexibility and performance. Flexibility is achieved by the schemaless document, and performance is improved by breaking 3NF. Startup's business requirements are changing from time to time. Flexible schema empowers them to move fast.
Out-of-box choices: MongoDB, CouchDB, Terrastore, OrientDB, RavenDB, etc.
Column-oriented Store
The abstraction of a column-oriented store is like a giant nested map: ColumnFamily<RowKey, Columns<Name, Value, Timestamp>>
.
The main reason we want to use a column-oriented store is that it is distributed, highly-available, and optimized for write.
Out-of-box choices: Cassandra, HBase, Hypertable, Amazon SimpleDB, etc.
Graph Database
As the name indicates, this database's abstraction is a graph. It allows us to store entities and the relationships between them.
If we use a relational database to store the graph, adding/removing relationships may involve schema changes and data movement, which is not the case when using a graph database. On the other hand, when we create tables in a relational database for the graph, we model based on the traversal we want; if the traversal changes, the data will have to change.
Out-of-box choices: Neo4J, Infinitegraph, OrientDB, FlockDB, etc.