Scalable SQL and NoSQL Data Stores

Motivated by Web 2.0 workloads, these systems are designed to let huge numbers of users read and write concurrently. In contrast to traditional DBMSs, NoSQL systems adopt BASE (Basically Available, Soft state, Eventually consistent): they sacrifice dimensions such as transactions or strong consistency in order to maintain high availability and scalability.

Generally, there are several key features of NoSQL:

  • Horizontal scalability.

  • Fault tolerance and increased throughput through partitioning and replication.

  • A weaker consistency model than ACID.

  • Efficient use of RAM and distributed indexes.

Among the various NoSQL products, Google Bigtable, Amazon Dynamo and Memcached provided a “proof of concept” that inspired many derivative data stores.

  • BigTable pioneered the idea of partitioning persistent data across thousands of nodes.

  • Memcached demonstrated that in-memory indexes can be highly scalable, distributing and replicating objects over multiple nodes.

  • Dynamo was the first to design a database around eventual consistency in order to increase service scalability and availability.

A key feature of NoSQL systems is shared-nothing horizontal scaling, replicating and partitioning data over many servers. More recently, NewSQL systems (e.g. Amazon Aurora, described in “Amazon Aurora: Design Considerations for High Throughput Cloud-Native Relational Databases”) have started a trend toward shared-storage architectures, where server nodes do not store data locally but in a distributed file system.

Categories

Key-Value Store: These systems store values and an index to find them, based on a programmer-defined key. (Redis, Memcached, Voldemort)

Document Store: The documents are indexed and a simple query mechanism is provided. (MongoDB, CouchDB)

Extensible Record Stores: The systems store extensible records that can be partitioned vertically and horizontally across nodes (Wide-Column-Stores). (BigTable, HBase, Cassandra)

Relational Database: (MySQL, PostgreSQL)

Key Value Stores

Key-value stores are suitable only for queries by the primary key.
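
As an illustration of key-only access, this is roughly how a Redis client is used with the redis-py library (host, port, and keys here are illustrative assumptions):

```python
import redis

# Connect to a local Redis instance (host/port are assumptions for illustration).
r = redis.Redis(host="localhost", port=6379, decode_responses=True)

# Writes and reads are addressed only by the primary key.
r.set("user:42", '{"name": "Alice", "plan": "pro"}')
value = r.get("user:42")        # fetch by key
exists = r.exists("user:9999")  # the key is the only lookup dimension
print(value, exists)
```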

|           | Data Model              | Concurrency Control | Consistency       | Partitioning       | Sync  | Partition Logic | Clock                      |
|-----------|--------------------------|---------------------|-------------------|--------------------|-------|-----------------|----------------------------|
| Voldemort | KV pairs                 | MVCC                | Eventual          | Consistent hashing | Async | Dynamo          | Vector clock               |
| Riak      | JSON                     | MVCC                | Quorum (R, W, N)  | Consistent hashing | Async | Dynamo          | Vector clock               |
| Redis     | Various data structures  | Locks               | Eventual          | Consistent hashing | Async |                 | Not needed (leader-based)  |
| Scalaris  | Various data structures  | ACID                | Strong            | Range              | Sync  |                 |                            |
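
The “Consistent hashing” entries above refer to the Dynamo-style partitioning scheme that Voldemort and Riak inherit. Here is a minimal sketch of the idea (node names and the number of virtual nodes are illustrative assumptions):

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Minimal consistent-hash ring: a key maps to the first node clockwise."""

    def __init__(self, nodes, vnodes=64):
        self._ring = []  # sorted list of (hash, node); virtual nodes smooth the load
        for node in nodes:
            for i in range(vnodes):
                self._ring.append((self._hash(f"{node}#{i}"), node))
        self._ring.sort()

    @staticmethod
    def _hash(key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def node_for(self, key):
        # First virtual node at or after the key's hash, wrapping around the ring.
        idx = bisect.bisect(self._ring, (self._hash(key), "")) % len(self._ring)
        return self._ring[idx][1]

ring = ConsistentHashRing(["node-a", "node-b", "node-c"])
print(ring.node_for("user:42"))
```

Because only the keys adjacent to a joining or leaving node move, rebalancing touches a small fraction of the data, which is why the Dynamo-derived stores favor this scheme.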

Document Stores

Unlike key-value stores, document stores provide the ability to query by multiple fields.

|          | Concurrency Control       | Consistency               | Partitioning                               | Sync               | Feature                       |
|----------|---------------------------|---------------------------|--------------------------------------------|--------------------|-------------------------------|
| SimpleDB | N/A                       | Eventual                  | Manual                                     | Async              | Nested documents not allowed  |
| CouchDB  | MVCC on a single document | ACID on a single document | Manual                                     | Async              |                               |
| MongoDB  | Atomic writes on a field  |                           | Automatic sharding (shard key defined by user) | Master-slave, async |                               |
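
MongoDB’s query-by-field capability, for example, looks roughly like this with the pymongo client (connection string, database, and collection names are assumptions):

```python
from pymongo import MongoClient

# Connection details are assumptions; adjust for a real deployment.
client = MongoClient("mongodb://localhost:27017")
users = client["exampledb"]["users"]

users.insert_many([
    {"_id": 1, "name": "Alice", "age": 34},
    {"_id": 2, "name": "Bob", "age": 27},
])

# Secondary index so the non-key query below avoids a full collection scan.
users.create_index("age")

# Querying on a field other than the primary key -- not possible in a pure KV store.
for doc in users.find({"age": {"$gte": 30}}):
    print(doc)
```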

Extensible Record Stores

The extensible record stores have been motivated by Google Bigtable. Their basic data model is rows and columns and their basic scalability model is splitting both rows and columns over multiple nodes.

  • Rows are split across nodes by sharding on the primary key. They are typically split by range rather than by hash, so that range queries can be partition-pruned (see the sketch after this list).

  • Columns of a table are distributed over multiple nodes using “column groups”, which allow related columns to be grouped together. Different column groups do not have to be stored on the same node, which amounts to a form of vertical partitioning.
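
To make the partition-pruning point concrete, here is a toy sketch of range sharding on the primary key (the split points and shard names are made-up examples): a range scan only contacts the shards whose key ranges overlap the query, whereas hash sharding would force the scan to fan out to every node.

```python
import bisect

# Illustrative split points on the primary key; shard i holds keys below
# SPLIT_POINTS[i], and the last shard holds everything else.
SPLIT_POINTS = ["g", "p"]
SHARDS = ["shard-0", "shard-1", "shard-2"]

def shards_for_range(start_key, end_key):
    """Return only the shards whose key ranges overlap [start_key, end_key]."""
    first = bisect.bisect_right(SPLIT_POINTS, start_key)
    last = bisect.bisect_right(SPLIT_POINTS, end_key)
    return SHARDS[first:last + 1]

# A scan over keys "h".."k" is pruned down to a single shard.
print(shards_for_range("h", "k"))   # ['shard-1']
```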

HBase

HBase is a distributed table store patterned directly after Bigtable.

  • HBase uses a log-structured merge (LSM) file index, allowing fast range queries and sorted scans.

  • HBase uses HDFS instead of GFS for persistent storage; writes go to RAM first and are periodically flushed to storage (see the sketch after this list).

  • Row operations are atomic, with row-level locking; there is optional support for transactions with wider scope based on MVCC, where versions that conflict with earlier updates are aborted.
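
A toy sketch of that write path (an in-memory buffer plus periodic flushes to immutable sorted runs); the class and flush threshold are illustrative, not HBase’s actual implementation:

```python
class TinyLSMStore:
    """Toy LSM write path: buffer writes in RAM, flush sorted runs to storage."""

    def __init__(self, flush_threshold=4):
        self.memtable = {}     # in-memory buffer that absorbs writes immediately
        self.sstables = []     # immutable sorted runs, newest first
        self.flush_threshold = flush_threshold

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.flush_threshold:
            self._flush()

    def _flush(self):
        # Persist the memtable as a sorted, immutable run (HDFS files in HBase).
        self.sstables.insert(0, sorted(self.memtable.items()))
        self.memtable = {}

    def get(self, key):
        if key in self.memtable:
            return self.memtable[key]
        for run in self.sstables:          # newest run wins
            for k, v in run:
                if k == key:
                    return v
        return None

store = TinyLSMStore()
for i in range(5):
    store.put(f"row{i}", i)
print(store.get("row1"), len(store.sstables))   # 1 1
```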

Cassandra

  • Cassandra is similar to HBase in its storage path: writes go to RAM immediately and are periodically flushed to disk.

  • Cassandra has a weaker consistency model: there is no locking mechanism, and replicas are updated asynchronously.

  • Cassandra uses an ordered hash index, which should give most of the benefits of both hash and B-tree indexes, although sorted scans will be slower than with a B-tree.

  • Cassandra uses quorum reads to provide a way to get the latest data, as sketched below.
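
A minimal sketch of a quorum read (replica contents and R are made-up values): read from R of the N replicas and return the version with the newest timestamp, so a lagging replica cannot hide a more recent write.

```python
def quorum_read(replicas, key, r):
    """Read `key` from `r` replicas and return the freshest (timestamp, value)."""
    responses = []
    for replica in replicas[:r]:               # in practice, any r reachable replicas
        if key in replica:
            responses.append(replica[key])     # each entry is (timestamp, value)
    return max(responses) if responses else None   # newest timestamp wins

# Three replicas of the same key; the second one lags behind with an older write.
replicas = [
    {"user:42": (17, "new-value")},
    {"user:42": (12, "stale-value")},
    {"user:42": (17, "new-value")},
]
print(quorum_read(replicas, "user:42", r=2))   # (17, 'new-value')
```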

Scalable Relational Databases

Starting from MySQL Cluster, several newer products have come out, e.g. VoltDB and Clustrix. It appears that RDBMSs can scale only under two provisos:

  • Small-scope operations: joins over many tables will not scale well with sharding.

  • Small transactions: transactions that span many nodes will be very inefficient due to the communication overhead of two-phase commit (2PC), as sketched below.
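
A rough sketch of where the 2PC overhead comes from (the participant behaviour is simulated, not a real protocol implementation): every participating node must be contacted once to prepare and once again to commit or abort, so latency and failure exposure grow with the number of nodes a transaction touches.

```python
class Participant:
    """Simulated shard taking part in a distributed transaction."""
    def __init__(self, name, healthy=True):
        self.name, self.healthy = name, healthy
    def prepare(self):
        return self.healthy           # vote yes only if this node can commit
    def commit(self):
        print(f"{self.name}: commit")
    def abort(self):
        print(f"{self.name}: abort")

def two_phase_commit(participants):
    """Toy 2PC coordinator: one prepare round, then one commit/abort round."""
    votes = [node.prepare() for node in participants]   # N round trips
    decision = all(votes)                                # any "no" aborts everyone
    for node in participants:                            # another N round trips
        node.commit() if decision else node.abort()
    return decision

# A transaction touching three shards pays 2 * 3 round trips before it is durable;
# a single-shard transaction avoids this coordination entirely.
print(two_phase_commit([Participant("shard-a"), Participant("shard-b"), Participant("shard-c")]))
```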

MySQL Cluster

MySQL Cluster works by replacing the InnoDB storage layer with the NDB layer, and data is sharded over multiple database servers (a shared-nothing architecture). Every shard is replicated to support recovery.

MySQL Cluster supports in-memory storage as well as disk-based storage.

VoltDB

VoltDB’s scalability and availability features are competitive with MySQL Cluster and NoSQL systems:

  • Tables are partitioned over multiple servers, and clients can call any server.

  • Selected tables can be replicated across servers, e.g. for fast access to read-mostly data.

  • Fault tolerance: data can be recovered from peer nodes in the event that one node crashes.

VoltDB eliminates nearly all “waits” in SQL execution, allowing a very efficient implementation:

  • The system is designed for a database that fits in RAM on the servers, so that the system need never wait for the disk. Indexes and record structures are designed for RAM rather than disk, and the overhead of a disk cache/buffer is eliminated as well.
