Scalable SQL and NoSQL Data Stores
Motivated by Web 2.0 applications, these systems were designed to support huge numbers of users reading and writing concurrently. In contrast to traditional DBMSs, NoSQL systems accept BASE (Basically Available, Soft state, Eventually consistent) in their design, which means they sacrifice dimensions such as transactions or strong consistency in order to maintain high availability and scalability.
Generally, NoSQL systems share several key features:
- Horizontal scalability.
- Fault tolerance and increased throughput through partitioning and replication.
- A weaker consistency model than ACID.
- Efficient use of RAM and distributed indexes.
Among the various NoSQL products, Google Bigtable, Amazon Dynamo, and Memcached provided a “proof of concept” that inspired many derivative data stores.
Bigtable pioneered the idea of partitioning persistent data across thousands of nodes.
Memcached demonstrated that in-memory indexes can be highly scalable, distributing and replicating objects over multiple nodes.
Dynamo pioneered the design of a database with eventual consistency as a way to increase service scalability and availability.
A key feature of NoSQL systems is shared-nothing horizontal scaling, replicating and partitioning data over many servers. More recently, NewSQL systems (e.g. Amazon Aurora; see “Amazon Aurora: Design Considerations for High Throughput Cloud-Native Relational Databases”) have started a trend toward shared-storage architectures, where server nodes do not store data locally but in a distributed file system.
Category
Key-Value Stores: These systems store values and an index to find them, based on a programmer-defined key. (Redis, Memcached, Voldemort)
Document Stores: The documents are indexed and a simple query mechanism is provided. (MongoDB, CouchDB)
Extensible Record Stores: These systems store extensible records that can be partitioned vertically and horizontally across nodes (wide-column stores). (Bigtable, HBase, Cassandra)
Relational Databases: (MySQL, PostgreSQL)
Key Value Stores
Key-value stores are suitable only for queries by the primary key. (A minimal consistent-hashing sketch follows the table below.)
| | Data Model | Concurrency Control | Consistency | Partitioning | Replica Sync | Logical Clock |
|---|---|---|---|---|---|---|
| Voldemort | KV pairs | MVCC | Eventual consistency | Consistent hashing | Async | Dynamo-style vector clocks |
| Riak | JSON | MVCC | Quorum (R, W, N) | Consistent hashing | Async | Dynamo-style vector clocks |
| Redis | Various data structures | Locks | Eventual consistency | Consistent hashing | Async | Not needed (leader-based) |
| Scalaris | Various data structures | ACID | Strong consistency | Range | Sync | |
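The “Consistent hashing” entries in the table refer to the partitioning scheme Dynamo-style stores use to decide which node owns a key. Below is a minimal, illustrative Python sketch of such a ring; the class name, `vnodes` parameter, and node names are assumptions for illustration, not any product’s API.

```python
import hashlib
from bisect import bisect_right

class ConsistentHashRing:
    """Illustrative consistent-hashing ring (not any store's actual code)."""

    def __init__(self, nodes, vnodes=100):
        # vnodes: virtual nodes per physical node, smoothing the key distribution
        self.ring = sorted(
            (self._hash(f"{node}#{i}"), node)
            for node in nodes
            for i in range(vnodes)
        )
        self.positions = [pos for pos, _ in self.ring]

    @staticmethod
    def _hash(key: str) -> int:
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def node_for(self, key: str) -> str:
        # A key belongs to the first node clockwise from its hash position.
        idx = bisect_right(self.positions, self._hash(key)) % len(self.ring)
        return self.ring[idx][1]

ring = ConsistentHashRing(["node-a", "node-b", "node-c"])
print(ring.node_for("user:42"))   # the same key always maps to the same node
```

Adding or removing a node only remaps the keys adjacent to that node’s positions on the ring, which is why these systems can rebalance incrementally.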
Document Stores
Unlike key-value stores, document stores provide the ability to query by multiple fields. (A minimal multi-field query sketch follows the table below.)
| | Concurrency Control | Consistency | Partitioning | Replica Sync | Feature |
|---|---|---|---|---|---|
| SimpleDB | N/A | Eventual consistency | Manual | Async | Does not allow nested documents |
| CouchDB | MVCC on a single document | ACID on a single document | Manual | Async | |
| MongoDB | Atomic writes on a field | | Automatic sharding with a user-defined shard key | Master-slave, async | |
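As a concrete illustration of querying by multiple fields, here is a minimal sketch using MongoDB’s Python driver (pymongo); the connection string, database, collection, and field names are assumptions for illustration.

```python
from pymongo import MongoClient

# Assumes a MongoDB instance on localhost and a hypothetical "users" collection.
client = MongoClient("mongodb://localhost:27017")
users = client.example_db.users

users.insert_one({"name": "alice", "city": "Austin", "age": 34})

# A secondary (compound) index makes non-primary-key lookups efficient.
users.create_index([("city", 1), ("age", 1)])

# Query by multiple fields -- something a plain key-value store cannot do
# without scanning or maintaining its own external index.
for doc in users.find({"city": "Austin", "age": {"$gt": 30}}):
    print(doc["name"])
```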
Extensible Record Stores
The extensible record stores have been motivated by Google Bigtable. Their basic data model is rows and columns, and their basic scalability model is splitting both rows and columns over multiple nodes.
Rows are split across nodes by sharding on the primary key, typically by range rather than by hash, so that range queries can be partition-pruned (as sketched below).
Columns of a table are distributed over multiple nodes by using “column groups”, which allow related columns to be stored together. Different column groups need not be stored on the same node, which amounts to a form of vertical partitioning.
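A minimal sketch of range-based row sharding with partition pruning; the split points and node names are illustrative assumptions, not how any particular store configures its regions.

```python
from bisect import bisect_right

SPLIT_POINTS = ["g", "p"]        # node-0: keys < "g", node-1: keys < "p", node-2: the rest
NODES = ["node-0", "node-1", "node-2"]

def node_for_key(key: str) -> str:
    # Point lookups go to the single shard whose range contains the key.
    return NODES[bisect_right(SPLIT_POINTS, key)]

def nodes_for_range(start: str, end: str) -> list[str]:
    # A range query only touches the shards whose ranges overlap [start, end];
    # with hash partitioning it would have to contact every shard.
    lo = bisect_right(SPLIT_POINTS, start)
    hi = bisect_right(SPLIT_POINTS, end)
    return NODES[lo:hi + 1]

print(node_for_key("alice"))       # node-0
print(nodes_for_range("h", "k"))   # ['node-1'] -- the other shards are pruned
```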
HBase
HBase is a distributed table store system patterned directly after Bigtable.
HBase uses a log-structured merge (LSM) file index, allowing fast range queries and sorting.
HBase uses HDFS instead of GFS for persistent storage; writes go to RAM immediately and the data is periodically flushed to storage (a minimal sketch of this write path follows below).
Row operations are atomic with row-level locks, and there is optional support for wider-scope transactions based on MVCC; if concurrent versions conflict, the later ones are aborted.
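The write path described above (buffer writes in RAM, periodically flush sorted data to persistent storage) is the log-structured merge pattern. Below is a minimal, illustrative Python sketch of that idea; the class name and flush threshold are assumptions, not HBase’s actual implementation.

```python
class TinyLSM:
    """Illustrative LSM write path: memtable in RAM, immutable sorted runs on 'disk'."""

    def __init__(self, flush_threshold=4):
        self.memtable = {}        # most recent writes, kept in RAM
        self.sstables = []        # flushed, immutable sorted runs (newest last)
        self.flush_threshold = flush_threshold

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # In a real system this sorted run would be written to HDFS.
        self.sstables.append(dict(sorted(self.memtable.items())))
        self.memtable = {}

    def get(self, key):
        if key in self.memtable:                  # newest data first
            return self.memtable[key]
        for run in reversed(self.sstables):       # then the newest flushed runs
            if key in run:
                return run[key]
        return None

store = TinyLSM()
for i in range(6):
    store.put(f"row{i}", i)
print(store.get("row2"), len(store.sstables))     # 2 1
```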
Cassandra
Cassandra is similar to HBase in its storage process: writes go to RAM immediately and are periodically flushed to disk.
Cassandra has a weaker consistency model: there is no locking mechanism, and replicas are updated asynchronously.
Cassandra uses an ordered hash index, which should give most of the benefits of both hash and B-tree indexes, although sorting is slower than with a B-tree.
Cassandra uses quorum reads to provide a way to get the latest data.
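A minimal sketch of why a quorum read (with R + W > N) returns the latest value: the read set must overlap the last successful write set, so taking the newest of the R responses is sufficient. Timestamps stand in for Cassandra’s per-column write timestamps, and the replica layout is an illustrative assumption.

```python
N, W, R = 3, 2, 2                       # replicas, write quorum, read quorum
assert R + W > N                        # the overlap condition

replicas = [
    {"ts": 5, "value": "old"},          # stale replica that missed the last write
    {"ts": 9, "value": "new"},
    {"ts": 9, "value": "new"},
]

def quorum_read(replicas, r):
    # In practice these would be the first r replicas to respond.
    responses = replicas[:r]
    return max(responses, key=lambda rep: rep["ts"])["value"]

print(quorum_read(replicas, R))         # "new", even though one replica is stale
```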
Scalable Relational Databases
Following MySQL Cluster, several new products have come out, e.g. VoltDB and Clustrix. It appears that scalable RDBMSs are restricted by two provisos:
- Small-scope operations: joins over many tables will not scale well with sharding.
- Small transactions: transactions that span many nodes will be very inefficient due to the communication overhead of two-phase commit (2PC); a minimal 2PC sketch follows.
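A minimal sketch of why multi-node transactions are expensive under two-phase commit: every transaction pays two rounds of messages (prepare, then commit/abort) to each participating node. The participant behavior here is an illustrative assumption, not any product’s protocol code.

```python
class Participant:
    """Illustrative 2PC participant; a real one would force-write a log record."""

    def prepare(self, txn):
        return True                     # vote "yes" if the change can be made durable

    def commit(self, txn):
        pass

    def abort(self, txn):
        pass

def two_phase_commit(txn, participants):
    messages = 0
    # Phase 1: prepare -- one round trip per participant.
    votes = []
    for p in participants:
        votes.append(p.prepare(txn))
        messages += 2                   # prepare request + vote
    # Phase 2: commit or abort -- a second round trip per participant.
    decision = all(votes)
    for p in participants:
        (p.commit if decision else p.abort)(txn)
        messages += 2                   # decision + acknowledgement
    return decision, messages

ok, msgs = two_phase_commit("txn-1", [Participant() for _ in range(3)])
print(ok, msgs)   # True 12 -- versus no network messages for a single-node transaction
```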
MySQL Cluster
MySQL Cluster works by replacing the InnoDB layer with the NDB layer, and the data is sharded over multiple database servers (shared-nothing architecture). Every shard is replicated to support recovery.
MySQL Cluster supports in-memory storage as well as disk-based storage.
VoltDB
VoltDB’s scalability and availability features are competitive with MySQL Cluster and NoSQL systems.
Tables are partitioned over multiple servers, and clients can call any server (a minimal routing sketch appears at the end of this section).
Selected tables can be replicated over servers, e.g. for fast access to read-mostly data.
Fault tolerance: data can be recovered from peer nodes in the event that a node crashes.
VoltDB eliminates nearly all “waits” in SQL execution, allowing a very efficient implementation:
- The system is designed for a database that fits in RAM on the servers, so that the system never needs to wait for disk. Indexes and record structures are designed for RAM rather than disk, and the overhead of a disk cache/buffer is eliminated as well.
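To illustrate the earlier claim that tables are partitioned over multiple servers while clients can call any server, here is a minimal routing sketch; the node names, partition-key hashing, and forwarding logic are illustrative assumptions, not VoltDB’s implementation.

```python
class Node:
    """Illustrative cluster node: owns some partitions, forwards everything else."""

    def __init__(self, name, cluster, num_partitions):
        self.name = name
        self.cluster = cluster              # shared dict: name -> Node
        self.num_partitions = num_partitions
        self.data = {}                      # rows for the partitions this node owns

    def owner_of(self, key):
        # A real system would use a stable hash; Python's hash() suffices per run.
        partition = hash(key) % self.num_partitions
        names = sorted(self.cluster)
        return self.cluster[names[partition % len(names)]]

    def put(self, key, row):
        owner = self.owner_of(key)
        if owner is self:
            self.data[key] = row            # execute locally on the owning partition
        else:
            owner.put(key, row)             # forward to the node that owns the key

cluster = {}
for name in ("node-a", "node-b", "node-c"):
    cluster[name] = Node(name, cluster, num_partitions=6)

# The client may send the request to any node; it ends up on the right partition.
cluster["node-a"].put("user:42", {"name": "alice"})
print({name: len(node.data) for name, node in cluster.items()})
```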