Scalable Datastores
Rick Cattell
(page last updated August 2010)
In recent years, a number of data storage systems
have been developed with excellent
horizontal scaling properties. They are often called "NoSQL"
systems. Horizontal scaling allows dozens or hundreds of machines
to operate as a single database system, with performance improving
approximately linearly with the number of machines. This is interesting because
traditional relational database systems have failed to scale well when
their data is distributed over multiple nodes, with the exception of
read-mostly data warehousing. I have been
studying scalable datastores, and I'm working on a paper comparing
them. Here is the current version of the paper:
Datastore Comparison
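As a rough illustration of the horizontal scaling idea described above, the sketch below spreads keys across several "nodes" by hashing each key, so that each node holds only a fraction of the data. This is a simplified assumption for illustration only: the names `node_for_key` and `ShardedStore` are hypothetical, the nodes are plain in-memory dictionaries, and real systems add replication, rebalancing, and failure handling that are not shown here.

```python
import hashlib


def node_for_key(key: str, num_nodes: int) -> int:
    """Map a key to a node index with a stable hash (hypothetical scheme)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an integer, then take it modulo
    # the node count so every key lands on exactly one node.
    return int.from_bytes(digest[:8], "big") % num_nodes


class ShardedStore:
    """Toy datastore that partitions keys over in-memory 'nodes'."""

    def __init__(self, num_nodes: int):
        # Each dict stands in for one machine's local storage.
        self.nodes = [dict() for _ in range(num_nodes)]

    def put(self, key: str, value) -> None:
        self.nodes[node_for_key(key, len(self.nodes))][key] = value

    def get(self, key: str):
        # Only the single node that owns the key is consulted.
        return self.nodes[node_for_key(key, len(self.nodes))].get(key)


store = ShardedStore(num_nodes=4)
store.put("user:1", {"name": "Ada"})
print(store.get("user:1"))
```

Because each read or write touches only the node that owns the key, adding nodes adds capacity for this kind of single-key workload. Operations that span many keys, such as joins, do not partition this neatly.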
In the paper, I discuss scalable datastores and categorize them
into four groups:
- Key-value stores, including Voldemort, Riak, and Dynamo.
- Document stores, including CouchDB, MongoDB, and SimpleDB.
- Extensible record stores, including BigTable, HBase, HyperTable,
and Cassandra.
- Scalable RDBMSs, including MySQL Cluster, ScaleDB, and VoltDB.
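To make the four groups a little more concrete, here is a rough sketch of how one user record might look under each data model. The field names and layouts are illustrative assumptions, not taken from any of the systems listed, and the relational case uses Python's built-in `sqlite3` only as a stand-in for an SQL interface (it is not a scalable RDBMS).

```python
import json
import sqlite3

# Key-value store: the store sees only a key and an opaque value.
kv_store = {"user:42": json.dumps({"name": "Ada", "city": "London"})}

# Document store: the value is a nested document whose fields the store
# can see, so it can be queried by field as well as by key.
doc_store = {"user:42": {"name": "Ada", "address": {"city": "London"}}}
london_users = [
    key for key, doc in doc_store.items()
    if doc["address"]["city"] == "London"
]

# Extensible record store: row key -> column group -> column -> value.
# New columns can be added to a single row at any time.
record_store = {
    "user:42": {
        "profile": {"name": "Ada", "city": "London"},
        "activity": {"last_login": "2010-08-01"},
    }
}
record_store["user:42"]["activity"]["logins"] = 7

# Relational: a fixed schema declared up front and queried with SQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.execute("INSERT INTO users VALUES (?, ?, ?)", (42, "Ada", "London"))
rows = conn.execute(
    "SELECT name FROM users WHERE city = ?", ("London",)
).fetchall()
```

The main difference in this sketch is how much structure the store itself can see: none in the key-value case, nested fields in the document case, rows and column groups in the extensible record case, and a full declared schema in the relational case.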
Any suggestions, input, or corrections on my paper would be greatly
appreciated, particularly from experts on specific systems. You
can contact me at rick(at)cattell.net. I
will continue to revise and post the paper on this site
as I get input. I also have a related paper that has been accepted for
publication in Communications of the ACM. You can see a copy here:
CACM Paper
For further reading, you can click on the links above to learn more
about specific systems. Also, here are some other general
references I like:
- NoSQL-Database.org has lots of good articles, upcoming events, and
a very complete list of all the systems.
- Krishna Sankar's blog had lots of good NoSQL and cloud computing posts,
along with web references.
- Jonathan Ellis has broad knowledge of NoSQL systems and has some good posts
on his blog. There are some interesting discussions in the
responses as well.
- Schooner has a good blog on an important trend I don't cover in my paper:
effectively using solid state disks as a third level in the RAM/disk
hierarchy.