Scalable Datastores
Rick Cattell

(page last updated August 2010)


In recent years, a number of data storage systems have been developed with excellent horizontal scaling properties. They are often called "NoSQL" systems. Horizontal scaling allows dozens or hundreds of machines to operate as a single database system, with performance improving approximately linearly with the number of machines. This is interesting because traditional relational database systems have failed to scale well when their data is distributed over multiple nodes, with the exception of read-mostly data warehousing. I have been studying scalable datastores, and I'm working on a paper comparing them. Here is the current version of the paper:

    Datastore Comparison

In the paper, I discuss scalable datastores and categorize them into four groups:

Any suggestions, input, or corrections on my paper would be greatly appreciated, particularly from experts on specific systems.  You can contact me at rick(at)cattell.net.  I will continue to revise and post the paper on this site as I get input. I also have a related paper that has been accepted for publication in Communications of the ACM.  You can see a copy here:

    CACM Paper

For further reading, you can click on the links above to learn more about specific systems.  Also, here are some other general references I like: