Working Around the CAP Theorem with rumours-db
If you have been following databases at all recently, you are sure to have heard of the CAP theorem. If not it’s explained very simply here: A plain english introduction to the CAP Theorem.
The CAP theorem is about a database design problem when a database is scaled out to multiple nodes. CAP stands for Consistency, Availability, and Partition Tolerance. Like many classic engineering problems, you can pick any two out of the three options. Except, it doesn’t make sense not to pick Partition Tolerance, because then your database wouldn’t be scaled-out. So really, you can pick either Availability, or Consistency, or neither; which would mean your database sucks.
The problem is explained very nicely in A plain english introduction to the CAP Theorem. There is something very solid about the CAP theorem. It seems immovable — it’s a theorem, after all.
But what if there is another approach? What if, instead of trying to build a better database, we built something that isn’t a database, and so simply avoids the assumptions of the CAP theorem by stepping outside it? But what would this mean? What is a database, really?
A database, I think, is like a bookshelf. You can PUT a book on the shelf, and you can GET a book off. If someone has a book currently checked-out (write-lock) then you can’t read from it.
So, a distributed database is like a group of libraries with an inter-loan system. Each library has most of the books that the others have, but sometimes a member at one library needs a book that must be borrowed from another library. In another situation, a new edition of a certain book arrives at one library before another.
Suppose the libraries are all on different islands, and books and book requests are carried between the islands by brave librarian-sailors. The librarians are brave, but not fearless, and sometimes the inter-loan service is disrupted because of poor weather. At certain times of the year, disruptions are more frequent.
When you go to your local library to get a book out, there is always a chance that there is a new edition of that book that the library doesn’t even know about yet. You could check out a book, read it, return it, and then later, a new edition arrives at your library, fixing important errors in the edition that you read.
How do you know that you are reading the latest edition of your book? Especially in inclement weather?
You can’t. That is what the CAP theorem teaches us.
What kind of systems can you build if you don’t ever know if your data is true? Well, not systems you can really trust, at least.
So, what do you do if you need to know that the data you got was actually up to date? The best you can do is keep on checking back at the library to see if a new edition has arrived. Naively polling the database — but this multiplies reads and decreases performance!
A database-like phone service which has the CAP problem is described in A plain english introduction to the CAP Theorem. In certain scenarios, the clients of the service make a request, which is answered with what turns out to be out-of-date information. This explanation gives us a clue: the clients call the service on their telephones, but the service does not take advantage of the fact that telephones provide two-way communication.
If the clients had to keep calling back to check if information had changed, it would be very annoying for both the clients and the service. But if the service had the client’s phone numbers, the service could call back the clients when the information they are interested in changes, instead of hoping that the client will call back in time to make use of the new information.
This is my proposal: A “database” with push notifications!
This no longer really fits into the library/bookshelf metaphor, but as you can see, works well with the telephone metaphor. Although a client may temporarily have out-of-date data, they will be back up to date as soon as possible. Humans are good at dealing with changing information — why not computers?
The idea of a database with push notifications is fairly new. CouchDB has a replication feature, and a realtime changes feed — but it’s not the primary way to interact with the database.
We need to start fresh! Or start from simpler building blocks. Enter LevelDB. LevelDB is a lightweight, performant database created by the author of Big Table. It’s used in implementation of IndexedDB in Google Chrome, and has a very interesting node.js binding, LevelUP. Putting these pieces together I am working on a database where the primary way to interact is via registering for push notifications.
This is RumoursDB; currently under rapid development, but available for eager users!