Continuing my theme this year of learning the integral parts of high-performing systems, I decided to explore Cassandra, a multi-node NoSQL database that acts like an RDBMS. Cassandra is a project that was born out of Facebook and is currently maintained by the Apache Software Foundation. In this post, we are going to scratch the surface of this tool. This is not an in-depth look at the project, but rather a “get it to work” approach.
At its core, Cassandra is designed to feel like an RDBMS but provide the high availability features that NoSQL is known for, attempting to bridge the best of both worlds. The query language it uses is called CQL (Cassandra Query Language). At the most basic level, a cluster holds one or more keyspaces, and within those keyspaces we have tables. The big selling point of Cassandra is its ability to quickly replicate data between its nodes, allowing it to automatically fulfill the tenet of Eventual Consistency.
For my purposes, I wanted to see how fast I could read and write with a single node using .NET Core. This is mostly to understand the general interactions with the database from .NET. I decided to use Docker to stand up the single-node cluster, and so I wrote a Docker Compose file that runs a CQL script at startup to create my keyspace and table.
The Docker File
I started off trying to use the public cassandra image from Docker Hub, but I found that it does not support running an initialization script from its entry point, which would have required me to create the keyspace and table myself. Luckily, I found the dschroe/cassandra-docker image, which extends the public Cassandra image to support this case.
I wanted to just write a bunch of random names to Cassandra, so I created a simple keyspace and table. Here is the code: https://gitlab.com/xximjasonxx/cassandra-sandbox/blob/master/schema/init.cql
I worked this into a simple Docker Compose file so that I could do all of the mounting and mapping I needed. You can find the compose file in the GitLab repo I reference at the end. Once you have it, simply run docker-compose up and you should have a Cassandra database with a cluster called names up and running. It's important to inspect the output: you should see the creation of the keyspace and table there.
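For readers who do not want to open the repo, here is a minimal sketch of what such a script might look like. The keyspace name sandbox, the table name names, and the id column are my assumptions for illustration; only the firstName and lastName columns come from this post, and the real init.cql may differ.

```cql
-- Hypothetical schema sketch; the actual init.cql in the repo may differ.
-- SimpleStrategy with a replication factor of 1 suits a single-node cluster.
CREATE KEYSPACE IF NOT EXISTS sandbox
  WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};

-- The id column is an assumed primary key; every Cassandra table needs one.
CREATE TABLE IF NOT EXISTS sandbox.names (
  id uuid PRIMARY KEY,
  firstName text,
  lastName text
);
```

The IF NOT EXISTS clauses let the script run again on a container restart without failing.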
This is the NuGet package I used to handle the connection with Cassandra. At a high level, you need to understand that a cluster can have multiple keyspaces, so you need to specify which one you want to connect to. I found it useful to view the keyspaces as databases, since you will see the USE command with them. They are not databases per se, just logical groupings that can have different replication rules.
This connection creates a session which will allow you to manipulate the tables within the keyspace.
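To make the keyspace idea a little more concrete, here is a small CQL sketch you could run from cqlsh. The keyspace name sandbox is hypothetical; the system_schema.keyspaces table is available in Cassandra 3.0 and later.

```cql
-- List every keyspace in the cluster along with its replication settings.
SELECT keyspace_name, replication FROM system_schema.keyspaces;

-- Make the (hypothetical) sandbox keyspace the default for this session,
-- much like selecting a database in an RDBMS.
USE sandbox;
```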
Insert the Data
I have always been amused by the funny names Docker will give containers when you don't specify a name. It turns out someone created an endpoint which can return you the names: https://frightanic.com/goodies_content/docker-names.php. This delighted me to no end and so I used this for my data.
You can find the logic which queries this in this file: https://gitlab.com/xximjasonxx/cassandra-sandbox/blob/master/Application.cs
First, we want to get the data into Cassandra. I found the best way to do this, especially since I am generating a lot of names, is to wrap the INSERT commands in BEGIN BATCH and APPLY BATCH.
By doing this, you can insert many rows at once with little chance of draining the cluster connections (which is what happened to me when I issued one insert per name).
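As a rough sketch, a batch like the one described above has this shape. The table sandbox.names, the id column, and the sample values are assumptions for illustration, not the exact statement my code generates.

```cql
-- Hypothetical batch; table, columns, and values are illustrative only.
BEGIN BATCH
  INSERT INTO sandbox.names (id, firstName, lastName) VALUES (uuid(), 'admiring', 'turing');
  INSERT INTO sandbox.names (id, firstName, lastName) VALUES (uuid(), 'hopeful', 'lovelace');
  INSERT INTO sandbox.names (id, firstName, lastName) VALUES (uuid(), 'focused', 'hopper');
APPLY BATCH;
```

One caveat: Cassandra rejects batches that exceed a configurable size threshold, so a very large set of names may need to be split across several batches.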
Read the Data
When you perform Execute against a Session, the result is a RowSet, which is enumerable and can be used with LINQ. The interesting thing I found here is that while I specify my column names as firstName and lastName, when I get back the row from the RowSet the columns are named in all lower case: firstname and lastname. By far, this was the most annoying part of building this sample.
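The lower-casing comes from CQL itself: unquoted identifiers are case-insensitive and are stored in lower case, while identifiers wrapped in double quotes keep their case. Here is a small sketch of the difference, using hypothetical table names.

```cql
-- Unquoted identifiers: stored and returned as firstname / lastname.
CREATE TABLE IF NOT EXISTS sandbox.names_folded (
  id uuid PRIMARY KEY,
  firstName text,
  lastName text
);

-- Double-quoted identifiers: stored and returned as firstName / lastName.
CREATE TABLE IF NOT EXISTS sandbox.names_quoted (
  id uuid PRIMARY KEY,
  "firstName" text,
  "lastName" text
);

-- Quoted columns must be quoted in every later statement as well.
SELECT "firstName", "lastName" FROM sandbox.names_quoted;
```

Quoting everywhere gets tedious quickly, so simply reading the columns back by their lower-case names is probably the easier path.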
Delete the Data
I do not know why, but Cassandra is very weird about DELETE statements. If I had to venture a guess, it is likely restricted due to the replication that needs to happen to finalize the delete operation. It also appears that, if you want to delete, you have to provide a condition: a bare DELETE FROM <table> to delete everything does not appear to be supported. Again, I think this is due to the replication consideration.
Instead you have to use TRUNCATE to clear the table.
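Here is a minimal sketch of the two options, assuming a hypothetical sandbox.names table keyed by a uuid id column. The UUID value is made up for illustration.

```cql
-- Rejected by Cassandra: DELETE needs a WHERE clause on the primary key.
-- DELETE FROM sandbox.names;

-- Deletes a single row, identified by its (hypothetical) primary key value.
DELETE FROM sandbox.names WHERE id = 123e4567-e89b-12d3-a456-426614174000;

-- Removes every row in the table.
TRUNCATE sandbox.names;
```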
Cassandra is a big topic and I need more than a partial week to fully understand it, but overall my initial impression is good and I can see its use case. I look forward to using it more and eventually using it in a scenario where it's needed.
Here is the code for my current Cassandra test: https://gitlab.com/xximjasonxx/cassandra-sandbox. I recommend having Docker installed, as it allows this code to be completely self-contained. Cheers.