In my last MySQL vs article, I talked about Redis, which was a database I hadn’t heard about before. This time, I wanted to talk about a database that I’ve heard of but never gotten the time to try out. Cassandra DB is another NoSQL database that I haven’t had the opportunity to try but often heard about it at events like Hackathons.
Other than being a NoSQL database, Cassandra is also open source, which follows what I typically choose to write articles about. But other than that, I don’t have any other background knowledge about it.
So, for another MySQL vs article, let’s look more into what Cassandra DB is. Like we did with the previous articles, it will mainly be administrative and will not include the syntax. But we’ll also look at some of the pros and cons for Cassandra.
We can also compare a few pros and cons for MySQL, but they will relate to what we have learned about Cassandra directly, as we’ve already talked a lot about MySQL in this series. So, without any further delay, let’s learn what Cassandra DB is.
Table of Contents
What is Cassandra
Cassandra is a database from Apache that is open source. It is NoSQL, and so also lightweight. Cassandra is also a distributed database. A distributed database runs on multiple machines, but to the users, it would look like only one because they act as a unified whole.
This happens through multiple nodes, which each represent an instance of Cassandra. From there, the nodes communicate with each other to distribute the workload for improved functionality. If this node logic sounds familiar, it’s because Cassandra’s designed to be easily organized into a cluster. From there, you can have multiple data centers if you so choose.
Cassandra is also flexible in its scalability. Because Cassandra is so dynamic, you can grow or shrink the database as you need. But this isn’t like MySQL, where there is heavy downtime ultimately to hit a ceiling again.
Instead, Cassandra allows more on-the-fly expansion which just means all you need to do is add more nodes to increase the size, capacity, or even CPU power or RAM associated. This means very little-to-no downtime is required, and if you go overboard you can scale back just as easily.
As we’ve talked about in the past, MySQL and Cassandra are both open source. With MySQL, a few articles ago we talked about the proprietary software that MySQL offers. That of course is a paid service with additional support and capabilities.
With Cassandra, I found information on their open-source documentation, but I couldn’t find anything about it having paid features or proprietary code. If this is incorrect please let me know in the comments, but from what I’m seeing, Cassandra is true open source.
So, first, let’s talk about the structure of the databases. MySQL, as you know, is an RDBMS (Relational Database Management System). Cassandra, however, is a NoSQL database. This means that MySQL will follow more of a master/worker architecture, while Cassandra follows peer-to-peer architecture.
We already know that MySQL supports ACID (Atomicity, Consistency, Reliability, and Durability) transactions. Cassandra, however, does not automatically follow ACID transactions. This does not mean it isn’t possible.
Although not initially provided, you can tune Cassandra’s features to support ACID properties. For example, tuning Cassandra’s replication and fault tolerance ensures reliability. Another example is in tuning the consistency. Cassandra is an AP (Available Partition-tolerant) database, but you can configure the consistency to be on a per-query basis.
When we’re looking at scalability, MySQL more commonly supports vertical scaling. Horizontal scaling is also possible through replication or sharding. Cassandra, on the other hand, supports both horizontal and vertical scalability.
Although this one is a little more specific, let’s also look at the performance of Read transactions. But first, we need to look at JOINs to understand the logic. As you know well by now, MySQL, or any RDBMS for that matter, supports JOINs between multiple tables in a query.
Cassandra, on the other hand, discourages JOINs. Instead, it prefers to SELECT from only one table per query. So, because multiple tables can be joined in a MySQL Read, the performance would be O(log(n)).
With only one table being read at a time, Cassandra’s performance is O(1). When looking at Write statements, MySQL’s performance can be slowed because a search is being performed before the write. Instead of a search, Cassandra uses the append model, which providers higher performance when writing.
Maybe a given, but MySQL, because it is an RDBMS, supports Referential Integrity, and has Foreign keys. Because Cassandra is a NoSQL database, it does not enforce Referential Integrity and therefore does not have Foreign Keys.
To ensure consistency within a distributed system, MySQL provides the Immediate Consistency method, but it is the only type provided. Cassandra allows both Immediate Consistency methods and Eventual Consistency methods.
As far as operating systems go, MySQL is used on FreeBSD, Linux, OS X, Solaris, and Windows. Cassandra, however, is only supported on BSD, Linux, OS X, and Windows. MySQL, as we have learned, was also written in C and C++ languages. Cassandra, on the other hand, was written only in Java. MySQL was also developed by Oracle, which Cassandra was developed by Apache Software.
Advantages of Cassandra
As we talked about while describing Cassandra, its scalability is a great advantage. This is because it can be done quickly with no downtime, as you do not have to shut the database down to scale. Both horizontal and vertical scalability is an option, as Cassandra uses a linear model for faster responses.
Along with scalability, the data storage is flexible. Because it is a NoSQL database, it can deal with structured, unstructured, or semi-structured data. In the same way, the data distribution is flexible. Several different data centers can be used, which makes it easier to distribute the data.
Performance was another factor we discussed. The benefit we’ll talk about here is how it handles simultaneous read and write statements. Even multiple write requests are handled quickly and do not affect the read requests.
Another benefit to using Cassandra is the easy language, CQL (Cassandra Query Language), which is offered as an alternative SQL. Cassandra also has the benefit of decentralization. This means that because of the structure of the nodes, there would be no single point of failure. If a node was to fail, another node could retrieve the same data, and therefore the data would still be available.
Disadvantages of Cassandra
One disadvantage of Cassandra is that because it is NoSQL, there is no structured SQL syntax, so there would be a list of features Cassandra doesn’t have. For example, there is no enforcement of Referential integrity, subqueries (GROUP BY, ORDER BY, etc.), or even JOINs.
Cassandra has limited querying capabilities, and aggregates are also not supported. In addition, the Read requests can run slowly. We mentioned that Write requests can run quickly, but multiple Read requests can delay results and run more slowly.
The data is modeled in Cassandra using predicted queries. This means that there is the potential for duplicate data. Especially with Cassandra being NoSQL, you may have to deal with duplicate data, as it will not be automatically rejected like MySQL or other SQL languages would.
Advantages of MySQL when compared to Cassandra
The largest advantage of MySQL, when compared to Cassandra, is the fact that it is an RDBMS. Primarily, we are talking about JOINs, aggregates, and other functionality such as enforcing Referential Integrity.
There is also more flexible querying, and you can create any combination to yield different results. However, MySQL is more flexible than some other SQL systems, so there is limited compliance with SQL standards.
MySQL also tries to prevent duplicate records, which Cassandra does not. Not only this, but MySQL is also ACID compliant, which may be that extra structure you need for your database.
Disadvantages of MySQL
The first major disadvantage, when compared to Cassandra, is the flexibility when scaling. Although MySQL can scale, there is downtime associated, and even then, you can still hit ceilings. However, the query speed can also be slowed because of all potential combinations in the queries. For example, if you’re joining multiple tables, that could slow your results, whether it is a Read or a Write request.
MySQL is also only partially open source. There is proprietary code involved as well as additional support when using the paid version.
In today’s MySQL versus series, we looked at Cassandra, a NoSQL database. Cassandra is another lightweight, open-source, and highly scalable database that has been gaining popularity. It is an Apache software designed to run as a distributed database amongst a series of nodes. This means there can be multiple data centers, and if you want to scale you simply add or remove nodes. Although Cassandra is flexible and useful, it also does not follow standard SQL practices, such as enforcing Referential Integrity, and it encourages the user to write separate queries instead of JOINing, which is not supported.
If you know any more about Cassandra or want to share your experiences, please feel free to leave a comment. I hope you found this information useful, or that it was at least an interesting read. Until next time, cheers!