Cassandra Quick Notes :: Part - 2
Bhaskar S | Updated: 11/27/2023
Overview
In Part-1 of this article series, we performed the necessary setup, started a single-node Apache Cassandra cluster and got our hands dirty with CQL operations.
In this part, we will perform additional setup to bring up a multi-node (3-node) Apache Cassandra cluster. In addition, we will elaborate further on the concepts from Part-1 and introduce additional Apache Cassandra concepts.
Additional Setup
To run a multi-node Apache Cassandra cluster, we need to download additional configuration files into the directory $CASSANDRA_HOME/etc/cassandra by executing the following commands:
$ cd $CASSANDRA_HOME/etc/cassandra
$ wget https://github.com/apache/cassandra/raw/trunk/conf/jvm-clients.options
$ wget https://github.com/apache/cassandra/raw/trunk/conf/jvm17-clients.options
$ wget https://github.com/apache/cassandra/raw/trunk/conf/logback-tools.xml
$ cd $CASSANDRA_HOME
We will set up additional directories by executing the following commands from the user's home directory:
$ mkdir -p cassandra/data2
$ mkdir -p cassandra/data3
$ mkdir -p cassandra/logs2
$ mkdir -p cassandra/logs3
To start the second node in the Apache Cassandra cluster, execute the following command:
$ docker run --rm --name cas-node-2 --hostname cas-node-2 --network cassandra-db-net -e CASSANDRA_SEEDS=cas-node-1 -u $(id -u $USER):$(id -g $USER) -v $CASSANDRA_HOME/data2:/var/lib/cassandra/data -v $CASSANDRA_HOME/etc/cassandra:/etc/cassandra -v $CASSANDRA_HOME/logs2:/opt/cassandra/logs cassandra:5.0
The following would be the typical trimmed output on the first node:
... [ SNIP ] ...
INFO [GossipStage:1] 2023-11-27 14:26:30,274 Gossiper.java:1460 - Node /172.18.0.3:7000 is now part of the cluster
INFO [GossipStage:1] 2023-11-27 14:26:30,277 Gossiper.java:1405 - InetAddress /172.18.0.3:7000 is now UP
... [ SNIP ] ...
Finally, to start the third node in the Apache Cassandra cluster, execute the following command:
$ docker run --rm --name cas-node-3 --hostname cas-node-3 --network cassandra-db-net -e CASSANDRA_SEEDS=cas-node-1 -u $(id -u $USER):$(id -g $USER) -v $CASSANDRA_HOME/data3:/var/lib/cassandra/data -v $CASSANDRA_HOME/etc/cassandra:/etc/cassandra -v $CASSANDRA_HOME/logs3:/opt/cassandra/logs cassandra:5.0
The following would be the typical trimmed output on the first node:
... [ SNIP ] ...
INFO [GossipStage:1] 2023-11-27 14:27:39,614 Gossiper.java:1460 - Node /172.18.0.4:7000 is now part of the cluster
INFO [GossipStage:1] 2023-11-27 14:27:39,615 Gossiper.java:1405 - InetAddress /172.18.0.4:7000 is now UP
... [ SNIP ] ...
To check the status of the Apache Cassandra cluster, execute the following command:
$ docker exec -it cas-node-1 nodetool status
The following would be the typical output:
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address     Load        Tokens  Owns (effective)  Host ID                               Rack
UN  172.18.0.3  86.52 KiB   16      59.3%             992f1632-c660-498a-8a02-3a0a1646d28a  rack1
UN  172.18.0.2  121.88 KiB  16      64.7%             0073cae6-b42b-4133-b0a5-0a5f7ddb18c1  rack1
UN  172.18.0.4  81.45 KiB   16      76.0%             b42f26eb-ec36-41dc-a8e2-504190f59c4e  rack1
BINGO - At this point the 3-node Apache Cassandra cluster is ready !!!
Concepts - Level II
The following section elaborates on the concepts from Part-1 and introduces additional Apache Cassandra concepts:
From earlier, we know that a Column has a name and an associated value. Internally, it also has an associated timestamp. Cassandra uses the timestamp to determine the most recent update to a Column
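As an illustrative sketch (the row shown is hypothetical and uses the address_book table created in the hands-on section below), a client can even supply this timestamp explicitly at write time using the USING TIMESTAMP clause (in microseconds), and read it back with the writetime() function as demonstrated later in this article:
cqlsh> INSERT INTO address_book (email_id, first_name, state, zip) VALUES ('dave@singer.com', 'dave', 'CA', '94016') USING TIMESTAMP 1701131700000000;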
In Cassandra, the Primary Key (or the Row Key) on a Table enables efficient lookup of rows. The Primary Key plays a crucial role in distributing the rows of a table across the nodes in a cluster. The first part of the Primary Key is the Partition Key. For a Primary Key with only one column (as was the case with the book_catalog table in Part-1), the Primary Key is the same as the Partition Key. For a Primary Key with more than one column (referred to as a Composite Key), the first part is the Partition Key and the remaining column(s) form the Clustering Key, which is used for sorting the data within a partition (in ASCENDING order by default, as shown in the sketch below)
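As a minimal sketch (the sensor_reading table and its columns are hypothetical and not part of this article's setup), the default sort order of a Clustering Key can be overridden at table creation time using the CLUSTERING ORDER BY clause:
cqlsh> CREATE TABLE IF NOT EXISTS sensor_reading (sensor_id text, reading_ts timestamp, reading double, PRIMARY KEY ((sensor_id), reading_ts)) WITH CLUSTERING ORDER BY (reading_ts DESC);
With this definition, a SELECT within a partition would return the most recent readings first, without needing an explicit ORDER BY.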
A Partitioner determines how the data gets distributed in the cluster. The default Partitioner strategy used in Cassandra is the Murmur3Partitioner which uses the hash of the Partition Key to determine the node
Cassandra also supports indexes on non-primary key Columns. They are referred to as the Secondary Indexes. Under the covers, Cassandra creates a hidden Table for the Secondary Index. Secondary indexes allow for efficient querying of the non-primary key Columns using the Equality operator. Note that the Secondary Indexes are automatically built behind the scenes without blocking any read or write operation(s)
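As a hedged sketch (the index name first_name_idx is our own choice; address_book is the table created in the hands-on section below), a Secondary Index is created and queried as follows:
cqlsh> CREATE INDEX IF NOT EXISTS first_name_idx ON address_book (first_name);
cqlsh> SELECT email_id, first_name FROM address_book WHERE first_name = 'alice';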
Cassandra is a highly scalable, distributed, and decentralized database. By starting more Cassandra node(s), we created a multi-node cluster. All nodes in a Cassandra cluster are peers - unlike relational databases, there is no concept of a master node or slave node(s). A client can connect to any of the nodes in the cluster to perform read or write operation(s)
The cluster of node(s) is usually referred to as the Ring in Cassandra parlance. In other words, think of the nodes in a cluster as being arranged in a logical circle forming a ring
Cassandra uses Gossip communication protocol to discover information about the other nodes in the cluster. The Gossip messages are exchanged every second among the nodes in the cluster. Gossip only communicates the cluster metadata
When Cassandra is started on a new node, it needs to communicate with at least one other node to start the Gossip communication. This is what is called the Seed node. In other words, the purpose of a Seed node is to bootstrap the Gossip process for any new node(s) joining the cluster. One or more nodes in the cluster can be identified as the Seed nodes
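In the docker commands above, the CASSANDRA_SEEDS environment variable is how the official Cassandra container image points a new node at its Seed node(s); under the covers it maps to the seed_provider setting in cassandra.yaml. The following is a minimal sketch of that setting, assuming the hostname from this article's setup:
seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          - seeds: "cas-node-1"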
Cassandra provides fault tolerance by storing copies of a row (replicas) on multiple node(s) in the cluster. The number of replicas in the cluster is known as the Replication Factor. A replication factor of 1 means there is only one copy of a row in one node. A replication factor of 2 means there are two copies of a row on two different nodes of the cluster
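The Replication Factor is declared when a Keyspace is created. The hands-on section below uses the SimpleStrategy; as an additional sketch (the keyspace name myprodks is hypothetical), the NetworkTopologyStrategy allows the Replication Factor to be specified per datacenter:
cqlsh> CREATE KEYSPACE IF NOT EXISTS myprodks WITH replication = {'class': 'NetworkTopologyStrategy', 'datacenter1': 2};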
A Snitch in Cassandra just reports a node's rack and datacenter. It determines the topology of a cluster, that is, which nodes belong where in the cluster. The default Snitch strategy used in Cassandra is the SimpleSnitch which places all nodes in the same datacenter and rack
The following are three important properties of any scalable, fault-tolerant distributed system:
Consistency - Clients connecting to different nodes in a cluster will read the same set of values for a given query even when there are writes being performed in parallel
Availability - At least one node in a cluster is up and running so clients can perform read or write operations
Partition Tolerance - Clients are able to perform read or write operations even when a network issue divides a cluster of nodes into disjoint groups
Brewer's CAP (Consistency, Availability, Partition Tolerance) Theorem states that one can achieve only 2 of the 3 properties with acceptable performance in a distributed system
Cassandra chooses to guarantee AP (Availability and Partition Tolerance) and relaxes on Consistency resulting in what is termed Eventual Consistency
Cassandra enhances Eventual Consistency by offering Tunable Consistency in which a client can indicate the level of consistency with each read or write operation
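For example, in the CQL command-line interface, the consistency level for subsequent read or write operations can be set (or displayed) using the CONSISTENCY command; QUORUM requires a majority of the replicas to acknowledge an operation:
cqlsh> CONSISTENCY QUORUM;
cqlsh> CONSISTENCY;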
In Cassandra, when a write operation is performed, the row data is first appended to a disk file called the commitlog for durability. Then, it is written to an in-memory structure called the memtable. A write operation is considered successful only after writes to both the commitlog and the memtable have completed without errors. This results in minimal disk IO and optimal write operation
Data in the memtable is stored in a sorted order by the row key
Periodically, data in the memtable is flushed to a persistent, immutable disk file called the SSTable (short for Sorted String Table). This disk IO operation is performed in the background for optimal performance. A flush can also be forced for experimentation, as shown below.
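The following command (against the first node from our setup) forces a flush of the memtable(s) to SSTable files:
$ docker exec -it cas-node-1 nodetool flush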
Since an SSTable is an immutable, disk-based file structure (there are no in-place updates, unlike relational databases), data for a row key can be spread across several SSTable files
In Cassandra, when a read operation is performed for a given row key, the row data must be gathered from the memtable as well as all SSTable files on the node that contain columns for the given row key. To optimize this data gathering process, Cassandra uses an in-memory structure called the Bloom Filter. Each SSTable has an associated Bloom Filter which allows for checks to see if a requested row key exists in the SSTable before performing any disk IO to fetch the actual data
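The Bloom Filter statistics for a table can be inspected using the nodetool tablestats command; for example, once the mytestks2.address_book table from the hands-on section below exists, the output would include entries such as the Bloom filter false positives and the Bloom filter space used:
$ docker exec -it cas-node-1 nodetool tablestats mytestks2.address_book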
Hands-on with Cassandra - II
To launch the CQL command-line interface, execute the following command:
$ docker run -it --rm --name cas-client --network cassandra-db-net cassandra:5.0 cqlsh cas-node-1
The following will be the output:
WARNING: cqlsh was built against 5.0-alpha2, but this server is 5.0. All features may not work!
Connected to Cassandra Cluster at cas-node-1:9042
[cqlsh 6.2.0 | Cassandra 5.0-alpha2 | CQL spec 3.4.7 | Native protocol v5]
Use HELP for help.
cqlsh>
On success, CQL will change the command prompt to "cqlsh>".
To create a Keyspace called mytestks2, input the following command at the "cqlsh>" prompt:
cqlsh> CREATE KEYSPACE IF NOT EXISTS mytestks2 WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 2};
There will be no output.
Notice that we have now specified a Replication Factor of 2 given we have a 3-node cluster.
To use the Keyspace called mytestks2, input the following command at the "cqlsh>" prompt:
cqlsh> USE mytestks2;
There will be no output and the input prompt would change to "cqlsh:mytestks2>".
To display information about any node in the cluster (for example cas-node-2), execute the following command:
$ docker exec -it cas-node-2 nodetool info -T
The following would be the typical output:
ID                     : cf8c8360-eeea-49ea-9fa0-6b417d30d98f
Gossip active          : true
Native Transport active: true
Load                   : 169.54 KiB
Uncompressed load      : 315.92 KiB
Generation No          : 1701112276
Uptime (seconds)       : 8947
Heap Memory (MB)       : 504.97 / 8192.00
Off Heap Memory (MB)   : 0.00
Data Center            : datacenter1
Rack                   : rack1
Exceptions             : 0
Key Cache              : entries 11, size 984 bytes, capacity 100 MiB, 101 hits, 119 requests, 0.849 recent hit rate, 14400 save period in seconds
Row Cache              : entries 0, size 0 bytes, capacity 0 bytes, 0 hits, 0 requests, NaN recent hit rate, 0 save period in seconds
Counter Cache          : entries 0, size 0 bytes, capacity 50 MiB, 0 hits, 0 requests, NaN recent hit rate, 7200 save period in seconds
Network Cache          : size 8 MiB, overflow size: 0 bytes, capacity 128 MiB
Percent Repaired       : 100.0%
Token                  : -8225213037676917188
Token                  : -7289299864885401768
Token                  : -6873662205313761888
Token                  : -5661981858641780742
Token                  : -5264659006293067951
Token                  : -4006238290347624863
Token                  : -2971937643672220497
Token                  : -2545651142391091107
Token                  : -1195449315816413223
Token                  : -138072798887455006
Token                  : 1983622101114780101
Token                  : 3242042692258433134
Token                  : 4306383682031496531
Token                  : 6126387983011442930
Token                  : 8003452312524428959
Token                  : 9172407235703443338
Bootstrap state        : COMPLETED
Bootstrap failed       : false
Decommissioning        : false
Decommission failed    : false
To create a Table called address_book with a composite Primary Key (consisting of a Partition Key and Clustering Keys), input the following command at the "cqlsh:mytestks2>" prompt:
cqlsh:mytestks2> CREATE TABLE IF NOT EXISTS address_book (email_id text, first_name text, last_name text, state text, zip text, PRIMARY KEY ((email_id), state, zip));
There will be no output.
To check the various settings on the table address_book, input the following command at the "cqlsh:mytestks2>" prompt:
cqlsh:mytestks2> DESCRIBE TABLE address_book;
The following will be the output:
CREATE TABLE mytestks2.address_book (
    email_id text,
    state text,
    zip text,
    first_name text,
    last_name text,
    PRIMARY KEY (email_id, state, zip)
) WITH CLUSTERING ORDER BY (state ASC, zip ASC)
    AND additional_write_policy = '99p'
    AND allow_auto_snapshot = true
    AND bloom_filter_fp_chance = 0.01
    AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
    AND cdc = false
    AND comment = ''
    AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
    AND compression = {'chunk_length_in_kb': '16', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND memtable = 'default'
    AND crc_check_chance = 1.0
    AND default_time_to_live = 0
    AND extensions = {}
    AND gc_grace_seconds = 864000
    AND incremental_backups = true
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair = 'BLOCKING'
    AND speculative_retry = '99p';
For the table address_book, the column email_id will be the Partition Key, while the columns state and zip will be the Clustering Keys.
To display the Gossip information for the cluster, execute the following command:
$ docker exec -it cas-node-1 nodetool gossipinfo
The following would be the typical output:
/172.18.0.2
  generation:1701112194
  heartbeat:19309
  STATUS:72:NORMAL,-1847922227964662486
  LOAD:19252:231902.0
  SCHEMA:10968:00313073-6c1f-356b-917c-525d4a726c36
  DC:8:datacenter1
  RACK:10:rack1
  RELEASE_VERSION:5:5.0-alpha2
  RPC_ADDRESS:4:172.18.0.2
  NET_VERSION:1:12
  HOST_ID:2:f142d327-4c51-43ea-8fa1-a5887c5359f4
  RPC_READY:82:true
  NATIVE_ADDRESS_AND_PORT:3:172.18.0.2:9042
  STATUS_WITH_PORT:71:NORMAL,-1847922227964662486
  SSTABLE_VERSIONS:6:big-nc
  TOKENS:70:<hidden>
/172.18.0.3
  generation:1701112276
  heartbeat:19226
  LOAD:19192:230385.0
  SCHEMA:10886:00313073-6c1f-356b-917c-525d4a726c36
  DC:8:datacenter1
  RACK:10:rack1
  RELEASE_VERSION:5:5.0-alpha2
  NET_VERSION:1:12
  HOST_ID:2:cf8c8360-eeea-49ea-9fa0-6b417d30d98f
  RPC_READY:97:true
  NATIVE_ADDRESS_AND_PORT:3:172.18.0.3:9042
  STATUS_WITH_PORT:84:NORMAL,-1195449315816413223
  SSTABLE_VERSIONS:6:big-nc
  TOKENS:83:<hidden>
/172.18.0.4
  generation:1701112367
  heartbeat:19134
  LOAD:19129:236722.0
  SCHEMA:10791:00313073-6c1f-356b-917c-525d4a726c36
  DC:8:datacenter1
  RACK:10:rack1
  RELEASE_VERSION:5:5.0-alpha2
  NET_VERSION:1:12
  HOST_ID:2:1072dc0b-9089-421c-8bf6-dc35f4a1a495
  RPC_READY:96:true
  NATIVE_ADDRESS_AND_PORT:3:172.18.0.4:9042
  STATUS_WITH_PORT:83:NORMAL,-1447708912539482187
  SSTABLE_VERSIONS:6:big-nc
  TOKENS:82:<hidden>
To insert three rows into the table address_book, input the following commands at the "cqlsh:mytestks2>" prompt:
cqlsh:mytestks2> INSERT INTO address_book (email_id, first_name, last_name, state, zip) VALUES ('alice@builder.com', 'alice', 'builder', 'NY', '10001');
cqlsh:mytestks2> INSERT INTO address_book (email_id, first_name, state, zip) VALUES ('bob@coder.com', 'bob', 'NJ', '08550');
cqlsh:mytestks2> INSERT INTO address_book (email_id, first_name, last_name, state, zip) VALUES ('charlie@dancer.com', 'charlie', 'dancer', 'NY', '10012');
There will be no output.
To select the values of the email_id, the first_name, and the latest update timestamp (column metadata) of first_name for all the rows from the table address_book, input the following command at the "cqlsh:mytestks2>" prompt:
cqlsh:mytestks2> SELECT email_id, first_name, writetime(first_name) FROM address_book;
The following will be the output:
 email_id           | first_name | writetime(first_name)
--------------------+------------+-----------------------
  alice@builder.com |      alice |       1701131587563197
      bob@coder.com |        bob |       1701131597661015
 charlie@dancer.com |    charlie |       1701131606944917

(3 rows)
To select the values of the email_id and its corresponding token generated by the default Murmur3Partitioner for all the rows from the table address_book, input the following command at the "cqlsh:mytestks2>" prompt:
cqlsh:mytestks2> SELECT email_id, token(email_id) FROM address_book;
The following will be the output:
 email_id           | system.token(email_id)
--------------------+------------------------
  alice@builder.com |   -2094522160347831912
      bob@coder.com |   -1716231266002284945
 charlie@dancer.com |    1913932259381705508

(3 rows)
To check which nodes in the cluster stored the row for alice@builder.com, execute the following command:
$ docker exec -it cas-node-1 nodetool getendpoints mytestks2 address_book "alice@builder.com"
The following will be the output:
172.18.0.3
172.18.0.4
Next, to check which nodes in the cluster stored the row for bob@coder.com, execute the following command:
$ docker exec -it cas-node-1 nodetool getendpoints mytestks2 address_book "bob@coder.com"
The following will be the output:
172.18.0.4
172.18.0.3
Finally, to check which nodes in the cluster stored the row for charlie@dancer.com, execute the following command:
$ docker exec -it cas-node-1 nodetool getendpoints mytestks2 address_book "charlie@dancer.com"
The following will be the output:
172.18.0.3
172.18.0.4
To drop the entire table address_book, input the following command at the "cqlsh:mytestks2>" prompt:
cqlsh:mytestks2> DROP TABLE address_book;
There will be no output.
To drop the entire keyspace mytestks2, input the following command at the "cqlsh:mytestks2>" prompt:
cqlsh:mytestks2> DROP KEYSPACE mytestks2;
There will be no output.
To exit the CQL command-line interface, input the following command at the "cqlsh:mytestks2>" prompt:
cqlsh:mytestks2> exit;
There will be no output.
This concludes the hands-on demonstration for this part in setting up and using a 3-node Apache Cassandra cluster !!!
References