Cassandra Inter Node Communication, seed nodes and Failure detection and recovery

To prevent failure in case any node in the cluster went down and to ensure continuous availability, Cassandra nodes keeps on updating their states to each other using Gossip protocol.

Inter node communication and seed nodes

Gossip is a peer-to-peer communication protocol that helps all the nodes to communicate and exchange state information of them and other another nodes they know about. This communication keeps on running every second and passed state inflammation to up to three nodes in the cluster. This process of exchanging state data of their own and other nodes they know about keeps all the nodes up to date.

Most critical situation arise when a node starts up, it has to remember previous data about what node it has communicate with in subsequent restarts, by default all nodes does that. Any node in the cluster can be given seed node designation and it has nothing to do with anything else but updating a node when it joins the cluster.

One should have single seed node list on all the nodes, so that minimal seed nodes can solve the purpose. In a multi data center cluster seed node list should contain at least one node from every data center, otherwise gossip has to communicate to a seed node at other data center when a new node joins cluster. One should not made every node a seed node, it will decrease the gossip performance and hard to maintain too. Gossip optimization is not critical though it should be taken care to have 3 nodes as seed nodes per cluster.

Failure detection and recovery

Same gossip communication and state information transfer between node is used to detect which node is up and down, hence user request can be prevented to go for a dead node.

Failure detection

Failure detection takes place using direct gossip and nodes communicated about secondhand, thirdhand, and so on. For failure detection Cassandra also takes network, workload and historical state of a node into account. During gossip exchanges, every node maintains a sliding window of inter-arrival times of gossip messages from other nodes in the cluster. One can also manage failure detector sensitivity using phi_convict_threshold in cassandra.yaml file.

When a node is marked down

One node can be marked as down if the hardware is down or a network communication fails, in this case all other nodes will continuously try to gossip with down node to check of the node is back or not. To remove a node permanently form the cluster administrators must explicitly add or remove nodes from a Cassandra cluster using the nodetool utility or OpsCenter.

When a down node comes back

When a node comes back on line after a down period, it may have missed write longs during the period. Once a node is marked down all write node are saved on replica node for a period of time bydefault3 hour. Once 3 hour time is over missed write logs are deleted from down node replicas and one has to issue repair after the down node is up and running. As a good practice one should have issue a regular repair on cluster to maintain constant data even if now node is down.

So far in this series of Apache Cassandra Articles we have seen, What is Cassandra ? How does Cassandra Work its Data Model and other architectural information. In coming articles we will how to inistall a single node and multinode cluster in Cassandra and CQL commands.