Chord

The Chord Protocol and Consistent Hashing

Consistent Hashing

See this amazing post for a simple explanation of the Chord System link The two key things to remember about it is

Horizontal Scalability - On the addition and/or deletion of nodes, only approx ~k/n keys have to be moved, where k is the number of keys and n is the number of nodes. In the traditional hashing system (modulo n) all would have to be moved
Solves non-uniform data distribution - It involves the introduction of a number of replicas or virtual nodes for each server across the ring. Thus giving a chance for a more even distribution of data across nodes

Chord

• Load balance: Chord acts as a distributed hash function,spreading keys evenly over the nodes • Decentralization: Chord is fully distributed: no node is more important than any other. This improves robustness • Scalability: The cost of a Chord lookup grows as the log of the number of nodes, so even very large systems are feasible. • Availability: Chord automatically adjusts its internal tables to reflect newly joined nodes as well as node failures. • Flexible naming: Chord places no constraints on the structure of the keys it looks up: the Chord key-space is flat.

In an N-node network, each node maintains information about only O(log N) other nodes, and a lookup requires O(log N) messages.

Simple Key Look Up

Lets assume that the we use the algorithm successor(x) to find the hash of the node or key and finds the first node that is equal to or greater than the hashed value. Each node need only know how to contact its current successor node on the identifier circle. Queries for a given identifier could be passed around the circle via these successor pointers until they encounter a pair of nodes that straddle the desired identifier.

Scalable Key Location

To accelerate lookups, Chord maintains some additional routing information

Let m be the number of bits in the key/node identifier. Each node n will then maintain a routing table with upto m entries, called the finger table.
The i’th entryin n’s table contains the identity of the first node that succeeds n by 2^(i-1), ie s = successor(n+2^(i-1)). We call it the i’th finger of n
The first finger is the next node (successor(n+1)) as is referred to as the successor
If id falls between n and its successor, find successor is finished and node n returns its successor
Otherwise, n searches its finger table for the node n whose ID most immediately precedes id, and then invokes find successor at n0.
With high probability, the number of nodes that must be contacted to find a successor in an N-node network is O(log N).

Dynamic Operations And Failures

The correctness of the Chord protocol relies on the fact that each node knows its successor

Node Joins and Stabilization

Join protocol: On joining, a node n will find its successor, by calling p.findSuccessor(n) on any known node p. The network is not made aware of n.
Stabilization protocol: Runs periodically to ensure that every node’s successors are up to date. Each time a node n runs stabilize, it asks its successor for its predecessor p. If p just joined the system, n will make p its succesor. In addition, it notifies n’s successor of its existence, giving it the chance to update its predecessor (if there’s no closer one)
Fix Fingers - checkes its finger table entries are correct.
Check-Predecessor - to check if predecessor has failed and clear pointer if it has

Failure and Replication

To increase robustness, each Chord node maintains a successor list of size r, containing the node’s first r successors. If a node’s immediate successor does not respond, the node can substitute the second entry in its successor list.

Two enhancements can improve Chord performance when nodes leave voluntarily. First, a node n that is about to leave may transfer its keys to its successor before it departs. Second, n may notify its predecessor p and successor s before leaving.

Written on September 20, 2019