Chord
The Chord Protocol and Consistent Hashing
Consistent Hashing
See this amazing post for a simple explanation of the Chord System link The two key things to remember about it is
-
Horizontal Scalability - On the addition and/or deletion of nodes, only approx ~k/n keys have to be moved, where k is the number of keys and n is the number of nodes. In the traditional hashing system (modulo n) all would have to be moved
-
Solves non-uniform data distribution - It involves the introduction of a number of replicas or virtual nodes for each server across the ring. Thus giving a chance for a more even distribution of data across nodes
Chord
• Load balance: Chord acts as a distributed hash function,spreading keys evenly over the nodes • Decentralization: Chord is fully distributed: no node is more important than any other. This improves robustness • Scalability: The cost of a Chord lookup grows as the log of the number of nodes, so even very large systems are feasible. • Availability: Chord automatically adjusts its internal tables to reflect newly joined nodes as well as node failures. • Flexible naming: Chord places no constraints on the structure of the keys it looks up: the Chord key-space is flat.
In an N-node network, each node maintains information about only O(log N) other nodes, and a lookup requires O(log N) messages.
Simple Key Look Up
Lets assume that the we use the algorithm successor(x) to find the hash of the node or key and finds the first node that is equal to or greater than the hashed value. Each node need only know how to contact its current successor node on the identifier circle. Queries for a given identifier could be passed around the circle via these successor pointers until they encounter a pair of nodes that straddle the desired identifier.
Scalable Key Location
To accelerate lookups, Chord maintains some additional routing information
- Let m be the number of bits in the key/node identifier. Each node n will then maintain a routing table with upto m entries, called the finger table.
- The i’th entryin n’s table contains the identity of the first node that succeeds n by 2^(i-1), ie s = successor(n+2^(i-1)). We call it the i’th finger of n
- The first finger is the next node (successor(n+1)) as is referred to as the successor
- If id falls between n and its successor, find successor is finished and node n returns its successor
- Otherwise, n searches its finger table for the node n whose ID most immediately precedes id, and then invokes find successor at n0.
- With high probability, the number of nodes that must be contacted to find a successor in an N-node network is O(log N).
Dynamic Operations And Failures
The correctness of the Chord protocol relies on the fact that each node knows its successor
Node Joins and Stabilization
- Join protocol: On joining, a node n will find its successor, by calling p.findSuccessor(n) on any known node p. The network is not made aware of n.
- Stabilization protocol: Runs periodically to ensure that every node’s successors are up to date. Each time a node n runs stabilize, it asks its successor for its predecessor p. If p just joined the system, n will make p its succesor. In addition, it notifies n’s successor of its existence, giving it the chance to update its predecessor (if there’s no closer one)
- Fix Fingers - checkes its finger table entries are correct.
- Check-Predecessor - to check if predecessor has failed and clear pointer if it has
Failure and Replication
To increase robustness, each Chord node maintains a successor list of size r, containing the node’s first r successors. If a node’s immediate successor does not respond, the node can substitute the second entry in its successor list.
Two enhancements can improve Chord performance when nodes leave voluntarily. First, a node n that is about to leave may transfer its keys to its successor before it departs. Second, n may notify its predecessor p and successor s before leaving.