Consistent Hashing is a hash algorithm specifically designed to address the dynamic scaling issues of distributed systems. Its core lies in minimizing the amount of data migration when nodes change. Here are the key points:
1、Core Principles
Virtual Hash Ring
The entire hash value space (e.g., 0 ~ 2³²−1) is connected end-to-end to form a circular structure.
Nodes (such as servers) are mapped to fixed positions on the ring through a hash function.
Data is also mapped to the ring through hashing and is located to the nearest node in the clockwise direction for storage.
Dynamic Scalability
Adding a node: Only affects the data between the new node and its adjacent node in the counter-clockwise direction, and this part of the data needs to be migrated.
Removing a node: Only needs to migrate the data of the failed node to its adjacent node in the clockwise direction.
For example, when the number of nodes changes from N to N+1, only about 1/(N+1) of the data needs to be migrated.
2、Core Advantages
Load Balancing: Data is evenly distributed on the ring, avoiding hot spot issues.
High Fault Tolerance: Node failures only affect local data, and the overall availability of the system is high.
Seamless Expansion: The cost of data migration when adding or removing nodes is significantly lower than that of traditional modulo hashing (e.g., hash(key)%N).
3、Optimization: Virtual Nodes
To solve the load imbalance caused by uneven distribution of physical nodes:
Each physical node is mapped to multiple virtual nodes (e.g., 160-200), which are scattered on the ring.
After data is located to a virtual node, it is then mapped to the actual physical node to ensure more balanced load.
Example: Physical node A is virtualized into A1, A2…, and B is virtualized into B1, B2…, so that data is evenly distributed among virtual nodes.
4、Application Scenarios
Distributed Caching (e.g., Redis Cluster): Reduces data migration during dynamic expansion.
Distributed Storage (e.g., Amazon DynamoDB): Handles node failures and cluster scaling.
Load Balancers: Routes service requests to backend server clusters.
5、Comparison with Traditional Hashing
Scenario | Traditional Modulo Hashing (hash%N) | Consistent Hashing |
Node Expansion/Shrinkage | Almost all data needs to be remapped | Only partial data migration |
Data Distribution | Depends on the number of nodes N | Depends on the hash ring structure |
Avalanche Risk | High (node changes trigger global rehashing) | Low (only affects adjacent data) |
Technical Essence
By mapping nodes and data to a stable topological structure (hash ring), the impact of node changes is localized, enabling high availability and elastic scaling of distributed systems.