What is Consistent Hashing?

What is Consistent Hashing?

Consistent Hashing is a hash algorithm specifically designed to address the dynamic scaling issues of distributed systems. Its core lies in minimizing the amount of data migration when nodes change. Here are the key points:

1、Core Principles

Virtual Hash Ring

The entire hash value space (e.g., 0 ~ 2³²−1) is connected end-to-end to form a circular structure.

Nodes (such as servers) are mapped to fixed positions on the ring through a hash function.

Data is also mapped to the ring through hashing and is located to the nearest node in the clockwise direction for storage.

Dynamic Scalability

Adding a node: Only affects the data between the new node and its adjacent node in the counter-clockwise direction, and this part of the data needs to be migrated.

Removing a node: Only needs to migrate the data of the failed node to its adjacent node in the clockwise direction.

For example, when the number of nodes changes from N to N+1, only about 1/(N+1) of the data needs to be migrated.

2、Core Advantages

Load Balancing: Data is evenly distributed on the ring, avoiding hot spot issues.

High Fault Tolerance: Node failures only affect local data, and the overall availability of the system is high.

Seamless Expansion: The cost of data migration when adding or removing nodes is significantly lower than that of traditional modulo hashing (e.g., hash(key)%N).

3、Optimization: Virtual Nodes

To solve the load imbalance caused by uneven distribution of physical nodes:

Each physical node is mapped to multiple virtual nodes (e.g., 160-200), which are scattered on the ring.

After data is located to a virtual node, it is then mapped to the actual physical node to ensure more balanced load.

Example: Physical node A is virtualized into A1, A2…, and B is virtualized into B1, B2…, so that data is evenly distributed among virtual nodes.

4、Application Scenarios

Distributed Caching (e.g., Redis Cluster): Reduces data migration during dynamic expansion.

Distributed Storage (e.g., Amazon DynamoDB): Handles node failures and cluster scaling.

Load Balancers: Routes service requests to backend server clusters.

5、Comparison with Traditional Hashing

ScenarioTraditional Modulo Hashing (hash%N)Consistent Hashing
Node Expansion/ShrinkageAlmost all data needs to be remappedOnly partial data migration
Data DistributionDepends on the number of nodes NDepends on the hash ring structure
Avalanche RiskHigh (node changes trigger global rehashing)Low (only affects adjacent data)

Technical Essence

By mapping nodes and data to a stable topological structure (hash ring), the impact of node changes is localized, enabling high availability and elastic scaling of distributed systems.

Comments

No comments yet. Why don’t you start the discussion?

Leave a Reply

Your email address will not be published. Required fields are marked *