Consistent Hashing — Why Distributed Systems Can't Live Without It
The Problem: Adding a Server Breaks Everything
Let's start with the naive approach to distributing data across servers.
You have 4 cache servers and want to distribute keys evenly. The obvious solution:
server = hash(key) % 4
Simple, works great. Until you need to add a 5th server.
Now the formula becomes hash(key) % 5. Run the numbers and you'll find that roughly 80% of all keys now map to a different server. For a cache, this means an immediate 80% cache miss rate — every request suddenly hits your database. At scale, this is catastrophic.
This is the problem consistent hashing was designed to solve.
The Solution: The Hash Ring
Consistent hashing works by imagining a circular ring from 0 to 2³² (about 4 billion positions).
Both servers and keys are mapped to positions on this ring using a hash function.
The rule: A key is assigned to the nearest server clockwise on the ring.
Why This Is Better: Minimal Disruption
Now watch what happens when you add a 5th server:
When you add Server B between Server A and Server C:
- Only the keys that fall between Server A and Server B need to move
- Everything else stays exactly where it is
- With N servers, adding one server remaps only 1/N keys on average
For our 4-server example, adding a 5th server moves only ~20% of keys instead of 80%. That's the magic.
The Virtual Nodes Problem
Basic consistent hashing has a flaw: servers land at random positions on the ring, leading to uneven load distribution.
With 4 servers, you might end up with:
- Server A handling 40% of keys
- Server B handling 10% of keys
- Server C handling 35% of keys
- Server D handling 15% of keys
That's not balanced. Server A is doing 4x the work of Server B.
Virtual Nodes to the Rescue
Instead of placing each server once on the ring, place it multiple times at different positions using different hash functions.
Server A → positions 100, 450, 780, 1200, ...
Server B → positions 200, 520, 890, 1400, ...
Server C → positions 300, 600, 950, 1600, ...
With enough virtual nodes (typically 100-200 per server), load distributes evenly across all physical servers. And when a server is added or removed, the load redistributes evenly too.
Handling Server Failures
When a server goes down in consistent hashing:
Only the keys assigned to the failed server need to be redistributed — to the next server clockwise. All other keys are completely unaffected.
Where Consistent Hashing Is Used in Production
Amazon DynamoDB
DynamoDB uses consistent hashing to distribute data across its storage nodes. Each partition key is hashed to a position on the ring, determining which storage node owns that data. Virtual nodes ensure even distribution as the cluster grows.
Apache Cassandra
Cassandra uses a variant called token-based partitioning — effectively consistent hashing with virtual nodes. Each node is assigned a range of tokens on the ring. Data is replicated to the next N nodes clockwise for fault tolerance.
Content Delivery Networks
CDNs use consistent hashing to route requests to edge servers. A URL always hashes to the same edge server (maximising cache hits). When a server is added or removed, minimal cache invalidation occurs.
Redis Cluster
Redis Cluster uses hash slots — 16,384 slots distributed across nodes using consistent hashing principles. Adding or removing nodes migrates only the affected slots.
Consistent Hashing vs Simple Modulo Hashing
| Modulo Hashing | Consistent Hashing | |
|---|---|---|
| Keys remapped on resize | ~80% | ~1/N |
| Load distribution | Even | Uneven without virtual nodes |
| Complexity | Simple | Moderate |
| Failure impact | All keys affected | Only affected server's keys |
| Used in production | Small systems | DynamoDB, Cassandra, CDNs |
Implementation Sketch
Here's the core idea in TypeScript:
class ConsistentHashRing {
private ring: Map<number, string> = new Map();
private sortedKeys: number[] = [];
private virtualNodes: number = 150;
addServer(server: string): void {
for (let i = 0; i < this.virtualNodes; i++) {
const hash = this.hash(`${server}:${i}`);
this.ring.set(hash, server);
this.sortedKeys.push(hash);
}
this.sortedKeys.sort((a, b) => a - b);
}
getServer(key: string): string {
const hash = this.hash(key);
// Find next position clockwise
const idx = this.sortedKeys.findIndex((k) => k >= hash);
const position = idx === -1 ? 0 : idx;
return this.ring.get(this.sortedKeys[position])!;
}
private hash(key: string): number {
// Simplified - use a proper hash function in production
let hash = 0;
for (const char of key) {
hash = (hash * 31 + char.charCodeAt(0)) % 2 ** 32;
}
return hash;
}
}
Key Takeaways
1. Modulo hashing breaks when you resize. The remapping problem is catastrophic at scale — consistent hashing solves it elegantly.
2. The ring abstraction is powerful. Mapping both servers and keys to the same space makes the math clean and the failure handling natural.
3. Virtual nodes are essential for production. Without them, load distribution is too uneven to be practical. 100-200 virtual nodes per server is the standard.
4. Consistent hashing is everywhere. DynamoDB, Cassandra, Redis Cluster, CDNs — once you understand it, you'll spot it in every large-scale system.
5. Failure handling is elegant. Only the failed server's keys need redistribution. This is a massive advantage over static partitioning schemes.
Understanding consistent hashing at this depth — not just the "what" but the "why" and "how it works under the hood" — is exactly the kind of knowledge that separates Staff Engineers from Senior Engineers in system design interviews.