System DesignDistributed SystemsArchitecture

Consistent Hashing — Why Distributed Systems Can't Live Without It

Sumit Yadav

April 18, 20267 min read

The Problem: Adding a Server Breaks Everything

Let's start with the naive approach to distributing data across servers.

You have 4 cache servers and want to distribute keys evenly. The obvious solution:

server = hash(key) % 4

Simple, works great. Until you need to add a 5th server.

Now the formula becomes hash(key) % 5. Run the numbers and you'll find that roughly 80% of all keys now map to a different server. For a cache, this means an immediate 80% cache miss rate — every request suddenly hits your database. At scale, this is catastrophic.

This is the problem consistent hashing was designed to solve.

The Solution: The Hash Ring

Consistent hashing works by imagining a circular ring from 0 to 2³² (about 4 billion positions).

Both servers and keys are mapped to positions on this ring using a hash function.

The rule: A key is assigned to the nearest server clockwise on the ring.

Why This Is Better: Minimal Disruption

Now watch what happens when you add a 5th server:

When you add Server B between Server A and Server C:

Only the keys that fall between Server A and Server B need to move
Everything else stays exactly where it is
With N servers, adding one server remaps only 1/N keys on average

For our 4-server example, adding a 5th server moves only ~20% of keys instead of 80%. That's the magic.

The Virtual Nodes Problem

Basic consistent hashing has a flaw: servers land at random positions on the ring, leading to uneven load distribution.

With 4 servers, you might end up with:

Server A handling 40% of keys
Server B handling 10% of keys
Server C handling 35% of keys
Server D handling 15% of keys

That's not balanced. Server A is doing 4x the work of Server B.

Virtual Nodes to the Rescue

Instead of placing each server once on the ring, place it multiple times at different positions using different hash functions.

Server A → positions 100, 450, 780, 1200, ...
Server B → positions 200, 520, 890, 1400, ...
Server C → positions 300, 600, 950, 1600, ...

With enough virtual nodes (typically 100-200 per server), load distributes evenly across all physical servers. And when a server is added or removed, the load redistributes evenly too.

Handling Server Failures

When a server goes down in consistent hashing:

Only the keys assigned to the failed server need to be redistributed — to the next server clockwise. All other keys are completely unaffected.

Where Consistent Hashing Is Used in Production

Amazon DynamoDB

DynamoDB uses consistent hashing to distribute data across its storage nodes. Each partition key is hashed to a position on the ring, determining which storage node owns that data. Virtual nodes ensure even distribution as the cluster grows.

Apache Cassandra

Cassandra uses a variant called token-based partitioning — effectively consistent hashing with virtual nodes. Each node is assigned a range of tokens on the ring. Data is replicated to the next N nodes clockwise for fault tolerance.

Content Delivery Networks

CDNs use consistent hashing to route requests to edge servers. A URL always hashes to the same edge server (maximising cache hits). When a server is added or removed, minimal cache invalidation occurs.

Redis Cluster

Redis Cluster uses hash slots — 16,384 slots distributed across nodes using consistent hashing principles. Adding or removing nodes migrates only the affected slots.

Consistent Hashing vs Simple Modulo Hashing

	Modulo Hashing	Consistent Hashing
Keys remapped on resize	~80%	~1/N
Load distribution	Even	Uneven without virtual nodes
Complexity	Simple	Moderate
Failure impact	All keys affected	Only affected server's keys
Used in production	Small systems	DynamoDB, Cassandra, CDNs

Implementation Sketch

Here's the core idea in TypeScript:

class ConsistentHashRing {
  private ring: Map<number, string> = new Map();
  private sortedKeys: number[] = [];
  private virtualNodes: number = 150;

  addServer(server: string): void {
    for (let i = 0; i < this.virtualNodes; i++) {
      const hash = this.hash(`${server}:${i}`);
      this.ring.set(hash, server);
      this.sortedKeys.push(hash);
    }
    this.sortedKeys.sort((a, b) => a - b);
  }

  getServer(key: string): string {
    const hash = this.hash(key);
    // Find next position clockwise
    const idx = this.sortedKeys.findIndex((k) => k >= hash);
    const position = idx === -1 ? 0 : idx;
    return this.ring.get(this.sortedKeys[position])!;
  }

  private hash(key: string): number {
    // Simplified - use a proper hash function in production
    let hash = 0;
    for (const char of key) {
      hash = (hash * 31 + char.charCodeAt(0)) % 2 ** 32;
    }
    return hash;
  }
}

Key Takeaways

1. Modulo hashing breaks when you resize. The remapping problem is catastrophic at scale — consistent hashing solves it elegantly.

2. The ring abstraction is powerful. Mapping both servers and keys to the same space makes the math clean and the failure handling natural.

3. Virtual nodes are essential for production. Without them, load distribution is too uneven to be practical. 100-200 virtual nodes per server is the standard.

4. Consistent hashing is everywhere. DynamoDB, Cassandra, Redis Cluster, CDNs — once you understand it, you'll spot it in every large-scale system.

5. Failure handling is elegant. Only the failed server's keys need redistribution. This is a massive advantage over static partitioning schemes.

Understanding consistent hashing at this depth — not just the "what" but the "why" and "how it works under the hood" — is exactly the kind of knowledge that separates Staff Engineers from Senior Engineers in system design interviews.

← More Articles Connect on LinkedIn →