How Discord Stores Trillions of Messages with High Performance

How Discord Stores Trillions of Messages with High PerformanceWe will walk through how Discord rebuilt its message storage layer from the ground up. We will learn the issues Discord faced with Apache Cassandra® and their shift to ScyllaDB. 
͏     ͏     ͏     ͏     ͏     ͏     ͏     ͏     ͏     ͏     ͏     ͏     ͏     ͏     ͏     ͏     ͏     ͏     ͏     ͏     ͏     ͏     ͏     ͏     ͏     ͏     ͏     ͏     ͏     ͏     ͏     ͏     ͏     ͏     ͏     ͏     ͏     ͏     ͏     ͏     ͏     ͏     ͏     ͏     ͏     ͏     ͏     ͏     ͏     ͏     ͏     ͏     ͏     ͏     ͏     ͏     ͏     ͏     ͏     ͏     ͏     ͏     ͏     ͏     ͏     ͏     ͏     ͏     ͏     ͏     ͏     ͏     ͏     ͏     ͏     ͏     ͏     ͏     ͏     ͏     ͏     ͏     ͏     ͏     ͏     ͏     ͏     ͏     ͏     ͏     ͏     ͏     ͏     ͏     ͏     ͏     ͏     ͏     ͏     ͏     ͏     ͏     ͏     ͏     ͏     ͏     ͏     ͏     ͏     ͏     ͏     ͏     ͏     ͏     ͏     ͏     ͏     ͏     ͏     ͏     ͏     ͏     ͏     ͏     ͏     ͏     ͏     ͏     ͏     ͏     ͏     ͏     ͏     ͏     ͏     ͏     ͏     ͏     ͏     ͏     ͏     ͏     ͏     ͏     ͏     ͏     ͏     ͏     ͏     ͏     ͏     ͏     ͏     ͏     ͏     ͏     ͏     ͏     ͏     ͏     ͏     ͏     ͏     ͏     ͏     ͏     ͏     ͏     ͏     ͏     ͏     ͏     ͏     ͏     ͏     ͏     ͏     ͏     ͏     ͏     ͏     ͏     ͏     ͏     ͏     ͏     ͏     ͏     ͏     ͏     ͏     ͏     ͏     ͏     ͏     ͏     ͏     ͏     ͏     ͏     
Forwarded this email? Subscribe here for more
How Discord Stores Trillions of Messages with High Performance
ByteByteGo
Jul 8 

READ IN APP

MCP Authorization in 5 Easy OAuth Specs (Sponsored)
Securely authorizing access to an MCP server used to be an open question. Now there's a clear answer: OAuth. It provides a path with five key specs covering delegation, token exchange, and scoped access.

WorkOS packages the full stack into one API, so you can add MCP authorization without building your own OAuth infrastructure.
Implement MCP Auth with WorkOS
Disclaimer: The details in this post have been derived from the articles shared online by the Discord Engineering Team. All credit for the technical details goes to the Discord Engineering Team.  The links to the original articles and sources are present in the references section at the end of the post. We’ve attempted to analyze the details and provide our input about them. If you find any inaccuracies or omissions, please leave a comment, and we will do our best to fix them.
Many chat platforms never reach the scale where they have to deal with trillions of messages. However, Discord does. And when that happens, a somewhat manageable data problem can quickly turn into a major engineering challenge that involves millions of users sending messages across millions of channels. 
At this scale, even the smallest architectural choices can have a big impact. Things like hot partitions can turn into support nightmares. Garbage collection pauses aren’t just annoying, but can lead to system-wide latency spikes. The wrong database design can lead to wastage of developer time and operational bandwidth.
Discord’s early database solution (moving from MongoDB to Apache Cassandra®) promised horizontal scalability and fault tolerance. It delivered both, but at a significant operational cost. Over time, keeping Apache Cassandra® stable required constant firefighting, careful compaction strategies, and JVM tuning. Eventually, the database meant to scale with Discord had become a bottleneck.
In this article, we will walk through how Discord rebuilt its message storage layer from the ground up. We will learn the issues Discord faced with Apache Cassandra® and their shift to ScyllaDB. Also, we will look at the introduction of Rust-based data services to shield the database from overload and improve concurrency handling.
Go from Engineering to AI Product Leadership (Sponsored)
As an engineer or tech lead, you know how to build complex systems. But how do you translate that technical expertise into shipping world-class AI products? The skills that define great AI product leaders—from ideation and data strategy to managing LLM-powered roadmaps—are a different discipline.
This certification is designed for technical professionals. Learn directly from Miqdad Jaffer, Product Leader at OpenAI, in the #1 rated AI certificate on Maven. You won't just learn theory; you will get hands-on experience developing a capstone project and mastering the frameworks used to build and scale products in the real world.
Exclusive for ByteByteGo Readers: Use code BBG500 to save $500 before the next cohort sells out.
View Course & Save $500
Initial Architecture
Discord's early message storage relied on Apache Cassandra®. The schema grouped messages by channel_id and a bucket, which represented a static time window. 
This schema allowed for efficient lookups of recent messages in a channel, and Snowflake IDs provided natural chronological ordering. A replication factor of 3 ensured each partition existed on three separate nodes for fault tolerance.
Source: Discord Engineering Blog
Within each partition, messages were sorted in descending order by message_id, a Snowflake-based 64-bit integer that encoded creation time.
The diagram below shows the overall partitioning strategy based on the channel ID and bucket.
At a small scale, this design worked well. However, scale often introduces problems that don't show up in normal situations.
Apache Cassandra® favors write-heavy workloads, which aligns well with chat systems. However,  high-traffic channels with massive user bases can generate orders of magnitude more messages than quiet ones. 
A few things started to go wrong at this point:
Hot partitions emerged when a popular channel received a surge of reads or writes. Since Apache Cassandra® was configured to partition data by channel_id and bucket, one node can potentially get overloaded trying to serve queries for that partition. These hot spots increased latency not just locally but across the cluster.
Reads had to check in-memory memtables and potentially scan across multiple on-disk SSTables. As data grew and compaction lagged, reads became slower and more expensive.
The diagram below shows the concept of hot partitions: