Disclaimer: The details in this post have been derived from the details shared online by the Reddit Engineering Team. All credit for the technical details goes to the Reddit Engineering Team. The links to the original articles and sources are present in the references section at the end of the post. We’ve attempted to analyze the details and provide our input about them. If you find any inaccuracies or omissions, please leave a comment, and we will do our best to fix them.

When you upvote a clever comment on Reddit or reply to a discussion thread, you’re interacting with Reddit’s Comments model. This model is probably the most important and highest-traffic model in Reddit’s architecture.

Reddit’s infrastructure was built around four Core Models: Comments, Accounts, Posts, and Subreddits. These models power virtually everything users do on the platform. For years, all four models were served from a single legacy Python service, with ownership awkwardly split across different teams. By 2024, this monolithic architecture had become a problem:
In 2024, the Reddit engineering team decided to break up this monolith into modern, domain-specific Go microservices. They chose comments as their first migration target because it represented Reddit’s largest dataset and handled the highest write throughput of any core model. If they could successfully migrate comments, they would prove their approach could handle anything.

In this article, we will look at how Reddit carried out this migration and the challenges it faced.

The Easy Part: Migrating Read Operations

Before diving into the complex scenario, it’s worth understanding how Reddit approached the simpler part of this migration: read endpoints. When you view a comment, that’s a read operation. The server fetches data from storage and returns it to you without changing anything.

Reddit used a testing technique called “tap compare” for read migrations. The concept is straightforward:
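To make the idea concrete, here is a minimal sketch of a tap-compare read path. It is an illustration under stated assumptions, not Reddit’s code: the function `tap_compare_read`, the `legacy_read` and `new_read` callables, and the `sampled` flag are all hypothetical names, and the exact order in which the real services call each other is not taken from the source. The sketch is in Python for brevity, while the new service described here is written in Go.

```python
import logging
from typing import Callable

logger = logging.getLogger("tap_compare")


def tap_compare_read(
    comment_id: str,
    legacy_read: Callable[[str], dict],  # hypothetical: the existing read path
    new_read: Callable[[str], dict],     # hypothetical: the new implementation
    sampled: bool,                       # hypothetical: is this request in the test sample?
) -> dict:
    """Serve a read from the legacy path, optionally shadowing it on the new path."""
    # The legacy response is always what the caller receives.
    legacy_response = legacy_read(comment_id)
    if not sampled:
        return legacy_response

    try:
        new_response = new_read(comment_id)
        if new_response != legacy_response:
            # Differences are only logged for engineers; they never reach the user.
            logger.warning(
                "tap-compare mismatch for %s: legacy=%r new=%r",
                comment_id, legacy_response, new_response,
            )
    except Exception:
        # A crash in the new path must not affect the user-facing response.
        logger.exception("new read path failed for %s", comment_id)

    return legacy_response
```

Whatever the new path does (returns something different, or raises), the caller still gets the legacy response, and the mismatch only shows up in logs.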
This approach meant that if the new service had bugs, users never saw them. The team got to validate their new code in production with real traffic while maintaining zero risk to user experience.

The Hard Part: Migrating Write Operations

Write operations are an entirely different challenge. When you post a comment or upvote one, you’re modifying data. Reddit’s comment infrastructure doesn’t just save your action to one place. It writes to three distinct datastores simultaneously:
The change data capture (CDC) events were particularly critical. Reddit guarantees 100% delivery of these events because downstream systems across the platform depend on them. Miss an event, and you could break features elsewhere.

The team couldn’t simply use basic tap compare for writes because of a fundamental constraint: comment IDs must be unique. You can’t write the same comment twice to the production database because the unique key constraint would reject it. But without writing to production, how do you validate that your new implementation works correctly?

The Sister Datastore Solution

Reddit’s engineering team came up with a solution they called “sister datastores”. They created three completely separate datastores that mirrored their production infrastructure (Postgres, Memcached, and Redis). The critical difference was that only the new Go microservice would write to these sister stores.

Here’s how the dual-write flow worked:
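One way to picture such a flow is the small sketch below. It is a hedged illustration, not Reddit’s implementation: in-memory dictionaries and lists stand in for Postgres, Memcached, and Redis, the assignment of CDC events to the Redis stand-in is an assumption, and every name (`Stores`, `legacy_write`, `new_write`, `compare`, `dual_write`) is hypothetical. It is written in Python for brevity rather than Go.

```python
from dataclasses import dataclass, field


@dataclass
class Stores:
    """In-memory stand-ins for one set of datastores (hypothetical)."""
    postgres: dict = field(default_factory=dict)      # comment rows keyed by ID
    memcached: dict = field(default_factory=dict)     # cached copies of comments
    redis_events: list = field(default_factory=list)  # assumed home of CDC events


def _write(stores: Stores, comment_id: str, body: str) -> None:
    if comment_id in stores.postgres:
        # Mirrors the unique-key constraint on comment IDs.
        raise KeyError(f"duplicate comment id {comment_id}")
    row = {"id": comment_id, "body": body}
    stores.postgres[comment_id] = row
    stores.memcached[comment_id] = dict(row)
    stores.redis_events.append({"type": "comment_created", "id": comment_id})


def legacy_write(production: Stores, comment_id: str, body: str) -> None:
    """Stand-in for the legacy service: the only writer to production."""
    _write(production, comment_id, body)


def new_write(sister: Stores, comment_id: str, body: str) -> None:
    """Stand-in for the new service: writes only to the sister stores."""
    _write(sister, comment_id, body)


def compare(production: Stores, sister: Stores, comment_id: str) -> list:
    """Return the names of the datastores whose contents differ for this comment."""
    diffs = []
    if production.postgres.get(comment_id) != sister.postgres.get(comment_id):
        diffs.append("postgres")
    if production.memcached.get(comment_id) != sister.memcached.get(comment_id):
        diffs.append("memcached")
    prod_events = [e for e in production.redis_events if e["id"] == comment_id]
    sister_events = [e for e in sister.redis_events if e["id"] == comment_id]
    if prod_events != sister_events:
        diffs.append("redis_events")
    return diffs


def dual_write(production, sister, comment_id, body, new_impl=new_write) -> list:
    """Perform the real write, shadow it on the sister stores, and report differences."""
    legacy_write(production, comment_id, body)  # the user-visible write
    try:
        new_impl(sister, comment_id, body)      # isolated: never touches production
    except Exception:
        return ["new_write_failed"]
    return compare(production, sister, comment_id)
```

Because the same comment ID is written once to each set of stores, the unique-key constraint is never violated, and a faulty `new_impl` can only corrupt the sister copy.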
This comparison happened across all three datastores. The Go service would query both production and sister instances, compare the results, and log any differences. The beauty of this approach was that even if Go’s implementation had bugs, those bugs would only affect the isolated sist |