Tag Archives: mongodb replication

Replication Across the Country

The MongoDB .NET driver recently had an issue reported that turned out to be a bug on our part. It is a subtle bug that wouldn’t have shown up except for in a specific replica set configuration. I’ll first discuss the new behavior in 1.6 regarding read preferences for replica set members and then discuss the configuration and what actually happened.

Replica sets are the MongoDB name for a group of servers that have one primary and N secondaries. Standard setups usually include 3 replica set members, 1 primary and 2 secondaries. All write operations go to the primary and all reads are governed by the stated read preference and tagging. Together, read preferences and tagging form a way of targetting a specific server or a group of servers in the cluster for reads. In a heavy read/write environment where not-immediately up-to-date reads are valid (most scenarios actually fall into this camp), then a way to load balance your cluster is to allow secondaries to serve up data for reading while the primary takes care of writing.

There are a number of read preferences, Primary, PrimaryPreferred, Secondary, SecondaryPreferred, and Nearest. SecondaryPreferred means to read from a secondary if one is available, otherwise, read from a primary. In addition, when choosing a secondary, we only consider secondaries within 15 milliseconds (by default) of the lowest secondary ping time. We do this to ensure that your reads are generally as fast as possible.

For example, in the setup below with 4 secondaries and one primary, we’d randomly choose from servers B, C, and E when using the SecondaryPreferred read preference. D would be excluded because it’s ping time is 17ms behind that of the lowest secondary’s ping time.

Server   Type       Ping Time
A           Primary       3ms
B          Secondary   7ms
C          Secondary   2ms
D          Secondary   19ms
E          Secondary   11ms

Cloud providers like EC2 and Azure offer the possiblity to stand-up replica set members in different regions of the country. This is great because when an entire region goes down, your app can still function by reading off the servers in other regions of the country. In the case of the bug mentioned at the top of this post, a 2 member replica set existed where the primary was in Region 1 and a secondary was in Region 2. In addition, the web application was located in Region 1. Using the read preference SecondaryPreferred, every single read will have to exit Region 1 and go all the way to Region 2 to get data. This distance imposed a ~100ms penalty.

Our bug manifested itself because of this ping time lag in the regions. Even though we were supposed to choose the secondary, we didn’t because it’s ping time was so much slower than that of the primary. The fix for us is easy, but the customer has to wait for us to fix it. Hence, we suggested a better setup of the cluster to remedy the problem in the mean time. Simply setting up a new secondary in Region 1 will send all reads to this secondary, all writes to the primary, and the secondary in Region 2 is used for failover and backup. I’d actually suggest this setup regardless of the bug.

We’ll be fixing this bug in version 1.6.1, but be aware of your lag times when using disparate data centers.