Category Archives: MongoDB

Replication Across the Country

The MongoDB .NET driver recently had an issue reported that turned out to be a bug on our part. It is a subtle bug that wouldn’t have shown up except for in a specific replica set configuration. I’ll first discuss the new behavior in 1.6 regarding read preferences for replica set members and then discuss the configuration and what actually happened.

Replica sets are the MongoDB name for a group of servers that have one primary and N secondaries. Standard setups usually include 3 replica set members, 1 primary and 2 secondaries. All write operations go to the primary and all reads are governed by the stated read preference and tagging. Together, read preferences and tagging form a way of targetting a specific server or a group of servers in the cluster for reads. In a heavy read/write environment where not-immediately up-to-date reads are valid (most scenarios actually fall into this camp), then a way to load balance your cluster is to allow secondaries to serve up data for reading while the primary takes care of writing.

There are a number of read preferences, Primary, PrimaryPreferred, Secondary, SecondaryPreferred, and Nearest. SecondaryPreferred means to read from a secondary if one is available, otherwise, read from a primary. In addition, when choosing a secondary, we only consider secondaries within 15 milliseconds (by default) of the lowest secondary ping time. We do this to ensure that your reads are generally as fast as possible.

For example, in the setup below with 4 secondaries and one primary, we’d randomly choose from servers B, C, and E when using the SecondaryPreferred read preference. D would be excluded because it’s ping time is 17ms behind that of the lowest secondary’s ping time.

Server   Type       Ping Time
A           Primary       3ms
B          Secondary   7ms
C          Secondary   2ms
D          Secondary   19ms
E          Secondary   11ms

Cloud providers like EC2 and Azure offer the possiblity to stand-up replica set members in different regions of the country. This is great because when an entire region goes down, your app can still function by reading off the servers in other regions of the country. In the case of the bug mentioned at the top of this post, a 2 member replica set existed where the primary was in Region 1 and a secondary was in Region 2. In addition, the web application was located in Region 1. Using the read preference SecondaryPreferred, every single read will have to exit Region 1 and go all the way to Region 2 to get data. This distance imposed a ~100ms penalty.

Our bug manifested itself because of this ping time lag in the regions. Even though we were supposed to choose the secondary, we didn’t because it’s ping time was so much slower than that of the primary. The fix for us is easy, but the customer has to wait for us to fix it. Hence, we suggested a better setup of the cluster to remedy the problem in the mean time. Simply setting up a new secondary in Region 1 will send all reads to this secondary, all writes to the primary, and the secondary in Region 2 is used for failover and backup. I’d actually suggest this setup regardless of the bug.

We’ll be fixing this bug in version 1.6.1, but be aware of your lag times when using disparate data centers.

Advertisements

Disconnecting with the MongoDB .NET Driver

The MongoDB .NET Driver has a public method called Disconnect on the MongoServer class.  This method is somewhat useful in certain contexts such as when the server is shutting down or the application is exiting.  However, it is extremely important to know what this method does before using it because it could kill your application.

The documentation simple states that this causes the server to disconnect from the server.  In other words, this method terminates all connections to all the servers and shuts down any in flight operations.  This isn’t your standard ADO.NET Connection at all.  In fact, MongoServer isn’t a connection at all, but rather a proxy to 1 or more mongod or mongos processes.

In addition, the documented way to get access to a MongoServer is to use a static Create method.  MongoServer.Create() and all it’s overloads actually return the same instance when the specified connection settings match a previously created MongoServer.  Therefore, the documented behavior of Disconnect is even more unexpected when this information gets factored in.

There is a good reason for this method.  It cleanly disposes of all the resources associated with the many connections and sockets it manages.  So it’s useful when an application is exiting or the OS is shutting down.  However, most people call Disconnect because it’s there and it seems like the right thing to do.

We’ve started working on the next version (2.0) of the driver and are working through some issues, such as this one, to clean up and correct so that the api matches expectations and to make it much more difficult to do the wrong thing.

MongoSV Conference

I just returned from the MongoDB conference in San Jose on Saturday.  Because I’m a MongoDB Master, I was able to attend the Master’s Summit the day before the conference.  We did it unconference style and let each topic self-select based on what we wanted to talk about.  I discussed of lot of windows related things like performance counters and SCOM integration as well as how to evangelize to the Microsoft community as a whole.  10gen is really looking to expand into this area more so than they have in the past.

One of these efforts is that MongoDB now runs on Azure.  This is cool because it gives another possibility for scaling in the cloud.  Azure already offers 3 forms of data storage.  SQL, Table, and Blob.  Blob is just a filesystem and suitable for binary items like images.  Table storage is a way to store large quantities of non-relational data.  It is relatively cheap as is blog.  The last is SQL, where Azure supports storing relational data.  SQL Azure, however, is extremely expensive compared to Blob and Table storage.

MongoDB fits in between Table Storage and SQL Storage.  Underneath, it uses Blob storage to keep the data, making it much cheaper that SQL Azure.  MongoDB does not represent its data in relational form, but rather in document form.  However, unlike Table Storage, MongoDB is fully queryable, fully indexable, and super fast.  It is a great alternative for bridging the needs between dynamic queries and fully relational data.

All in all, I thoroughly enjoyed my time and hope to continue it through contact with the other Masters and feedback to 10gen.

Attending MongoSV

I’ll be attending MongoSV in california over the next two days.  Day 1 will be a summit for the MongoDB Master’s group (of which I am a member).  We’ll be discussing anything and everything about MongoDB with hopes to influence it’s future direction.

Day 2 will be more interesting.  As a .NET developer, I’m thoroughly interested in all things related to Microsoft.  A few days ago, 10gen announced that MongoDB has support for running on Azure.  In fact, Microsoft will be speaking on the topic at the conference.  This is totally interesting because it lets us marry a scalable infrastructure with a scalable database and not have to sacrifice either one for the other.  I have nothing against SQL Server and use it for all my transactional business needs.  However, when building systems to scale, transactional business models are not the correct choice.  I’ll talk more on this topic in my next post on CQRS.

Until then, I’ll take notes and blog my thoughts about the direction 10gen is going with MongoDB in the future.

MongoDB Open Source Efforts

I actively (when I have time) work on some open-source projects.  Both of them are related to the MongoDB C# Driver (to which I contributed a lot of code as well).

The first is FluentMongo (https://github.com/craiggwilson/fluent-mongo) which is a linq provider on top of the driver.  This was sucked out of an older C# driver (now defunct) to which I was a core committer with Steve Wagner (http://www.lanwin.de/) and Sam Corder. Writing linq providers is incredibly difficult and I was so proud of my effort in the defunct project that I didn’t want it to go to waste; so I ported it over since the official driver did not have one (and still doesn’t).

The second project is Simple.Data.MongoDB (https://github.com/craiggwilson/Simple.Data.MongoDB).  If you haven’t yet played with Simple.Data (https://github.com/markrendle/Simple.Data), then you are missing out.  It is completely abusing the point of C# 4’s dynamic keyword to build an Active Record style data layer in .NET.  It is a great fit for MongoDB because neither require a schema.  Simple.Data was built for a relational database but working with Mark Rendle has been a pleasure and he has changed some of the core to accomodate a different style database.

Anyways, just wanted to get this stuff out there and I’ll keep these updated as I add features to either one.