I introduced MongoDB to a colleague today (version 1.8.1 to be exact). He comes from an RDBMS, third-normal-form, relational background. Here’s what he said. Keep in mind, we have a healthy humor level going at the office. I love it!
“Mongo is like taking a kid from a Jehovah’s Witness upbringing, driving them to the candy store and telling them they can have anything they want in the entire store, and as much as they want, then after an hour or so of unfettered access to candy, giving them a handgun and a chainsaw.”
We are using Django and Postgres on a new project. Up until recently, we were working towards a fully normalized database for our 1.0 release. In the first month we passed fifty tables. I told the project owner we’d probably hit four hundred tables by the time all features of the system are done in three years. Note that we had 50 tables but only 2 or 3 working UI elements, and nothing to launch. That bugged me on a project level.
On a technical level, do you think Django can handle 400 tables? It might take some segmentation, perhaps a server with tons of RAM, maybe even some hacking of the framework. It just stopped feeling right, so I started looking for something better. Maybe it wasn’t Django; maybe it was the fact that a lot of what we are trying to model at this stage revolves around user-generated content. It is almost impossible to predict everything we need to store for the life of the project. This kind of problem, as it turns out, is a good fit for NoSQL, or non-relational, data stores. Here are some of my initial takeaways:
What to store in MongoDB:
- Loosely structured content.
- Archival / historical data.
- Logging data and non-critical events.
- Data models that we know will change a lot, but that we want to get up and running now. We can always port them to the RDBMS if we need to, after we figure out what the best approach is. Maybe they will stay in MongoDB for ever and ever?
We will likely store a handle to the MongoDB document id (e.g. "_id": ObjectId("4dbf132825fe7d20c1000000")) inside the relational database in some cases, for direct retrieval.
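Here is a minimal sketch of that handle idea, not our actual code. To keep it runnable with only the standard library, sqlite3 stands in for Postgres and a plain dict stands in for the MongoDB collection. The table name content_item, the column mongo_id, and the helper functions are all hypothetical names I made up for illustration. With a real driver such as pymongo, the final lookup would be a find_one on "_id" instead of a dict access.

```python
import re
import sqlite3

# ASSUMPTION: this dict stands in for a MongoDB collection, keyed by the
# 24-character hex form of each document's ObjectId. Note that the two
# documents do not share the same fields (loosely structured content).
documents = {
    "4dbf132825fe7d20c1000000": {"title": "First post", "tags": ["mongo", "django"]},
    "4dbf132825fe7d20c1000001": {"title": "Photo", "width": 640, "height": 480},
}

# An ObjectId is 12 bytes, i.e. 24 hex characters. This sketch only accepts
# the lowercase form, which is a simplification.
OBJECT_ID_HEX = re.compile(r"[0-9a-f]{24}")


def is_object_id(value):
    """Return True if value looks like the lowercase hex form of an ObjectId."""
    return isinstance(value, str) and OBJECT_ID_HEX.fullmatch(value) is not None


def create_schema(conn):
    """Create the relational table that holds the handle (hypothetical schema)."""
    conn.execute(
        "CREATE TABLE content_item ("
        " id INTEGER PRIMARY KEY,"
        " owner TEXT NOT NULL,"
        " mongo_id TEXT NOT NULL UNIQUE)"
    )


def link_document(conn, owner, mongo_id):
    """Store the document id as a string column and return the new row id."""
    if not is_object_id(mongo_id):
        raise ValueError("not a 24-character hex ObjectId: %r" % (mongo_id,))
    cur = conn.execute(
        "INSERT INTO content_item (owner, mongo_id) VALUES (?, ?)",
        (owner, mongo_id),
    )
    return cur.lastrowid


def fetch_document(conn, row_id):
    """Follow the handle from the relational row to the document, or None."""
    row = conn.execute(
        "SELECT mongo_id FROM content_item WHERE id = ?", (row_id,)
    ).fetchone()
    if row is None:
        return None
    # Stand-in for a direct lookup by _id in the document store.
    return documents.get(row[0])


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_schema(conn)
    row_id = link_document(conn, "alice", "4dbf132825fe7d20c1000000")
    print(fetch_document(conn, row_id))
```

The relational row stays small and strictly typed, while the document it points to can grow new fields without a schema migration. Nothing enforces that the handle still points at a live document, so fetch_document has to cope with a miss.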
What you lose with MongoDB:
- Support for the Django Admin tool. This is a major drawback to consider. I really love Django Admin and what it gives to our team, especially the content folks who can get in and easily make changes and manage records. You’re on your own for CRUD (create, retrieve, update, delete). Wait a year and there will be 5 options – mark my words!!!
Update 7/12/2011: A reader told me about django-mongodb, a Django admin interface for MongoDB. I need to try it out.
Update 7/22/2011: I looked into django-mongodb. It requires a) Django-nonrel, “a fork of Django 1.3 that adds support for non-relational databases”, and b) djangotoolbox, “a bunch of utilities for non-relational Django applications and backends”. The ultimate goal of Django-nonrel is to merge with the official Django tree. When that happens there will be much rejoicing! Until then, it is doable to combine Django and MongoDB, but it is early-stage technology that is moving fast. Make sure it is the right fit for your project first.
- Referential integrity. There is no concept of a foreign key. Anything goes inside MongoDB: it will gladly create new collections (like tables) and document fields (like columns) as you ask it to.
- Joins.
- Triggers and stored procedures. You can craft JavaScript and have MongoDB run that instead, which is a pretty cool idea actually, and it is how map/reduce works.
- Standard RDBMS security model. Security is basic in MongoDB, and as it stands you have to run it in trusted mode when sharding. From the docs page “Trusted environment is the default option and is recommended.” – wow, so firewall it off with iptables and harden your application layer I guess?
- Data where the relationship to other records is most of the value. For example, a many-to-many relationship would be really hard to get working in MongoDB. It is doable and it works for some people. However, MongoDB isn’t geared towards relational data, so that tells me not to do relational data with it!
- Anything that needs to be transactional in nature.
- For now, financial data, transporter patterns, anything that *has* to be there.
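To make the missing foreign keys and joins a bit more concrete, here is a rough sketch of what the application layer ends up doing instead. It uses plain Python lists of dicts as stand-ins for two collections, so nothing here is a MongoDB API. The collection contents, the field name author_id, and both helper functions are hypothetical and exist only for illustration.

```python
# ASSUMPTION: lists of dicts stand in for two MongoDB collections.
authors = [
    {"_id": 1, "name": "Ann"},
    {"_id": 2, "name": "Bob"},
]
posts = [
    {"_id": 10, "author_id": 1, "title": "Hello"},
    {"_id": 11, "author_id": 2, "title": "World"},
    # No foreign key exists, so nothing stops a write like this one.
    {"_id": 12, "author_id": 99, "title": "Orphan"},
]


def dangling_references(children, parents, field):
    """Return the child documents whose reference matches no parent _id.

    This is the check a foreign key constraint would do for us in an RDBMS.
    """
    parent_ids = {parent["_id"] for parent in parents}
    return [child for child in children if child.get(field) not in parent_ids]


def join_in_application(children, parents, field, as_name):
    """Attach the referenced parent to a copy of each child.

    Children with a dangling or missing reference get None under as_name,
    roughly like a LEFT JOIN.
    """
    by_id = {parent["_id"]: parent for parent in parents}
    joined = []
    for child in children:
        merged = dict(child)  # copy, so the input documents stay untouched
        merged[as_name] = by_id.get(child.get(field))
        joined.append(merged)
    return joined


if __name__ == "__main__":
    print(dangling_references(posts, authors, "author_id"))
    for row in join_in_application(posts, authors, "author_id", "author"):
        print(row["title"], row["author"])
```

Both helpers are a few lines each, but every project that needs them has to write, test, and remember to call them. That is the real cost of giving up referential integrity and joins.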
More on this topic in the future. MongoDB appears to have an excellent trajectory. Its creators will likely have a place in the geek hall of fame. It is really fun to work with too.