Querying Complex Data in Django with Views

Django has a powerful ORM, but like any ORM it suffers when querying data that has complex relationships. Django supports foreign keys and many to many relationships out of the box with ease. Sometimes a report or data feed has half a dozen joins and calls for representing the data slightly differently than the way the models are defined.

I’m thinking of a summary report where you need to group rows by month or year, but also pull in related rows that are several joins away.

Choice 1 – bend the Django ORM to your will, but accept the limitations.

By default the Django ORM does not do any joins. When application code accesses related entities or child sets it will make additional calls to the database on the fly. This can really slow down an app, especially if there are loops involved.

To improve on that the Django ORM has `select_related()` for a standard one to one join. It also has `prefetch_related()` for sets (many to many) where it does a second query that gets associated matches.

Under the hood Django pulls all columns in the SELECT clause it builds. It is trying to hydrate a model instance for each row so it grabs everything. Pulling unneeded data slows down the call and expands the memory footprint needed for the request. To solve that you can use `values()` on the query to limit what columns are returned.

Django also allows `GROUP BY field_x HAVING …` queries, which I’ve written about.

For most things you can get it working with some trial and error and Stack Overflow as your co-pilot.

 

Choice 2 – write your own SQL query and run it with cursor.execute():

Wait a sec… what is SQL? This is the year 2020. Nobody really knows SQL anymore. All they know is JavaScript.

I’m half joking here but plenty of people would argue writing SQL in your application code is a big mistake. In 2005 I don’t think anybody would have guessed that is how the world would be 15 years later.

Raw SQL can be hard to maintain. After-all it’s a huge magic multi-line string. From a maintenance perspective it exists independent of the application’s models and migrations. It is kind of hanging out, like a rat in the wall. It is great for one-off scripts and rare corner cases, but I would not want to rely on raw SQL for day to day Django work.

Note that with raw SQL there is also a small security risk in the form of the SQL injection attack. This can happen if you (or future you, or the new intern) forget to parameterize any inputs.

 

Choice 3 – go with a database level view:

A database view (supported in MySQL and Postgres, among others) is basically a SELECT statement that gets saved in the database. Views operate like a read only table. A view can do all sorts of clever things like joins, generate new columns based on the row level values, aggregate functions etc.

The nice thing about setting up a view and pointing Django to it is, Django can query it like an ordinary table. In fact Django doesn’t even know it is really a view under the hood. You may not edit the rows in the view though.

 

How to get Django to work with views:

Note this is working for me with Django 2.2 and Python 3.7.

First design your view in SQL to get the bugs worked out.

Next setup a new model that mirrors the columns. For views, you can put them in models.py, but I use models_views.py so it is clear it is a view and not a regular model.

Make sure to set managed = False.

from django.db import models, NotSupportedError

class ExampleView(models.Model):
    name = models.CharField(primary_key=True, editable=False, max_length=100, blank=True, null=True)
    status = models.CharField(max_length=25, null=False)
    amount = models.DecimalField(max_digits=24, decimal_places=6, blank=True, null=True)

    def save(self, *args, **kwargs):
        raise NotSupportedError('This model is tied to a view, it cannot be saved.')
    
    class Meta:
        managed = False
        db_table = 'example_view'    # this is what you named your view
        verbose_name = 'Example'
        verbose_name_plural = 'Examples'
        ordering = ['name']

Create the migration:

$ python manage.py makemigrations

This will generate a migration file under that app’s migrations directory, something like migrations/00xx_exampleview.py.

Open that file and add your view’s create statement. This is an example for MySQL:

dependencies = [....]

# put your SQL to create (replace) the view here, below the dependencies line
sql = """CREATE OR REPLACE SQL SECURITY INVOKER VIEW example_view AS
SELECT
    name,
    status,
    sum(total) as amount
FROM some_table 
GROUP BY name, status;"""

operations = [
   migrations.RunSQL(sql),   # add this line so it runs as part of the migration
   migrations.CreateModel(
   name='ExampleView',
.....

Run your migrations as normal:

$ python manage.py migrate

 

Use your view model as normal:

example_rows = ExampleView.objects.filter(name='test').all()

 

When to use each choice in my opinion:

Most of the time I try and get the Django ORM to do the job. If that fails or is getting really crazy, I opt for choice 3 – the database view since that keeps things so much cleaner. I use raw sql for temporary scripts, prototypes, build and throw away kind of stuff where nobody cares about maintenance or clean code.

Pros of using views:

  • Can greatly simplify application code where nested loops or complex joins would be required.
  • Leverage the power of the database engine to provide data to your application.

Cons of using views:

  • Any rows retrieved from a view cannot be saved back to the database, they need to be treated as read only objects.
  • Anecdotally, I’ve seen comments in Stack Overflow where people complain that views are slower than running the equivalent query. So your mileage may vary. If you do run into that, dropping back to raw SQL is a feasible work around.
This entry was posted in Code, Data and tagged , , , . Bookmark the permalink.

Leave a Reply

Your email address will not be published. Required fields are marked *