The Django framework abstracts the nitty gritty details of SQL with its Model library (the M in MVC). It is straight forward for standard lookups involving WHERE clauses. 10% of the time I need it to do something fancy like a GROUP BY aggregate query. This required checking the docs to see ‘how the heck do I make Django do a GROUP BY with a count(*)?‘. I’ll explain in detail below with examples. Django has a method called raw() for running custom SQL, but that is a last resort. Thankfully Django supports this case really well and did exactly what I was expecting.
This information applies to Django 1.8.2.
Example ‘Bike’ model:
In this example, the Bike model has paint color, seat color, and category:
class Bike(models.Model): name = models.CharField(max_length=50) paint_color = models.CharField(max_length=255) seat_color = models.CharField(max_length=255) category = models.CharField(max_length=255) active = models.BooleanField()
The SQL I wanted Django’s Model to run for me:
SELECT paint_color, count(*) FROM bike WHERE paint_color IS NOT NULL AND paint_color != '' AND active = 1 GROUP BY paint_color ORDER BY paint_color; -- same thing for seat_color and category SELECT seat_color, count(*) FROM bike WHERE seat_color IS NOT NULL AND seat_color != '' AND active = 1 GROUP BY seat_color ORDER BY seat_color; SELECT category, count(*) FROM bike WHERE category IS NOT NULL AND category != '' AND active = 1 GROUP BY category ORDER BY category;
My report needs a count of all the active bikes by paint_color, by seat_color, and by category. Note that the columns allow null and empty string, so those need to be filtered out of the report.
How to do the GROUP BY / count(*) with Django:
Bike.objects.filter(active=1) .exclude(paint_color__exact='') .exclude(paint_color__isnull=True) .values('paint_color') .annotate(total=Count('paint_color')) .order_by('paint_color')
For more details see the documentation page on Django Aggregation.
The call returns a list of dictionaries like so:
[ {'paint_color': u'Green', 'total': 15}, {'paint_color': u'Blue', 'total': 19}, {'paint_color': u'Yellow', 'total': 4} ]
Getting fancy – allowing dynamic column substitution by variable name:
The code above is a start, but I don’t want to have three copies of that lengthy model query floating around in my code. This calls for converting ‘paint_color’ into a parameter. I also opted to go with a static method, which I can do like so on the Bike model:
@staticmethod def summary_report(fieldname): allowed_fields = ('paint_color', 'seat_color', 'category') if fieldname not in allowed_fields: return {} return (Bike.objects.filter(active=1) .exclude(**{fieldname + '__exact': ''}) .exclude(**{fieldname + '__isnull': True}) .values(fieldname) .annotate(total=Count(fieldname)) .order_by(fieldname))
Now the parameter fieldname takes the place of the hard coded string. In the spirit of defensive coding, the method checks to make sure that fieldname is an authorized property on the Bike model in this context. It could also throw exception, log an error, etc, but it is kept simple for this example. From there, the exclude() calls use **kwargs (keyword arguments) to pass in the dynamic value.
The data for the Bike report can be obtained as follows:
summary_paint_color = Bike.summary_report('paint_color') summary_seat_color = Bike.summary_report('seat_color') summary_category = Bike.summary_report('category')
How to see what SQL Django Query generated?
As I was working on this, I needed an easy way to see what SQL Django was generating behind the scenes. Django Debug Toolbar does it nicely.
To install the Django Debug Toolbar it takes just two steps:
$ pip install django-debug-toolbar
Then add ‘debug_toolbar’ to your INSTALLED_APPS. It requires django.contrib.staticfiles. Refresh your page, and you’ll see the debug toolbar come up:
Hope this helps!