Django ORM: what are its less obvious features?

Example models

I'm going to use database schema based on the one used in Django documentation regarding querying the database. This choice is obvious - after all, we all love pizza, right?

from django.db import models


class Topping(models.Model):
name = models.CharField(max_length=30)
vegetarian = models.BooleanField(default=True)

def __str__(self):
return self.name


class Category(models.Model):
name = models.CharField(max_length=50)

def __str__(self):
return self.name


class Pizza(models.Model):
name = models.CharField(max_length=50)
toppings = models.ManyToManyField(Topping)
category = models.ForeignKey(Category, related_name='pizzas')

def __str__(self):
return self.name


class Order(models.Model):
placed_at = models.DateTimeField(auto_now_add=True)
pizzas = models.ManyToManyField(Pizza)

def __str__(self):
return "Order #{}".format(self.pk)

Advanced prefetch_related() usage

Let's say we want to get all the orders and all the pizzas associated with them. Simply attempting to get all orders and then accessing pizzas related to them would result in n+1 queries being executed (one query to retrieve order objects and one query to retrieve pizza objects for each (so, n) of the orders):

>>> orders = Order.objects.all()
>>> print(orders) # executes a query to retrieve all Orders
[<Order: Order #1>, <Order: Order #2>, <Order: Order #3>, ...]
>>> orders[0].pizzas.all() # executes a query for Pizza objects related to the first Order
[<Pizza: Pizza, extra grapes>]
>>> orders[1].pizzas.all() # executes another query for the Pizza objects related to the second Order
[<Pizza: Pizza, with pineapple on it>]

To alleviate that, Django offers .prefetch_related() method allowing you to define which reverse relations should be prefetched. Prefetching mechanism then retrieves pizza objects that are related to any of the selected objects in a single query, reducing the query count from n+1 to 2:

>>> orders = Order.objects.prefetch_related('pizzas')

Prefetch() objects

Prefetch() objects were added in Django 1.7 to bring more control over prefetching.

Let's build on top of our example. We can notice that pizza has a relation to the category, so if we went ahead and tried to retrieve it for each pizza, we'd again get hit by n+1 query problem. One of the solutions would be to add another lookup to our .prefetch_related() call:

>>> orders = Order.objects.prefetch_related('pizzas', 'pizzas__category')

It helps but unfortunately uses another query to prefetch related categories, totaling to 3 queries. We can turn that query into a JOIN using Prefetch() object.

Prefetch() objects extend .prefetch_related() lookups by allowing to specify the query set to be used for prefetching as well as the name of the attribute to hold the result in, in the to_attr parameter.

>>> orders = Order.objects.prefetch_related(Prefetch('pizzas', queryset=Pizza.objects.select_related('category')))

This time, we only want to retrieve related pizzas with certain topping. Since we modify the resulting object list, it's best to keep them in a separate name using to_attr parameter:

>>> from django.db.models import Prefetch
>>> orders = Order.objects.prefetch_related(Prefetch('pizzas', queryset=Pizza.objects.filter(toppings__name='ham'), to_attr='pizzas_with_ham'))
>>> orders[0].pizzas_with_ham[0]
<Pizza: Hawaiian>

Conditional expressions

Conditional expressions, added in Django 1.8, are a powerful tool to offload even more business logic to the RDBMS. They are usually used in data aggregation.

Aggregation/annotation basics

There are several aggregation functions in Django, such as Count(), Sum() or Max().

Aggregation is taking a data set and producing a single value. For example, this is how we can get the maximum number of pizzas in a single order:

>>> from django.db.models import Max
>>> Order.objects.aggregate(Max('pizzas'))
{'pizzas__max': 4}

Annotation is performing a calculation for each object. We can use it to count the pizzas in each order.

>>> orders = Order.objects.annotate(Count('pizzas'))
>>> orders[0].pizzas__count
3

Conditional aggregation

Case()

Case() takes any number of When() objects, default parameter (which is the value used when none of specified When() conditions are met) and output_field (which must be set when ORM has a hard time guessing the proper column type to be used for the result).

When()

When() takes multiple lookup parameters (or Q() objects), similarly to QuerySet's .filter(). If those conditions are met, Case() object evaluation short-circuits and value specified by then argument is returned.

Usage

Now, let's say we want to list all available pizzas and count the non-vegetarian toppings they have.

>>> from django.db.models import Case, When, Sum, IntegerField
>>> pizzas = Pizza.objects.annotate(
... vegetarian_topping_count=Sum(Case(
... When(toppings__vegetarian=True, then=1),
... default=0,
... output_field=IntegerField())))
>>> print(pizzas[0].toppings.all(), pizzas[0].vegetarian_topping_count)
([<Topping: ham>, <Topping: pineapple>], 1)

Each related topping gets assigned 1 if it’s vegetarian, and 0 otherwise (I could've omitted Case()'s default parameter as it would yield the same results, omitting causes it to use None as a default parameter which is ignored by Sum()). output_field must be set when ORM cannot figure out on its own which data type it should use to store the annotation.

In the end, Sum() does what it's best at and sums these values together (actually that's pretty much all it can do, I just didn't want to sound so harsh).

Oh, and all of that happens on the database-level.

Imitating a SELF JOIN

That's more of a trivia. Our use case required us to compare all the objects in a single table with each other and mark collisions appropriately. It's useful if we want to store the colliding data as a draft and still allow for final validation. Since Django ORM doesn't provide an easy way to generate JOINs when there's no relation between models, it's unable to produce the desired result. No, wait, it actually can do that!

By (ab)using relation lookups we can go from a single pizza object to its category and then to all pizzas related to that category (through reverse relation):

>>> from django.db.models import F, Q, Case, IntegerField, Sum, Value, When
>>> from django.db.models.functions import Coalesce
>>> pizzas = Pizza.objects.annotate(
... uses_name_of_another_pizza_as_name_prefix=Coalesce(
... Sum(
... Case(
... When(
... Q(
... ~Q(pk=F('category__pizzas')) &
... Q(name__startswith=F('category__pizzas__name'))
... ),
... then=Value(1)
... ),
... output_field=IntegerField(),
... ),
... ),
... 0,
... ),
... )
>>> [p, p.uses_name_of_another_pizza_as_name_prefix for p in pizzas]
[
(<Pizza: Hawaiian>, 0),
(<Pizza: Hawaiian King>, 1),
(<Pizza: Pepperoni>, 0),
]

Silly example, but I hope you can get the grasp of what's happening here.

Want to know more about Django?

If you want to learn more about those topics, go ahead and check out Django documentation regarding them (prefetch_related(), aggregation, conditional expressions, and aggregation). There's plenty of additional information regarding related topics, and Django documentation overall is really well-written.

Navigate the changing IT landscape

Some highlighted content that we want to draw attention to to link to our other resources. It usually contains a link .