Outsourced software developers work to achieve the client’s business goals. While seemingly innocent code deficiencies sometimes stay undetected – typically in smaller apps – they surely loom large in enterprise-level digital solutions. How to avoid them and prepare your client’s apps for scaling? Take a look below!
As our application grows, it needs to process more data and handle more users simultaneously. The database is under heavy load almost constantly, and unless we take steps to optimize it, our app can slow to a crawl.
Django is able to handle it efficiently, but not always by itself. We need to tell it what to do and where, and provide it with some tools to do it even better. Some of these things, like database indexes and proper QuerySet management, can be done from the start to keep our app future- (and scaling-) proof.
QuerySets constitute the core of Django’s communication with databases. These are special Python objects that have certain properties and methods, which help us perform a number of different actions with our data: retrieve, create, delete, filter, count, and many more.
One of the most important QuerySet traits is their laziness, which means the database query is performed only when its results are actually needed. So, just creating the QuerySet and filtering it by some fields won’t run any query on our database - it’s at the moment when we use the data that the query is actually executed. This gives us the flexibility to compose and craft our QuerySets in as many lines of code as we wish. In this way, we avoid running any unnecessary queries, which could be costly if our application is sizable.
So, here’s what we want to avoid:
bobs = Employee.objects.filter(first_name="Bob")
alices = Employee.objects.filter(first_name="Alice")
Instead, you can do this:
employees = list(Employee.objects.filter(first_name__in=["Bob", "Alice"]))
bobs = [e for e in employees if e.first_name == "Bob"]
alices = [e for e in employees if e.first_name == "Alice"]
In the first example, Django creates two separate QuerySets, which result in two database queries once both are evaluated. Calling filter() again on an existing QuerySet would not help, because it returns a new QuerySet that runs its own query. In the second example, we fetch both groups with a single query and split the rows in Python, so the database is hit only once. Neat, huh?
We just avoided doubling our database queries! It doesn’t sound like much, but a similar mistake in an app that handles hundreds or thousands of users at a time could put a heavy load on our database. That load can be reduced significantly just by putting some effort into building and evaluating our QuerySets in a well thought-out way.
Another thing we can do preventively, right when we start writing our models, is add indexes. Make sure to have them on fields that you are likely to use often in filtering, sorting, and excluding - but don’t overdo it. Too many indexes can have the opposite effect to the one you want. If you only read from your database, that shouldn’t be an issue, but keep in mind that every write has to update the affected indexes as well - so, if you do a lot of creates, updates, or deletions, each extra index makes them slower.
Let’s now talk about how we can build QuerySets so they’re as optimal as possible.
Know what you need, and what you don’t
First of all, always make sure that you only get the data you absolutely need to fetch. Retrieving more than you need puts your app’s performance at risk, especially if it scales up in the future. Django provides several tools to help you with that. If you need only a few fields from the table, use only() or defer(). They keep the load on the database as low as possible, and they are also a good habit just for the sake of being explicit about what you need. If you don’t need model instances at all, use values() to get dictionaries or values_list() to get tuples.
If you only need the number of items in the database, or just want to check whether any exist, use count() and exists(), respectively. Of course, this applies only if you haven’t already fetched the data (and aren’t going to). Remember, we don’t want to double our queries! The exception: when you are going to evaluate the QuerySet anyway, use Python’s len() instead of count(). len() loads the results once and caches them on the QuerySet, whereas count() followed by iterating over the rows costs two queries.
It’s cheaper in bulk
Just like in real life, it’s usually better to handle multiple items in bulk. So, if you are trying to update or create a bunch of items, prefer Django’s bulk_update() and bulk_create() over doing it one by one in a for loop.
Databases are fast and it’s always best to utilize what they are able to do first, before acting on a higher (Python or Django) level.
Since we’re on the topic of handling a lot of data at once, let’s also mention the iterator() method. If we have several thousand items to process, it’s better to take them in chunks, so we don’t slow down the entire system. iterator() does that: it streams results from the database in chunks (see its chunk_size argument) and doesn’t cache them on the QuerySet.
Running a for loop on a QuerySet with hundreds of thousands of items loads every row into memory at once, which can slow your machine down or exhaust its memory. Working in chunks ensures you only hold a limited number of items at a time before proceeding to the next ones.
Relationships can be difficult, and the same goes for programming. But, unlike real life, Django provides some tools that make them easier.
With models that have relations to other models, you often need to retrieve additional data from related fields. Django allows you to do that easily, but it can get tricky at scale. When you have several hundred or several thousand people in your QuerySet and need to get their, let’s say, addresses, which live in a different Django model, the data retrieval can take a very long time. Let’s have a look at an example:
employees = Employee.objects.all()
for employee in employees:
    print(employee.address.street)
This will execute a new database query for every single employee. We usually do not want that to happen. However, calling select_related("address") on our QuerySet makes Django perform an SQL join and fetch the data in a single query! This reduces the load on our database and can drastically improve performance.
However, select_related() works only on single-object relations (like a forward foreign key, or one-to-one in either direction), and won’t help us with “down the tree” relations, like many-to-many or reverse foreign keys. Django has another tool for that - prefetch_related(). It no longer does everything in a single database query, but it still makes far fewer of them than we would have otherwise.
With addresses as a many-to-many field on our Employee model, getting all addresses for every employee would run employee.addresses.all() for every single employee in our QuerySet, one query each. That would, again, put a heavy load on our database and could result in a request timeout. Instead, prefetch_related("addresses") makes one additional query that fetches all addresses related to the employees in our QuerySet. Django caches the results and joins them to the employees in Python, so they are available on our single employees QuerySet.
None of these helped - what do I do?
If your app is still having performance issues, you need to dig deeper and debug your queries. Having a look at the SQL that Django generated for your QuerySet (available through its query attribute, for example print(queryset.query)) can help you narrow down the root of the issue.
Also, keep in mind that there are loads of tools that help you track your app’s performance - New Relic, Datadog, Splunk, you name it. Such tools will help you find where the root of the problem lies.
Django admin - how it’s slow by default
As your app scales up, at some point you’ll notice significant performance drops in the admin, for example in the list view of a model that has a lot of rows in its database table.
By default, Django admin uses normal QuerySets and standard pagination. In edge cases, you can even get a 504 timeout error, because there’s too much data to even be counted for splitting into pages. In that case, you can use some tricks to turn off pagination (so Django doesn’t count the results), or even replace the default templates so that you use your own views and QuerySets.
Each of these solutions may have little effect on your app by itself, but remember - there’s strength in numbers. Combine them, and you will see a world of difference.
It could happen that only a single well-executed solution will solve your app’s problems entirely, because, for example, only a single database table had poor performance. But if your app is slow overall, it’s almost certain that you will have to implement a set of different solutions.
Be sure to measure whether a given change has actually improved performance - the tracking tools that I have mentioned above will help you with that. You need to see the numbers, because having only the “feel” of your app getting slower or faster is definitely not enough - we’re humans, after all, and our minds fool us easily.