Using the extra() QuerySet modifier in Django for WeGoEat
Since I actually used this method to reduce the number of Update:”explicit” SQL calls made in WeGoEat, I figured I’d write a little blog explaining the context in which it was used, and maybe, just maybe, it’ll help shed some light on how others can take advantage of this neat little function.
Background
As a Django “proof-of-concept”, I’m working on a local restaurant review site for my home state of Hawai`i. (I actually just released it yesterday). For each restaurant, I want to be able to calculate the average of all reviews and display this listing in a paginated view. (Yes, I do realize there’s no average rating, but that has to do with there being no users. ;P).
The Problem
Having a serious “wtf was I thinking moment”, I initially wrote a Restaurant model function that returned the average (review) rating for each restaurant instance. Little did I realize that when I actually displayed the restaurant’s average reviews, I would be making an additional SQL avg() call for every restaurant. Though I’m paging “n” records at a time, this function added an additional “n” SQL calls for every view that contained a restaurant listing, just to name a few.
In pseudo-code, my initial naive function resembled the following: (I’m sure we’re all guilty of writing something of the sort… ok, fine, I know I was. ;P)
1 2 3 4 5 6 | def get_average_review(self): query = 'QUERY TO GET AVERAGE (SELECT AVG(rating)...); (I have the query below)' # Get cursor from connection cursor = connection.cursor() cursor.execute(query) return cursor.fetchall() |
Duh.
Here’s a picture of the number of queries it took:

The “extra()” solution
After profiling my application and realizing what a bone-headed mistake I made, I began researching the extra() Queryset modifier. Yes, I realize that these extra lookups aren’t the most portable and often violate the DRY principle, but it’ll probably suffice for most of all my personal projects.
Since I’m already retrieving a list of Restaurants and filtering them via letter, island, and what not, I figured I could add an average rating subquery. The entire call looks as such:
1 2 3 4 5 6 7 | restaurants = Restaurant.objects.filter(name__istartswith = letter).extra( select={'avg_rating': 'SELECT AVG(overall_rating) FROM restaurants_restaurant as res, reviews_review, django_content_type \ WHERE restaurants_restaurant.id = res.id \ AND res.id = reviews_review.object_id \ AND reviews_review.content_type_id = django_content_type.id \ AND django_content_type.model = \'restaurant\''}, ) |
As you can see, I’m exploiting the fact that restaurants_restaurant will be available from the Restaurant.objects.filter() call. (I know, I know… bad for portability).
But voila!
Now, in my templates, when I iterate over the restaurants, I can get issue the following:
1 2 3 4 5 6 7 8 9 10 11 | {% for restaurant in restaurant_list %} <tr> <td><a href="{{restaurant.get_absolute_url}}">{{ restaurant.name }}</a></td> <td>{% if restaurant.avg_rating %} {% load show_stars %} <span class="average-rating"> {% show_stars <strong>restaurant.avg_rating</strong> of 5 round to quarter %} </span> {% endif %}</td> </tr> {% endfor %} |
Notice how I used my show_stars template tag that I blogged about a few weeks ago to display the average restaurant rating. (Cheap shameless plug, but damn effective!
) I’d link to a page in action, but since I just opened up my site to a few select users, I’ll update this post when I actually have any reviews.
Oh, and before I forget, thanks to my co-worker Stephen for assisting me with my SQL issues!
Here’s a picture of the final result:

Note:
As an added bonus, I also realized a few other ‘spots’ where the .extra() Queryset modifier would come in handy. Since I’m also using the wonderful django-voting application from Jonathan Buchanan, I came across this post about accessing a dictionary via a template in the Django-users Google Group.
Basically, I had come across the same issue as the poster. Since I allow users to vote on reviews (similar to Amazon, Yelp, etc.), I wanted to retrieve the score of each Review instance to display on a paginated listing of all Reviews. Using the same extra() modifier, I was able to inject the total number of votes and the score when I retrieved all Reviews as such:
Btw, I just injected most of the code from Jonathan’s template tag.
1 2 3 4 5 6 7 8 9 10 11 | .extra(select={'total_votes': 'SELECT COUNT(vote) FROM votes as v, reviews_review as rev, django_content_type \ WHERE reviews_review.id = rev.id \ AND v.object_id = reviews_review.id \ AND v.content_type_id = django_content_type.id \ AND django_content_type.model = \'review\'', 'score': 'SELECT SUM(vote) FROM votes as v, reviews_review as rev, django_content_type \ WHERE reviews_review.id = rev.id \ AND v.object_id = reviews_review.id \ AND v.content_type_id = django_content_type.id \ AND django_content_type.model = \'review\''},) |
Pretty neat right?
Now, when iterating through the reviews, I can use the following:
1 2 3 4 5 6 7 8 9 10 11 12 13 | {% for review in object_list %} <tr> <td><a href="{{review.content_object.get_absolute_url}}">{{ review.content_object.name }}</a></td> <td><a href="{% url profile-detail username=review.user.username %}">{{ review.user.username }}</a></td> <td><nobr>{% load show_stars %} <span class="rating">{% show_stars review.overall_rating of 5 round to half %}</span> </nobr> </td> <td><span style="font-weight:bold; color:#092e20;">{{ review.get_recommendation_display }}</span></td> <td><span style="font-size:.875em;">{{ review.submit_date|timesince }} ago</span></td> <strong><td>Total of {{ review.score|default:0 }} from {{ review.total_votes }} {{ review.total_votes|pluralize:"person,people" }}.</td></strong> </tr> {% endfor %} |
Hope y’all learned something like I did!
Oh, and before I forget my standard disclaimer, “since this is on my blog, feel free to take/use/steal/distribute/copy/modify any code you see fit, but if you find any bugs, have any comments, or think the code can be cleaner, I’d love to hear from you.”




October 24th, 2007 at 9:38 am
Nice article, one notice though, in django-voting you can use special tag for your task:
First, get all scores for objects on page:
{% scores_for_objects obj_list as score_dict %}
Then, assign every vote from dict to object in loop:
{% dict_entry_for_item obj from score_dict as score %}
This makes less SQL-queries and keeps code more portable, dry (other blah-blah-blah
).
October 24th, 2007 at 11:02 am
@Dmitry:
I think scores_for_objects calls Vote.objects.get_scores_in_bulk(objects) – that returns a dictionary. So you could iterate through this dictionary, but for every object, I would think you would have to iterate over every item in the dictionary…
{% scores_for_objects obj_list as score_dict %}
{% for object in obj_list %}…
{% for key,value in score_dict.items %}
// check if it is equal to object, then print out score.
{% endfor %}
{% endfor %}
But then again, I could be seriously wrong about that.
Could you elaborate?
I do know this doesn’t make any less/more SQL-queries since the query is the same one I took my subquery from, but you are right in that it is definitely more portable/dry because Jonathan’s a smart dude.
Update: @Dmitry: Actually, I was wrong (and you were right), that template tag *does* make less SQL calls – I’ll profile the AVG(), GROUP_BY later today, but I think the processing time would be moved into the template at that point, i.e. http://www.mail-archive.com/django-users@googlegroups.com/msg36615.html. Thanks for the comment! (and setting me straight). =)
October 24th, 2007 at 12:30 pm
Is this actually an improvement? I mean you are still making all those sql calls with the subquery. I feel like doing a join and then using the AVG() function would make it so it is truly 1 call. But I don’t know enough about databases to know which one is more efficient.
October 24th, 2007 at 1:22 pm
@Benjamin:
I think that at the end of the day, this is an improvement (at least performance wise) – but I can see how it is arguably so. I agree that the SQL is being executed with the subquery, but I think you have to look at where the query is being optimized.
In the subquery extra() clause, the database is handling the optimization for you; ie we’re letting the database do what it’s good at. As to how optimized the MySQL query analyzer is for its B-tree is up for debate (I’m no database expert). I think letting the database handle this versus making “N” number of MySQL AVG() calls (while retrieving connections, etc) is a definite improvement performance-wise.
However, as I said earlier, I think it can be debated since this type of SQL injection obviously leads to maintenance hell in large projects.
But then again, since I’m no database guru, this could all be total bs.
As for your join and AVG() function call, maybe someone can pitch in and set
usme straight.Update: @Benjamin: Read my previous comment as I think your comment was where the templatetag is going.
October 24th, 2007 at 3:30 pm
It is definitely an improvement. But showing it as a good solution and blog about it is not. This SQL still makes subselects, for every single row. The real problem is Django in this case, we should focus on that, instead of praising workarounds which circumvent real SQL solutions.
October 24th, 2007 at 4:31 pm
I definitely agree that the SQL makes subselects for every row.
Since I’m no database guru, I’ll profile it under SQL analyzer to see how bad/optimized it is. Yet, I don’t think Django is the problem here at all. If anything, it’s me as the developer, and if you were trying to be nice about it… Thanks!
If I knew the optimized SQL to write, I would.
Maybe the optimal thing to do is to perform the query to retrieve the objects, then perform the average call, then merge the data QuerySet’s together…
October 24th, 2007 at 5:08 pm
That’s a neat use of the extra() function; however I believe this may be a case for database de-normalisation. The most efficient way of doing this is probably to add an ‘avg_rating’ field to your Restaurant model, and use a signal so that whenever a review is saved, you re-calculate the average rating for that Restaurant and save it directly on the model.
There is a huge upside to this de-normalised method: You can use sort_by() or limit() to only show restaurants that have a rating above 4, or sort them by rating.
Lastly, it should produce more portable code, as subselects may or may not work the same way on different database servers.
October 24th, 2007 at 5:43 pm
@Ross -
By far the best suggestion yet!
I hadn’t even begun to think of denormalizing the database at all!
I’ll be working on the signal tonight. Gosh, community rocks!
+1+100 for Ross.October 24th, 2007 at 9:01 pm
I can’t take all the credit for it, the idea first came into my head when I saw it in the Lost Theories source code that Jeff Croft made available.
It works very well, and it’s the way I’ve handle sorting by # comments and ratings on Django Sites.
October 24th, 2007 at 9:19 pm
Nice post. Extra is really useful but sometimes takes some work to get it to do what you want. I agree with Ross, I would likely denomalize this, especially if I expected to generate any significant traffic.
October 25th, 2007 at 6:09 am
Your website is very niceI just wish I could pick it apart and find all the little faults in it that you never thought about. Like you do to my things. But I can’t
October 25th, 2007 at 5:14 pm
Not sure that you want to be doing an AVG and/or COUNT on every request – might be better (at a certain point) to start using summary tables. Thanks for the post, though!
October 25th, 2007 at 6:16 pm
@Matt –
Actually, last night I worked on denormalizing the tables and using a post.save signal, whenever a Review is added, the Restaurant instance associated saves itself which in turn updates a few columns, one being called “average_review” (just as Ross/Jeff suggested). Thanks for the comment though – I love all the help the community provides; especially for us techies in remote places.
October 1st, 2010 at 1:49 pm
BTW, fix your code formatting. You’re escaping HTML entities twice.
September 29th, 2012 at 12:59 am
How did you output the number of queries per page?
Please leave a reply »