Ryan Kanno: The diary of an Enginerd in Hawaii

Everything you've ever thought, but never had the balls to say.

My LinkedIn Profile
Follow @ryankanno on Twitter
My Feed

Using the extra() QuerySet modifier in Django for WeGoEat

Since I actually used this method to reduce the number of Update:”explicit” SQL calls made in WeGoEat, I figured I’d write a little blog explaining the context in which it was used, and maybe, just maybe, it’ll help shed some light on how others can take advantage of this neat little function.

Background

As a Django “proof-of-concept”, I’m working on a local restaurant review site for my home state of Hawai`i. (I actually just released it yesterday). For each restaurant, I want to be able to calculate the average of all reviews and display this listing in a paginated view. (Yes, I do realize there’s no average rating, but that has to do with there being no users. ;P).

The Problem

Having a serious “wtf was I thinking moment”, I initially wrote a Restaurant model function that returned the average (review) rating for each restaurant instance. Little did I realize that when I actually displayed the restaurant’s average reviews, I would be making an additional SQL avg() call for every restaurant. Though I’m paging “n” records at a time, this function added an additional “n” SQL calls for every view that contained a restaurant listing, just to name a few.

In pseudo-code, my initial naive function resembled the following: (I’m sure we’re all guilty of writing something of the sort… ok, fine, I know I was. ;P)

1
2
3
4
5
6
     def get_average_review(self):
         query = 'QUERY TO GET AVERAGE (SELECT AVG(rating)...); (I have the query below)'
         # Get cursor from connection
        cursor = connection.cursor()
        cursor.execute(query)
        return cursor.fetchall()

Duh.

Here’s a picture of the number of queries it took:

Duh

The “extra()” solution

After profiling my application and realizing what a bone-headed mistake I made, I began researching the extra() Queryset modifier. Yes, I realize that these extra lookups aren’t the most portable and often violate the DRY principle, but it’ll probably suffice for most of all my personal projects. :)

Since I’m already retrieving a list of Restaurants and filtering them via letter, island, and what not, I figured I could add an average rating subquery. The entire call looks as such:

1
2
3
4
5
6
7
     restaurants = Restaurant.objects.filter(name__istartswith = letter).extra(
             select={'avg_rating': 'SELECT AVG(overall_rating) FROM restaurants_restaurant as res, reviews_review, django_content_type \
                                          WHERE restaurants_restaurant.id = res.id \
                                          AND res.id = reviews_review.object_id \
                                          AND reviews_review.content_type_id = django_content_type.id \
                                          AND django_content_type.model = \'restaurant\''},
                       )

As you can see, I’m exploiting the fact that restaurants_restaurant will be available from the Restaurant.objects.filter() call. (I know, I know… bad for portability).

But voila!

Now, in my templates, when I iterate over the restaurants, I can get issue the following:

1
2
3
4
5
6
7
8
9
10
11
{% for restaurant in restaurant_list %}
<tr>
    <td><a href="{{restaurant.get_absolute_url}}">{{ restaurant.name }}</a></td>
    <td>{% if restaurant.avg_rating %}
	   {% load show_stars %} 
           <span class="average-rating">
	   {% show_stars <strong>restaurant.avg_rating</strong> of 5 round to quarter %}
           </span>
           {% endif %}</td>
</tr>
{% endfor %}

Notice how I used my show_stars template tag that I blogged about a few weeks ago to display the average restaurant rating. (Cheap shameless plug, but damn effective! :P ) I’d link to a page in action, but since I just opened up my site to a few select users, I’ll update this post when I actually have any reviews. :P

Oh, and before I forget, thanks to my co-worker Stephen for assisting me with my SQL issues! :)

Here’s a picture of the final result:

Yay

Note:

As an added bonus, I also realized a few other ‘spots’ where the .extra() Queryset modifier would come in handy. Since I’m also using the wonderful django-voting application from Jonathan Buchanan, I came across this post about accessing a dictionary via a template in the Django-users Google Group.

Basically, I had come across the same issue as the poster. Since I allow users to vote on reviews (similar to Amazon, Yelp, etc.), I wanted to retrieve the score of each Review instance to display on a paginated listing of all Reviews. Using the same extra() modifier, I was able to inject the total number of votes and the score when I retrieved all Reviews as such:

Btw, I just injected most of the code from Jonathan’s template tag. :)

1
2
3
4
5
6
7
8
9
10
11
.extra(select={'total_votes': 'SELECT COUNT(vote) FROM votes as v, reviews_review as rev, django_content_type \
                                        WHERE reviews_review.id = rev.id \
                                        AND v.object_id = reviews_review.id \
                                        AND v.content_type_id = django_content_type.id \
                                        AND django_content_type.model = \'review\'', 
 
                                        'score': 'SELECT SUM(vote) FROM votes as v, reviews_review as rev, django_content_type \
                                        WHERE reviews_review.id = rev.id \
                                        AND v.object_id = reviews_review.id \
                                        AND v.content_type_id = django_content_type.id \
                                        AND django_content_type.model = \'review\''},)

Pretty neat right?

Now, when iterating through the reviews, I can use the following:

1
2
3
4
5
6
7
8
9
10
11
12
13
{% for review in object_list %}
	<tr>
		<td><a href="{{review.content_object.get_absolute_url}}">{{ review.content_object.name }}</a></td>
		<td><a href="{% url profile-detail username=review.user.username %}">{{ review.user.username }}</a></td>
		<td><nobr>{% load show_stars %}
			<span class="rating">{% show_stars review.overall_rating of 5 round to half %}</span>
			</nobr>
		</td>
		<td><span style="font-weight:bold; color:#092e20;">{{ review.get_recommendation_display }}</span></td>
		<td><span style="font-size:.875em;">{{ review.submit_date|timesince }} ago</span></td>
		<strong><td>Total of {{ review.score|default:0 }} from {{ review.total_votes }} {{  review.total_votes|pluralize:"person,people" }}.</td></strong>
	</tr>
{% endfor %}

Hope y’all learned something like I did! :) Oh, and before I forget my standard disclaimer, “since this is on my blog, feel free to take/use/steal/distribute/copy/modify any code you see fit, but if you find any bugs, have any comments, or think the code can be cleaner, I’d love to hear from you.”

Enjoy!

Be Sociable, Share!
  1. Nice article, one notice though, in django-voting you can use special tag for your task:
    First, get all scores for objects on page:
    {% scores_for_objects obj_list as score_dict %}
    Then, assign every vote from dict to object in loop:
    {% dict_entry_for_item obj from score_dict as score %}

    This makes less SQL-queries and keeps code more portable, dry (other blah-blah-blah :) ).

  2. @Dmitry:

    I think scores_for_objects calls Vote.objects.get_scores_in_bulk(objects) – that returns a dictionary. So you could iterate through this dictionary, but for every object, I would think you would have to iterate over every item in the dictionary…

    {% scores_for_objects obj_list as score_dict %}
    {% for object in obj_list %}…
    {% for key,value in score_dict.items %}
    // check if it is equal to object, then print out score.
    {% endfor %}
    {% endfor %}

    But then again, I could be seriously wrong about that. ;) Could you elaborate?

    I do know this doesn’t make any less/more SQL-queries since the query is the same one I took my subquery from, but you are right in that it is definitely more portable/dry because Jonathan’s a smart dude. :)

    Update: @Dmitry: Actually, I was wrong (and you were right), that template tag *does* make less SQL calls – I’ll profile the AVG(), GROUP_BY later today, but I think the processing time would be moved into the template at that point, i.e. http://www.mail-archive.com/django-users@googlegroups.com/msg36615.html. Thanks for the comment! (and setting me straight). =)

  3. Is this actually an improvement? I mean you are still making all those sql calls with the subquery. I feel like doing a join and then using the AVG() function would make it so it is truly 1 call. But I don’t know enough about databases to know which one is more efficient.

  4. @Benjamin:

    I think that at the end of the day, this is an improvement (at least performance wise) – but I can see how it is arguably so. I agree that the SQL is being executed with the subquery, but I think you have to look at where the query is being optimized.

    In the subquery extra() clause, the database is handling the optimization for you; ie we’re letting the database do what it’s good at. As to how optimized the MySQL query analyzer is for its B-tree is up for debate (I’m no database expert). I think letting the database handle this versus making “N” number of MySQL AVG() calls (while retrieving connections, etc) is a definite improvement performance-wise.

    However, as I said earlier, I think it can be debated since this type of SQL injection obviously leads to maintenance hell in large projects. :) But then again, since I’m no database guru, this could all be total bs. :P As for your join and AVG() function call, maybe someone can pitch in and set us me straight. :)

    Update: @Benjamin: Read my previous comment as I think your comment was where the templatetag is going.

  5. It is definitely an improvement. But showing it as a good solution and blog about it is not. This SQL still makes subselects, for every single row. The real problem is Django in this case, we should focus on that, instead of praising workarounds which circumvent real SQL solutions.

  6. I definitely agree that the SQL makes subselects for every row.

    Since I’m no database guru, I’ll profile it under SQL analyzer to see how bad/optimized it is. Yet, I don’t think Django is the problem here at all. If anything, it’s me as the developer, and if you were trying to be nice about it… Thanks! :) If I knew the optimized SQL to write, I would. ;)

    Maybe the optimal thing to do is to perform the query to retrieve the objects, then perform the average call, then merge the data QuerySet’s together… :P

  7. That’s a neat use of the extra() function; however I believe this may be a case for database de-normalisation. The most efficient way of doing this is probably to add an ‘avg_rating’ field to your Restaurant model, and use a signal so that whenever a review is saved, you re-calculate the average rating for that Restaurant and save it directly on the model.

    There is a huge upside to this de-normalised method: You can use sort_by() or limit() to only show restaurants that have a rating above 4, or sort them by rating.

    Lastly, it should produce more portable code, as subselects may or may not work the same way on different database servers.

  8. @Ross -

    By far the best suggestion yet! :) I hadn’t even begun to think of denormalizing the database at all! +1 +100 for Ross. :P I’ll be working on the signal tonight. Gosh, community rocks!

  9. :)

    I can’t take all the credit for it, the idea first came into my head when I saw it in the Lost Theories source code that Jeff Croft made available.

    It works very well, and it’s the way I’ve handle sorting by # comments and ratings on Django Sites.

  10. Nice post. Extra is really useful but sometimes takes some work to get it to do what you want. I agree with Ross, I would likely denomalize this, especially if I expected to generate any significant traffic.

  11. Your website is very niceI just wish I could pick it apart and find all the little faults in it that you never thought about. Like you do to my things. But I can’t

  12. Not sure that you want to be doing an AVG and/or COUNT on every request – might be better (at a certain point) to start using summary tables. Thanks for the post, though!

  13. @Matt –

    Actually, last night I worked on denormalizing the tables and using a post.save signal, whenever a Review is added, the Restaurant instance associated saves itself which in turn updates a few columns, one being called “average_review” (just as Ross/Jeff suggested). Thanks for the comment though – I love all the help the community provides; especially for us techies in remote places. :)

  14. BTW, fix your code formatting. You’re escaping HTML entities twice.

  15. How did you output the number of queries per page?

Please leave a reply »

Powered by Wordpress. Stalk me.