Ryan Kanno: The diary of an Enginerd in Hawaii

Everything you've ever thought, but never had the balls to say.


Tag Archive » ‘Python’

Lessons Learned: Google App Engine + App Engine Patch + Django + Boto

Update:

Mitch Garnaat from CloudRight has pointed out that you can actually set the policy of the S3 file in the set_contents_from_file call instead of making another roundtrip request into S3 (and saving you some coin). Thanks Mitch!

Btw, I’m using App Engine Patch 1.0 and Boto 1.6a.


Sorry I haven’t updated my blog in a few months, but I’ve been a little busy. With that said, along with Erlang, I’ve been playing around with Google App Engine, App Engine Patch (for Django support), and the Boto library (for Amazon S3 support). After not having touched Python code in a few months, I wanted to document some of my lessons learned to help other developers who may be in a similar boat.

Lessons learned

  • If you’re upgrading the App Engine Patch, make sure you don’t have the App Engine library installed in a hidden directory
  • Uploading bulk data changed ever so slightly
  • If you’re not running off of Boto’s trunk, you’ll need to patch your Boto installation to work with App Engine.

Make sure the App Engine library isn’t installed in a hidden directory

Apparently, Google’s SDK 1.1.9 doesn’t like to rely on files that won’t be uploaded with your application – and hidden directories are no longer uploaded. I was running into the dreaded purple-nurple screen of death. Thank goodness for this AppEngine Google Group post, but I’m still not even sure when this popped up considering Google’s articles still refer to this setup.

Bulk upload

Compared to the previous SDK I was playing around with, bulk uploading changed significantly. I recall having to patch Goog’s bulkupload.py file to get unicode support. However, their new remote api tool has definitely fixed this issue, so +1 for Googs. People are reporting that uploading unicode is still broken, but it’s not. Or at least it wasn’t for me. Also, if you’re like me and don’t read documentation, you’ll find out (the hard way) that the method signature of HandleEntity changed. Instead of accepting a datastore.Entity, it now expects a db.Model object.

Note: When actually running the remote api tool, you’ll also want to make sure your PYTHONPATH includes your current project. (Another one-liner in the documentation. :P )

Integrating Boto + App Engine

I wasn’t running off of Boto’s trunk and I was getting an obscure type conversion error. Being too lazy to check out the source, I jumped to their issue tracker and found a patch (halfway down the page) by one of the App Engine Patch lead devs. Apply the patch and you’ll be on your way to uploading images/data from App Engine into Amazon S3! If you’re looking for example code, I’ve included a small snippet of what I tested.

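    # Imports assumed at module level (the original snippet omitted them):
    #   import uuid
    #   from django.conf import settings
    #   from boto.s3.connection import S3Connection as Connection
    #   from boto.s3.key import Key

    @staticmethod
    def upload_to_s3(original_filename, photo):
        """ Upload a photo file, storing its original name as metadata in an S3 bucket """
        connection = Connection(settings.AWS_ACCESS_KEY_ID, settings.AWS_SECRET_ACCESS_KEY)
        bucket = connection.get_bucket(settings.AWS_IMAGE_BUCKET_NAME)
        # Use a random UUID as the S3 key so uploads never collide on filename
        photo_uuid = str(uuid.uuid4())
        new_key = Key(bucket)
        new_key.key = photo_uuid
        new_key.set_metadata('original_filename', original_filename)
        # Setting the policy here avoids a second roundtrip to S3 (see the update above)
        new_key.set_contents_from_file(photo, policy='public-read')
        return photo_uuid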

Note: I only tested the code above with small images ~300-500K in size and it seemed to work perfectly fine (with no load! :P ). As always, feel free to use, steal, take, and/or copy anything on this blog. Hopefully somewhere, someone on the Interwebs will find these tips handy!

Enjoy!



web2email.py – A web to email Python backup script

I’m back, at least for the time being. There’s definitely a calm before the impending storm, but until then, I’m back posting little tidbits of uselessness. Enjoy!

Python goodness

While introducing the concept of automation to a friend of mine, I came across a requirement to archive a series of URLs on a daily basis. Luckily for me, the URLs consisted primarily of plain text. Loading up VIM, I concocted this Python script in a few hours – most of which was spent searching Googs <3.

If you're looking for a true web crawler, this won't be for you - though loading up lxml/Beautiful Soup, cssutils, and a Javascript parser to determine what artifacts need to be downloaded shouldn’t be all that difficult…

But, I’ll leave that as an exercise for the reader (That’s you, btw!)

In any case, the following script fetches a URL and emails the page, either through Googs or Webfaction using SMTP-AUTH, or through a plain SMTP server of your choosing. Sorta-kinda like having your own WayBackMachine. Cut and paste the following into a neat file called web2email.py.

#! /usr/bin/env python2.5
# -*- coding: utf-8 -*-
#
# Copyright (c) 2008 Ryan Kanno (ryankanno@localkinegrinds.com)
# License: GNU GPLv3
 
import urllib2
 
import smtplib
from email.MIMEMultipart import MIMEMultipart
from email.MIMEBase import MIMEBase
from email.MIMEText import MIMEText
from email.Utils import COMMASPACE, formatdate
from email import Encoders
import datetime
 
from optparse import OptionParser
import os, sys, logging
 
__doc__ = """
 
This script retrieves a URL and sends its contents via email to 
a list of recipients.  Typically, this script is run from a cron
job that sends emails to a Gmail account to archive the contents
of a URL.
 
Mail can be sent via normal or authenticated SMTP.  Tested using 
Gmail SMTP (authenticated), Webfaction SMTP (authenticated), and
localhost (normal).
 
Example:
 
Sends the contents of http://www.espn.com to friend@domain.com using your Gmail settings
 
    python web2email.py -u gmail_username \
                        -p gmail_password \
                        -f gmail_username@gmail.com \
                        -r friend@domain.com http://www.espn.com
 
Sends the contents of http://www.espn.com to friend@domain.com using your Webfaction settings
 
    python web2email.py -u webfaction_username \
                        -p webfaction_password \
                        -f webfaction_account@webfaction_domain.com \
                        -s smtp.webfaction.com \
                        -r friend@domain.com http://www.espn.com
 
Sends the contents of http://www.espn.com to friend@domain.com using your local settings
 
    python web2email.py -f your_email@domain.com \
                        -s localhost \
                        --port 25 \
                        -r friend@domain.com http://www.espn.com
"""
 
__author__  = "ryankanno@localkinegrinds.com"
__url__     = "http://blog.localkinegrinds.com"
__version__ = "0.1"
 
USAGE = "usage: %prog [options] url" 
DESC  = __doc__.split('\n\n')[0]
 
def configure_logging(log_level, format='%(asctime)s %(levelname)s %(message)s'):
    logging.basicConfig(level=log_level, format=format)
 
def _validate_options_and_args(parser, options, args):
    logging.debug("Validating options and arguments.")
    if (len(args) != 1):
        parser.error("Incorrect number of arguments.  Script expects 1 (URL to backup), but received %i." % len(args))
        sys.exit(2) # Command line syntax error
    elif not options.recipients: 
        parser.error("You must include at least one recipient.")
        sys.exit(1) 
    elif (options.username and options.password is None) or (options.username is None and options.password is not None):
        parser.error("You must include both a username and password.")
        sys.exit(1) 
    elif not options.from_email:
        parser.error("You must include a valid from email address.")
        sys.exit(1) 
 
def getPage(url):
    logging.debug("Attempting to retrieve %s" % url)
    try:
        response = urllib2.urlopen(url)
        return response.read()
    except urllib2.HTTPError, e:
        logging.error("HTTPError (%s) occurred retrieving %s" % (e.code, url))
        sys.exit(1)
    except urllib2.URLError, e:
        logging.error("URLError (%s) occurred retrieving %s" % (e.reason, url))
        sys.exit(1)
 
def mail(send_from, send_to, subject, text, content_type, files=[], server='localhost', port=25, username=None, password=None):
 
    def _auth(server, port, username, password):
        # Log the username only; never write the password to the log
        logging.debug("Attempting to send email via %s:%i as user %s." % (server, port, username))
        smtp = smtplib.SMTP(server, port) 
        smtp.ehlo()
        smtp.starttls()
        smtp.ehlo()
        smtp.login(username, password)
        smtp.sendmail(username, send_to, msg.as_string())
        smtp.close()
 
    def _unauth(server, port):
        logging.debug("Attempting to send email via %s:%i" % (server, port))
        smtp = smtplib.SMTP(server, port)
        smtp.sendmail(send_from, send_to, msg.as_string())
        smtp.close()
 
    assert type(send_to)==list
 
    msg=MIMEMultipart()
    msg['From'] = send_from
    msg['To'] = COMMASPACE.join(send_to)
    msg['Date'] = formatdate(localtime=True)
    msg['Subject'] = subject
 
    text = MIMEText(text)
    text.set_type(content_type)
    text.set_param('charset', 'UTF-8')
 
    msg.attach(text)
 
    for f in files:
        part = MIMEBase('application', "octet-stream")
        part.set_payload(open(f, "rb").read())
        Encoders.encode_base64(part)
        part.add_header('Content-Disposition', 'attachment; filename="%s"' % os.path.basename(f))
        msg.attach(part)
 
    if not username and not password:
        _unauth(server, port)
    else:
        _auth(server, port, username, password) 
 
def main():
    parser = OptionParser(usage=USAGE, description=DESC)
 
    parser.add_option("-u", "--username", dest="username", metavar="USER", help="Username to SMTP server")
    parser.add_option("-p", "--password", dest="password", metavar="PWD", help="Password to SMTP server")
    parser.add_option("-s", "--server", dest="server", metavar="SERVER", help="SMTP server (Defaults to Gmail)")
    parser.add_option("--port", dest="port", metavar="PORT", type="int", help="SMTP server port (Defaults to Gmail)")
    parser.add_option("-f", "--from", dest="from_email", metavar="FROM", help="From address")
    parser.add_option("-r", "--recipient", action="append", dest="recipients", metavar="RCPT", type="string", help="Email recipient")
    parser.add_option('-t', '--test', action="store_true", dest="test", metavar="TEST", help="Run tests")
    parser.add_option('-v', '--verbose', action='store_const', dest='log_level', const=logging.DEBUG, help='Verbose output')
    parser.set_defaults(server="smtp.gmail.com", port=587, test=False, log_level=logging.INFO)
    (options, args) = parser.parse_args()
 
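    # Configure logging first; a logging call made before basicConfig()
    # installs a default handler and makes the later call a no-op
    configure_logging(options.log_level)
    _validate_options_and_args(parser, options, args)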
 
    if options.test:
        _test() # Too lazy to write a test for this script.  @TODO - use mocks 
 
    # Retrieve URL and return html
    html = getPage(args[0])
 
    # Send mail with returned html as body 
    mail(options.from_email, options.recipients, 
         '%s @ %s' % (args[0], (datetime.datetime.now().strftime("%A %B %d %I:%M:%S %p %Y"))), 
         html, 'text/html', 
         server=options.server, port=options.port, username=options.username, password=options.password)
 
    # Return with appropriate exit code
    sys.exit(0)
 
def _test():
    import doctest
    doctest.testmod(sys.modules[__name__])
 
if __name__ == '__main__':
    main()
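
If you're on Python 3, the script above won't run as-is (the `except X, e` syntax, `urllib2`, and the `email.MIME*` module paths are all Python 2-isms). Below is a minimal sketch of just the message-building half of mail() using the standard library's `email.message.EmailMessage`. The `build_message` name and the example.com addresses are made up for illustration, and this is not a drop-in port of the whole script.

```python
from email.message import EmailMessage
from email.utils import formatdate


def build_message(send_from, send_to, subject, html):
    """Build an HTML email carrying the same headers mail() sets above."""
    msg = EmailMessage()
    msg["From"] = send_from
    msg["To"] = ", ".join(send_to)
    msg["Date"] = formatdate(localtime=True)
    msg["Subject"] = subject
    # subtype="html" produces a text/html body encoded as UTF-8
    msg.set_content(html, subtype="html", charset="utf-8")
    return msg


# Hypothetical addresses and page content, for illustration only.
message = build_message(
    "me@example.com",
    ["friend@example.com", "other@example.com"],
    "http://www.example.com @ Monday",
    "<html><body><p>archived page</p></body></html>",
)
print(message["To"])
print(message.get_content_type())
```

From there, `smtplib.SMTP(...).send_message(message)` could stand in for the sendmail() calls, though the rest of the script would still need porting.
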

All right stop, cron time! (imagine a 90’s pop song)

As an added bonus, you can install this script to run via cron so you’ll magically end up with webpages archived in your inbox! Neat. You can read my previous post on cron, or you can create the following crontab.

MAILTO=ryankanno@CHANGE_TO_YOUR_EMAIL.com
# minute (0-59),
# |      hour (0-23),
# |      |       day of the month (1-31),
# |      |       |       month of the year (1-12),
# |      |       |       |       day of the week (0-6 with 0=Sunday).
# |      |       |       |       |       commands
  0      0       *       *       *      /usr/bin/python2.5 /PATH/TO/web2email.py -u GMAIL_USER -p GMAIL_PWD -f FROM_USER -r RECIPIENT URL

As a side note, don’t forget to wrap the URL in double quotes if it contains spaces!

Note: change the value of ryankanno@CHANGE_TO_YOUR_EMAIL.com to your email address (or comment the line out with a # if you don’t want emails sent to you), GMAIL_USER to your Google username, GMAIL_PWD to your Google password, FROM_USER to the from address in the mail header, RECIPIENT to the recipient email address, and URL to the URL you want backed up.

I know, I know. The critics.

The critics will say that your Gmail username and password are in cleartext. I know. They are. So… I’m hoping that since you just need an archive of a publicly available URL on the Internets, the data doesn’t need to be super-duper-Fort-Knox-protected. If it does, this script isn’t for you. :( Oh, yeah, before I forget… here’s a hint… *cough*create another Google account*cough*. With that said, archive to your heart’s content!
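
If the password on the command line still bugs you, one possible alternative is to read the credentials from environment variables, which at least keeps them out of process listings like `ps`. This is a minimal Python 3 sketch under stated assumptions: the `WEB2EMAIL_SMTP_USER` / `WEB2EMAIL_SMTP_PASSWORD` variable names and the `smtp_credentials` helper are hypothetical and not part of web2email.py. The password still sits in cleartext wherever you set those variables, so the throwaway-account hint still applies.

```python
import os


def smtp_credentials(environ=None):
    """Return (username, password) from the environment, or (None, None).

    Follows the script's own rule: supply both values or neither.
    """
    if environ is None:
        environ = os.environ
    # Treat empty strings the same as unset variables
    username = environ.get("WEB2EMAIL_SMTP_USER") or None
    password = environ.get("WEB2EMAIL_SMTP_PASSWORD") or None
    if (username is None) != (password is None):
        raise ValueError("Set both SMTP variables, or neither.")
    return username, password


# Example with an explicit mapping instead of the real environment.
example = smtp_credentials({
    "WEB2EMAIL_SMTP_USER": "archive.account",
    "WEB2EMAIL_SMTP_PASSWORD": "not-a-real-password",
})
```
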

Enjoy!



Google App Engine on Win2K (using django-yui-layout-templates)

Update: September 1, 2008

I guess Googs finally caught on as their 1.1.2 installer works on Win2K! FTW!


Update

After finally getting time to play around with the Google App Engine Django helpers, here are a few more steps to integrate nicely with the helper suite.

  • Move the appengine installation from C:\AppEngine\ to where the Windows installer would have installed it to: C:\Program Files\Google\google_appengine (make sure to clean up your .pyc files)
  • Add the following to your PYTHONPATH system variable: %APPENGINE%\;%APPENGINE%\lib;%APPENGINE%\lib\yaml\lib;%APPENGINE%\lib\webob;

After following the instructions, you should be good to go with Django + AppEngine! FTW! Whee. :)


So I finally get an hour or so to play around with the Googs App Engine and luckily for me, all my machines decided to puke except for my Windows 2000 Server. How ironic is that? In disbelief, I downloaded the Google App Engine SDK Windows installer and what do I get?

[Image: Google App Engine Windows installer]

I sense some pure, unadulterated haterade. (j/k)

Since Python is one of those insert_any_synonym_for_fun languages that just works, here’s how to get the Google App Engine SDK working in Win2K.

  • Download the Linux/Other platform package and unzip to somewhere neat.
  • Add a System Environment variable called ‘APP_ENGINE_HOME’ that points to your App Engine installation. (Notice, I installed mine into C:\AppEngine)

    [Image: Add system variable]

  • Add the System Environment variable to your System Path so the Windows shell can execute the included Python files.
  • Make sure you have .py files associated with the python.exe executable located in your Python installation. (Check file types under folder options)
  • Follow the tutorials: here and here, or learn with others – just to name a few.
  • Oh, and before I forget, if you develop an application and realize that you can’t kill the development appserver (dev_appserver.py) by pressing Ctrl-C, I found a solution here. Basically, press Ctrl-C, hit the server with your browser one more time and voila, the development application server dies. Thanks Frank!

As an added bonus…

Check out my previous post using the Yahoo UI library to create a set of default Django templates. I’ve updated django-yui-layout-templates with patches and suggestions, and I’ve also created a few branches to support the Googly App Engine. Check out the branches directory in the Subversion repository!

Last but not least…

Big ups to Mr. Fitz for solving all my Google App Engine issues and thanks to Mr. Harper for causing them. ;)

Voila! (Enjoy)



Yahoo! UI (YUI) + Django templates == Google Code project! FTW!

Let me first preface this blog by saying that I’m not a designer. When it comes to art and creativity, I’m so left brained, I actually wonder if my right brain even partakes in the process.

Three things spurred me to release django-yui-layout-templates.

  1. I’ve always wanted to see what GoogleCode offered in relation to SourceForge / RubyForge.
  2. I’m so caught up in corporate America staring at Java / Ruby code all day that I haven’t blogged about anything Django-related in quite a while, and it’s nice to get some commentary from the community, i.e. “your code sucks”. (Brings me back to reality)
  3. I found myself using the same templates on a variety of projects and figured that I could do my part and help eliminate unnecessary cruft/duplication.

So without further ado, check out the project here. I know, I know – nothing revolutionary here, but I figure since Django is picking up some steam, these templates might help a Djangonaut get a head start on their next million dollar idea. :)

Voila! Enjoy!



Using the extra() QuerySet modifier in Django for WeGoEat

Since I actually used this method to reduce the number of “explicit” SQL calls made in WeGoEat, I figured I’d write a little blog post explaining the context in which it was used, and maybe, just maybe, it’ll help shed some light on how others can take advantage of this neat little function.

Background

As a Django “proof-of-concept”, I’m working on a local restaurant review site for my home state of Hawai`i. (I actually just released it yesterday). For each restaurant, I want to be able to calculate the average of all reviews and display this listing in a paginated view. (Yes, I do realize there’s no average rating, but that has to do with there being no users. ;P).

The Problem

Having a serious “wtf was I thinking” moment, I initially wrote a Restaurant model function that returned the average (review) rating for each restaurant instance. Little did I realize that when I actually displayed the restaurants’ average reviews, I would be making an additional SQL avg() call for every restaurant. Though I’m only paging “n” records at a time, this function added “n” extra SQL calls to every view that contained a restaurant listing.

In pseudo-code, my initial naive function resembled the following: (I’m sure we’re all guilty of writing something of the sort… ok, fine, I know I was. ;P)

    def get_average_review(self):
        query = 'QUERY TO GET AVERAGE (SELECT AVG(rating)...); (I have the query below)'
        # Get cursor from connection
        cursor = connection.cursor()
        cursor.execute(query)
        return cursor.fetchall()

Duh.

Here’s a picture of the number of queries it took:

[Image: Duh]

The “extra()” solution

After profiling my application and realizing what a bone-headed mistake I made, I began researching the extra() QuerySet modifier. Yes, I realize that these extra lookups aren’t the most portable and often violate the DRY principle, but it’ll probably suffice for most of my personal projects. :)

Since I’m already retrieving a list of Restaurants and filtering them via letter, island, and what not, I figured I could add an average rating subquery. The entire call looks as such:

    restaurants = Restaurant.objects.filter(name__istartswith=letter).extra(
        select={'avg_rating': 'SELECT AVG(overall_rating) FROM restaurants_restaurant as res, reviews_review, django_content_type \
                               WHERE restaurants_restaurant.id = res.id \
                               AND res.id = reviews_review.object_id \
                               AND reviews_review.content_type_id = django_content_type.id \
                               AND django_content_type.model = \'restaurant\''},
    )

As you can see, I’m exploiting the fact that restaurants_restaurant will be available from the Restaurant.objects.filter() call. (I know, I know… bad for portability).
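
To see why this pays off, here is a small self-contained sketch of the same idea using Python 3's built-in `sqlite3` module instead of Django. The two-table schema, the `run` helper, and the restaurant names are made up for illustration (the real tables go through `django_content_type`). The point is just the query count: n + 1 for the per-restaurant approach versus 1 with a correlated subquery.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE restaurant (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE review (
        id INTEGER PRIMARY KEY,
        restaurant_id INTEGER,
        overall_rating REAL
    );
    INSERT INTO restaurant (id, name) VALUES
        (1, 'Aloha Diner'), (2, 'Big Wave Grill'), (3, 'Cafe Empty');
    INSERT INTO review (restaurant_id, overall_rating) VALUES
        (1, 4), (1, 5), (2, 3);
""")

executed = []  # every SQL statement sent through run()


def run(sql, params=()):
    """Execute a query, remember it, and return all rows."""
    executed.append(sql)
    return conn.execute(sql, params).fetchall()


# Naive approach: one listing query, then one AVG() query per restaurant.
naive = {}
for restaurant_id, name in run("SELECT id, name FROM restaurant ORDER BY id"):
    rows = run(
        "SELECT AVG(overall_rating) FROM review WHERE restaurant_id = ?",
        (restaurant_id,),
    )
    naive[name] = rows[0][0]
naive_query_count = len(executed)

# Correlated subquery: the inner SELECT refers to the outer row (r.id),
# so the average arrives with the listing in a single query.
del executed[:]
with_subquery = dict(run("""
    SELECT r.name,
           (SELECT AVG(overall_rating)
              FROM review
             WHERE review.restaurant_id = r.id) AS avg_rating
      FROM restaurant AS r
     ORDER BY r.id
"""))
subquery_query_count = len(executed)

print(naive_query_count, subquery_query_count)
print(with_subquery)
```

A restaurant with no reviews comes back with an average of NULL (None in Python), a case the `{% if restaurant.avg_rating %}` check in the template handles.
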

But voila!

Now, in my templates, when I iterate over the restaurants, I can use the following:

{% for restaurant in restaurant_list %}
<tr>
    <td><a href="{{restaurant.get_absolute_url}}">{{ restaurant.name }}</a></td>
    <td>{% if restaurant.avg_rating %}
           {% load show_stars %}
           <span class="average-rating">
           {% show_stars restaurant.avg_rating of 5 round to quarter %}
           </span>
           {% endif %}</td>
</tr>
{% endfor %}

Notice how I used my show_stars template tag that I blogged about a few weeks ago to display the average restaurant rating. (Cheap shameless plug, but damn effective! :P ) I’d link to a page in action, but since I just opened up my site to a few select users, I’ll update this post when I actually have any reviews. :P

Oh, and before I forget, thanks to my co-worker Stephen for assisting me with my SQL issues! :)

Here’s a picture of the final result:

[Image: Yay]

Note:

As an added bonus, I also realized there were a few other ‘spots’ where the .extra() QuerySet modifier would come in handy. Since I’m also using the wonderful django-voting application from Jonathan Buchanan, I came across this post about accessing a dictionary via a template in the Django-users Google Group.

Basically, I had come across the same issue as the poster. Since I allow users to vote on reviews (similar to Amazon, Yelp, etc.), I wanted to retrieve the score of each Review instance to display on a paginated listing of all Reviews. Using the same extra() modifier, I was able to inject the total number of votes and the score when I retrieved all Reviews as such:

Btw, I just injected most of the code from Jonathan’s template tag. :)

.extra(select={'total_votes': 'SELECT COUNT(vote) FROM votes as v, reviews_review as rev, django_content_type \
                                        WHERE reviews_review.id = rev.id \
                                        AND v.object_id = reviews_review.id \
                                        AND v.content_type_id = django_content_type.id \
                                        AND django_content_type.model = \'review\'', 
 
                                        'score': 'SELECT SUM(vote) FROM votes as v, reviews_review as rev, django_content_type \
                                        WHERE reviews_review.id = rev.id \
                                        AND v.object_id = reviews_review.id \
                                        AND v.content_type_id = django_content_type.id \
                                        AND django_content_type.model = \'review\''},)

Pretty neat right?
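
One thing to watch with these two subqueries: for a review nobody has voted on, COUNT() returns 0 but SUM() returns NULL. Here is a minimal sketch using Python 3's built-in `sqlite3` module. The stripped-down `review` and `votes` tables are stand-ins for the real ones (no content types), so treat it as an illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE review (id INTEGER PRIMARY KEY);
    CREATE TABLE votes (object_id INTEGER, vote INTEGER);
    INSERT INTO review (id) VALUES (1), (2);
    -- Review 1 gets two up-votes and one down-vote; review 2 gets none.
    INSERT INTO votes (object_id, vote) VALUES (1, 1), (1, 1), (1, -1);
""")

rows = conn.execute("""
    SELECT rev.id,
           (SELECT COUNT(vote) FROM votes
             WHERE votes.object_id = rev.id) AS total_votes,
           (SELECT SUM(vote) FROM votes
             WHERE votes.object_id = rev.id) AS score
      FROM review AS rev
     ORDER BY rev.id
""").fetchall()

for review_id, total_votes, score in rows:
    print(review_id, total_votes, score)
```

That NULL shows up as None on the Django side, which a template filter such as `default:0` can turn back into a 0.
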

Now, when iterating through the reviews, I can use the following:

{% for review in object_list %}
    <tr>
        <td><a href="{{review.content_object.get_absolute_url}}">{{ review.content_object.name }}</a></td>
        <td><a href="{% url profile-detail username=review.user.username %}">{{ review.user.username }}</a></td>
        <td><nobr>{% load show_stars %}
            <span class="rating">{% show_stars review.overall_rating of 5 round to half %}</span>
            </nobr>
        </td>
        <td>"<span style="font-weight:bold; color:#092e20;">{{ review.get_recommendation_display }}</span>"</td>
        <td><span style="font-size:.875em;">{{ review.submit_date|timesince }} ago</span></td>
        <td>Total of {{ review.score|default:0 }} from {{ review.total_votes }} {{ review.total_votes|pluralize:"person,people" }}.</td>
    </tr>
{% endfor %}

Hope y’all learned something like I did! :) Oh, and before I forget my standard disclaimer, “since this is on my blog, feel free to take/use/steal/distribute/copy/modify any code you see fit, but if you find any bugs, have any comments, or think the code can be cleaner, I’d love to hear from you.”

Enjoy!
