Calling all (Django) programmers… (I need advice)
Update
According to Max, it’s against Facebook’s TOS to cache the data. I must’ve totally skipped over it in the TOS… ok, fine… I admit, I didn’t even read it. Thanks Max! And to think this response came less than an hour after my post! Yay, community. In any case, this problem is still relevant to many graph-structured sites. Comment on!
Since I don’t know of many techies in Hawaii, I figured I would post a blog in hopes of attracting a solution from the akamai Django community.
The Background
I’m currently writing a Facebook application in Django (which I’ll be blogging about later).
The Problem
I want to be able to programmatically tell if you’ve updated your friends in Facebook. Basically, at the end of the day, Facebook provides an API call that allows you to retrieve a list of uids that represent your ‘friends’. I’m planning to cache this data within my own system. I was wondering how Pythonic people would solve the following:
Is there a data type or algorithm (or combination thereof) I can use to compare the returned uids with my uid data cache? This comparison algorithm will tell me if you’ve added or removed friends in Facebook. As a bonus, I’d prefer this comparison optimized (for Python).
Possible Solutions
So I’ve sat down and made a list of possible solutions:
- Creating an in-memory hash from my cached data and seeing which of the uids returned by the Facebook API collide and which don’t. (EASY)
- Using SET comparison in MySQL. (EASY)
- Possibly encoding the uids in a way to optimize the comparison of two sets; though I’d have to research this. (EW)
- …? (Your input goes here)
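For what it’s worth, here is a minimal sketch of the first option (the in-memory hash), assuming the cached uids and the uids from the API both arrive as plain Python sequences. The function name diff_with_hash and the sample uids are made up for illustration; nothing here touches the Facebook API.

```python
def diff_with_hash(cached_uids, fresh_uids):
    """Compare two uid sequences using an in-memory hash (a dict).

    Returns (added, removed) as lists. Hypothetical helper, not part
    of any Facebook or Django API.
    """
    # Every cached uid starts out marked as "not seen in the fresh list".
    seen = dict.fromkeys(cached_uids, False)
    added = []
    for uid in fresh_uids:
        if uid in seen:
            seen[uid] = True   # collision: still a friend
        else:
            added.append(uid)  # no collision: a new friend
    # Cached uids that never collided are no longer friends.
    removed = [uid for uid, hit in seen.items() if not hit]
    return added, removed


added, removed = diff_with_hash([101, 102, 103], [102, 103, 104])
```

Each dict lookup is constant time on average, so the whole comparison is roughly linear in the total number of uids.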
Btw, I never thought I’d say this, but without CLR, I feel so lost. ;P
Like I wrote in my last post, a few years out of grad school and a bunch of web applications later, I’m really, really rusty. In any case, feel free to leave any comments/ideas and I’ll be sure to credit you for your solution!







October 11th, 2007 at 12:22 am
I hate to burst bubbles, but you realize that you are not allowed to cache friend IDs? It’s against the current Facebook API terms and conditions. I’m betting some do it anyway, but Facebook sees its bread and butter as being the master repository of that friend knowledge and is apparently trying to be pretty strict about it. You should review the list of things you can and can’t cache; it’s quite enlightening. Most API pages also discuss which information from that API call can or can’t be cached.
They want you to query for it every time… I think you can cache API calls for up to 24 hours to avoid calling *too* often, but storing things long term in your database is generally a big NO.
That said, I would probably use a Python list and some simple in/not in logic, but I might also use it as an excuse to play more with Python’s sets:
http://docs.python.org/lib/types-set.html
October 11th, 2007 at 4:37 am
I don’t know about the legal issues Max mentioned, but you might look into Python sets. As far as I can tell, they do all the hard work for you.
http://docs.python.org/lib/types-set.html
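A rough sketch of the set approach suggested in the comments above, assuming both uid lists fit in memory. The name friend_changes and the sample uids are hypothetical.

```python
def friend_changes(cached_uids, fresh_uids):
    """Return (added, removed) as sets, using built-in set difference."""
    cached = set(cached_uids)
    fresh = set(fresh_uids)
    added = fresh - cached    # in the fresh list but not in the cache
    removed = cached - fresh  # in the cache but no longer in the fresh list
    return added, removed


added, removed = friend_changes([101, 102, 103], [102, 103, 104])
```

If you only need a yes/no answer, comparing set(cached_uids) != set(fresh_uids) is enough.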
October 11th, 2007 at 7:47 am
Well you can’t cache the IDs themselves per the TOS, but what about storing a sum of each user’s friend IDs? Then you can check to see if the sum has changed.
I’m itching to do some more Django/Facebook work. Fun stuff.
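Ryan’s sum idea could look something like the sketch below, assuming the uids are integers. The names friend_sum and friends_changed are made up for illustration. One caveat: two different friend lists can add up to the same number, so a sum can miss a change; it can tell you that something changed, but not what.

```python
def friend_sum(uids):
    """Return the sum of a user's friend uids (assumed to be integers)."""
    return sum(uids)


def friends_changed(stored_sum, fresh_uids):
    """True if the fresh uids no longer add up to the stored sum."""
    return friend_sum(fresh_uids) != stored_sum


stored = friend_sum([101, 102, 103])
changed = friends_changed(stored, [101, 102, 104])

# Caveat: different lists can share a sum, e.g. [1, 4] and [2, 3],
# so swapping one pair of friends for another may go unnoticed.
missed = not friends_changed(friend_sum([1, 4]), [2, 3])
```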
October 16th, 2007 at 12:12 pm
Don’t get me wrong, I love Python sets.
I was just curious to see if there were any unique solutions, i.e., along the lines of Ryan’s.
October 18th, 2007 at 6:15 pm
I love Ryan’s proposal. Make sure you sort the ids. Thinking aloud in code…
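import hashlib

def friendlist_hash(friends):
    """Returns a 32-character hex digest based on `friends`, a
    sequence of friend identifiers for which str() works."""
    # Sort first so the same friends always give the same digest,
    # no matter what order the API returns them in.
    friendlist = sorted(friends)
    friendlist_text = ', '.join(str(f) for f in friendlist)
    # hashlib replaces the old md5 module; it hashes bytes, so encode.
    return hashlib.md5(friendlist_text.encode('utf-8')).hexdigest()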