Ryan Kanno: The diary of an Enginerd in Hawaii

Everything you've ever thought, but never had the balls to say.


Tag Archive » ‘backup’

web2email.py – A web to email Python backup script

I’m back, at least for the time being. There’s definitely a calm before the impending storm, but until then, I’m back posting little tidbits of uselessness. Enjoy!

Python goodness

While introducing the concept of automation to a friend of mine, I came across a requirement to archive a series of URLs on a daily basis. Luckily for me, the URLs consisted primarily of plain text. Loading up VIM, I concocted this Python script in a few hours, most of which were spent searching Googs <3.

If you're looking for a true web crawler, this won't be for you, though loading up lxml/Beautiful Soup, cssutils, and a JavaScript parser to determine what artifacts need to be downloaded shouldn’t be all that difficult…

But, I’ll leave that as an exercise for the reader (That’s you, btw!)

In any case, the following script fetches a single URL and emails the page to you, either through an authenticated SMTP server (tested with Googs and Webfaction) or through a plain SMTP server of your choosing. Sorta-kinda like having your own Wayback Machine. Cut and paste the following into a neat file called web2email.py.

#! /usr/bin/env python2.5
# -*- coding: utf-8 -*-
#
# Copyright (c) 2008 Ryan Kanno (ryankanno@localkinegrinds.com)
# License: GNU GPLv3
 
import os
import urllib2
 
import smtplib
from email.MIMEMultipart import MIMEMultipart
from email.MIMEBase import MIMEBase
from email.MIMEText import MIMEText
from email.Utils import COMMASPACE, formatdate
from email import Encoders
import datetime
 
from optparse import OptionParser
import sys, logging
 
__doc__ = """
 
This script retrieves a URL and sends its contents via email to 
a list of recipients.  Typically, this script is run from a cron
job that sends emails to a Gmail account to archive the contents
of a URL.
 
Mail can be sent via normal or authenticated SMTP.  Tested using 
Gmail SMTP (authenticated), Webfaction SMTP (authenticated), and
localhost (normal).
 
Example:
 
Sends the contents of http://www.espn.com to friend@domain.com using your Gmail settings
 
    python web2email.py -u gmail_username \
                        -p gmail_password \
                        -f gmail_username@gmail.com \
                        -r friend@domain.com http://www.espn.com
 
Sends the contents of http://www.espn.com to friend@domain.com using your Webfaction settings
 
    python web2email.py -u webfaction_username \
                        -p webfaction_password \
                        -f webfaction_account@webfaction_domain.com \
                        -s smtp.webfaction.com \
                        -r friend@domain.com http://www.espn.com
 
Sends the contents of http://www.espn.com to friend@domain.com using your local settings
 
    python web2email.py -f your_email@domain.com \
                        -s localhost \
                        --port 25 \
                        -r friend@domain.com http://www.espn.com
"""
 
__author__  = "ryankanno@localkinegrinds.com"
__url__     = "http://blog.localkinegrinds.com"
__version__ = "0.1"
 
USAGE = "usage: %prog [options] url" 
DESC  = __doc__.split('\n\n')[0]
 
def configure_logging(log_level, format='%(asctime)s %(levelname)s %(message)s'):
    logging.basicConfig(level=log_level, format=format)
 
def _validate_options_and_args(parser, options, args):
    logging.debug("Validating options and arguments.")
    if (len(args) != 1):
        parser.error("Incorrect number of arguments.  Script expects 1 (URL to backup), but received %i." % len(args))
        sys.exit(2) # Command line syntax error
    elif not options.recipients: 
        parser.error("You must include at least one recipient.")
        sys.exit(1) 
    elif (options.username and options.password is None) or (options.username is None and options.password is not None):
        parser.error("You must include both a username and password.")
        sys.exit(1) 
    elif not options.from_email:
        parser.error("You must include a valid from email address.")
        sys.exit(1) 
 
def getPage(url):
    logging.debug("Attempting to retrieve %s" % url)
    try:
        response = urllib2.urlopen(url)
        return response.read()
    except urllib2.HTTPError, e:
        logging.error("HTTPError (%s) occurred retrieving %s" % (e.code, url))
        sys.exit(1)
    except urllib2.URLError, e:
        logging.error("URLError (%s) occurred retrieving %s" % (e.reason, url))
        sys.exit(1)
 
def mail(send_from, send_to, subject, text, content_type, files=[], server='localhost', port=25, username=None, password=None):
 
    def _auth(server, port, username, password):
        logging.debug("Attempting to send email via %s:%i as %s." % (server, port, username))
        smtp = smtplib.SMTP(server, port) 
        smtp.ehlo()
        smtp.starttls()
        smtp.ehlo()
        smtp.login(username, password)
        smtp.sendmail(username, send_to, msg.as_string())
        smtp.close()
 
    def _unauth(server, port):
        logging.debug("Attempting to send email via %s:%i" % (server, port))
        smtp = smtplib.SMTP(server, port)
        smtp.sendmail(send_from, send_to, msg.as_string())
        smtp.close()
 
    assert type(send_to)==list
 
    msg=MIMEMultipart()
    msg['From'] = send_from
    msg['To'] = COMMASPACE.join(send_to)
    msg['Date'] = formatdate(localtime=True)
    msg['Subject'] = subject
 
    text = MIMEText(text)
    text.set_type(content_type)
    text.set_param('charset', 'UTF-8')
 
    msg.attach(text)
 
    for f in files:
        part = MIMEBase('application', "octet-stream")
        part.set_payload(open(f, "rb").read())
        Encoders.encode_base64(part)
        part.add_header('Content-Disposition', 'attachment; filename="%s"' % os.path.basename(f))
        msg.attach(part)
 
    if not username and not password:
        _unauth(server, port)
    else:
        _auth(server, port, username, password) 
 
def main():
    parser = OptionParser(usage=USAGE, description=DESC)
 
    parser.add_option("-u", "--username", dest="username", metavar="USER", help="Username to SMTP server")
    parser.add_option("-p", "--password", dest="password", metavar="PWD", help="Password to SMTP server")
    parser.add_option("-s", "--server", dest="server", metavar="SERVER", help="SMTP server (Defaults to Gmail)")
    parser.add_option("--port", dest="port", metavar="PORT", type="int", help="SMTP server port (Defaults to Gmail)")
    parser.add_option("-f", "--from", dest="from_email", metavar="FROM", help="From address")
    parser.add_option("-r", "--recipient", action="append", dest="recipients", metavar="RCPT", type="string", help="Email recipient")
    parser.add_option('-t', '--test', action="store_true", dest="test", metavar="TEST", help="Run tests")
    parser.add_option('-v', '--verbose', action='store_const', dest='log_level', const=logging.DEBUG, help='Verbose output')
    parser.set_defaults(server="smtp.gmail.com", port=587, test=False, log_level=logging.INFO)
    (options, args) = parser.parse_args()
 
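    # Configure logging first; otherwise the debug call inside validation
    # triggers a default basicConfig() and the settings below are ignored.
    configure_logging(options.log_level)
    _validate_options_and_args(parser, options, args)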
 
    if options.test:
        _test() # Too lazy to write a test for this script.  @TODO - use mocks 
 
    # Retrieve URL and return html
    html = getPage(args[0])
 
    # Send mail with returned html as body 
    mail(options.from_email, options.recipients, 
         '%s @ %s' % (args[0], (datetime.datetime.now().strftime("%A %B %d %I:%M:%S %p %Y"))), 
         html, 'text/html', 
         server=options.server, port=options.port, username=options.username, password=options.password)
 
    # Return with appropriate exit code
    sys.exit(0)
 
def _test():
    import doctest
    doctest.testmod(sys.modules[__name__])
 
if __name__ == '__main__':
    main()
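
The script above targets Python 2.5, and several of its imports (urllib2, email.MIMEText and friends) no longer exist under those names in Python 3. As a rough sketch only, and not a drop-in replacement, here is how the fetch-and-mail core might look on Python 3 using just the standard library. The function names fetch, build_message, and send are made up for this example, the UTF-8 decoding is an assumption about the page, and the Gmail host and port are simply the defaults the original script uses.

```python
import smtplib
import urllib.request
from datetime import datetime
from email.message import EmailMessage
from email.utils import formatdate


def fetch(url):
    """Return the page at url as text (assumes the page is UTF-8)."""
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")


def build_message(send_from, send_to, url, html, now=None):
    """Wrap the fetched page in an HTML email (hypothetical helper)."""
    now = now or datetime.now()
    msg = EmailMessage()
    msg["From"] = send_from
    msg["To"] = ", ".join(send_to)
    msg["Date"] = formatdate(localtime=True)
    msg["Subject"] = "%s @ %s" % (url, now.strftime("%A %B %d %I:%M:%S %p %Y"))
    msg.set_content(html, subtype="html", charset="utf-8")
    return msg


def send(msg, server="smtp.gmail.com", port=587, username=None, password=None):
    """Send msg, switching to TLS and logging in only when credentials are given."""
    with smtplib.SMTP(server, port) as smtp:
        if username and password:
            smtp.starttls()
            smtp.login(username, password)
        smtp.send_message(msg)
```

Usage would be something along the lines of send(build_message(sender, [recipient], url, fetch(url)), username=user, password=pwd). Keeping message construction separate from sending means you can inspect the message without touching a mail server.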

All right stop, cron time! (imagine a 90’s pop song)

As an added bonus, you can install this script to run via cron so you’ll magically end up with webpages archived in your inbox! Neat. You can read my previous post on cron, or you can create the following crontab.

MAILTO=ryankanno@CHANGE_TO_YOUR_EMAIL.com
# minute (0-59),
# |      hour (0-23),
# |      |       day of the month (1-31),
# |      |       |       month of the year (1-12),
# |      |       |       |       day of the week (0-6 with 0=Sunday).
# |      |       |       |       |       commands
  0      0       *       *       *      /usr/bin/python2.5 /PATH/TO/web2email.py -u GMAIL_USER -p GMAIL_PWD -f FROM_USER -r RECIPIENT URL

As a side note, don’t forget to put double quotes around URL if it contains spaces!

Note: change the value of ryankanno@CHANGE_TO_YOUR_EMAIL.com to your email address (or comment the line out with a # if you don’t want emails sent to you), GMAIL_USER to your Google username, GMAIL_PWD to your Google password, FROM_USER to the from address in the mail header, RECIPIENT to the recipient email address, and URL to the URL you want backed up.

I know, I know. The critics.

The critics will say that your Gmail username and password are in cleartext. I know. They are. So… I’m hoping that since you just need an archive of a publicly available URL on the Internets, the data doesn’t need to be super-duper-Fort-Knox-protected. If it does, this script isn’t for you. :( Oh, yeah, before I forget… here’s a hint… *cough*create another Google account*cough*. With that said, archive to your heart’s content!
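
If the cleartext-in-crontab thing still bugs you, one small mitigation is to keep the credentials in a file only you can read, rather than on the command line. What follows is a minimal Python 3 sketch under a few assumptions: the file layout (an INI file with an [smtp] section), the function name load_credentials, and the idea of wiring it into the script are all mine, since web2email.py as posted only accepts -u and -p.

```python
import configparser
import os
import stat


def load_credentials(path):
    """Read SMTP credentials from an INI file that only its owner can access."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    # Refuse to use a file that group or others can read, write, or execute
    if mode & 0o077:
        raise PermissionError("%s is accessible by group or others; try chmod 600" % path)
    # interpolation=None so a password containing % is read literally
    parser = configparser.ConfigParser(interpolation=None)
    parser.read(path)
    return parser["smtp"]["username"], parser["smtp"]["password"]
```

The password is still stored in cleartext, just somewhere less visible, so the throwaway Google account is still the better idea.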

Enjoy!



Backing up your Subversion (SVN) repository on Dreamhost with cron

Two events spurred me to write this post.

First, my 2-year-old “Subversion + Dreamhost + Post-Commit” post still gets quite a number of hits. Second, after the latest Dreamhost outage move, I’m beginning to feel a little more vigilant about backing up my data.

As a standard disclaimer, if you’re not familiar with the Unix shell, I highly suggest you not try this unless under the supervision of someone who reads Perl books for fun. By accessing your Dreamhost shell, you can seriously f-up your account and I will not fix it for you. You have been warned. :) (Don’t you just love smileys?)

Setup

There are a few prerequisites to being able to back up your SVN repository.

  1. First and foremost, you must have already installed an SVN repository into your Dreamhost account via the control panel.
  2. Second, you must know how to SSH into your Dreamhost account. As an FYI, you sorta-kinda-need to know what that means in order to follow this tutorial.

Grabbing the backup script

Wait, you didn’t think I was writing my own, right? If you had installed/compiled Subversion on your own, it would’ve come with this file, hot-backup.py. Fortunately for us, Dreamhost has this file conveniently available at /usr/bin/svn-hot-backup, but it’s an older version of the backup script. There are some subtle differences, like being unable to pass in the number of backups you want the script to manage. Personally, I like to be on the edge, so let’s get the latest version. Execute the following commands from your home directory.

$ cd ~
$ mkdir scripts
$ cd scripts
$ wget http://svn.collab.net/repos/svn/trunk/tools/backup/hot-backup.py.in
$ mv hot-backup.py.in svn-hot-backup.py

The commands issued above created a directory called scripts in your home directory, switched into the directory, downloaded the latest hot-backup.py.in file from CollabNet, and renamed it to svn-hot-backup.py. Now that you have the file, you’ll need to make a few edits. Personally, I’m accustomed to vi, but pick your poison (pico, nano, text editor of your choice) and find these two values (they should be close to the top of the file, on consecutive lines).

# Path to svnlook utility
svnlook = r"@SVN_BINDIR@/svnlook"

# Path to svnadmin utility
svnadmin = r"@SVN_BINDIR@/svnadmin"

and change them to the following:

# Path to svnlook utility
svnlook = r"/usr/bin/svnlook"

# Path to svnadmin utility
svnadmin = r"/usr/bin/svnadmin"

(If you’re wondering, if and when you compile/install Subversion yourself, these two variables would have been automagically filled in for you.)
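
If hand-editing isn’t your thing, the same substitution can be scripted. This is just a sketch in Python 3 (so run it somewhere with a modern Python, not necessarily on the Dreamhost box); the function names fill_svn_bindir and patch_file are invented here, and /usr/bin is the location used in the edit above.

```python
def fill_svn_bindir(source, bindir="/usr/bin"):
    """Replace the @SVN_BINDIR@ build placeholder with a real directory."""
    return source.replace("@SVN_BINDIR@", bindir)


def patch_file(path, bindir="/usr/bin"):
    """Rewrite the file at path in place with the placeholder filled in."""
    with open(path) as fh:
        source = fh.read()
    with open(path, "w") as fh:
        fh.write(fill_svn_bindir(source, bindir))
```

Calling patch_file on your copy of svn-hot-backup.py should leave both the svnlook and svnadmin lines pointing at /usr/bin.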

The Python script we downloaded not only performs a hotcopy of your SVN repository, but can also archive it and manage a set number of copies. Pretty neat, right?

Preparing for the backups

Before you can actually back up your SVN repository, you’ll want to create a directory structure to manage your backups. Execute the following commands from your home directory.

$ cd ~
$ mkdir backup
$ cd backup
$ mkdir svn
$ cd ~/scripts

The commands issued above created a directory called backup in your home directory, switched into the directory, and created another directory called svn within the backup directory. We’ll be using this directory to store all your backups. Finally, we switched back into the scripts directory created in the previous steps. Now that we have the backup script and directory structure to manage the backups, let’s test it out!

Before you can back up your repository, you’ll have to know the name of the Subversion repository you’re trying to back up. To find the name of your repository, you can either look in the svn directory in your home directory, or you can check out the ID value in your Subversion Goodies control panel. In any case, remember the name of your SVN repository and issue the following commands.

$ cd ~/scripts/
$ python2.4 svn-hot-backup.py --archive-type=zip --num-backups=10 ~/svn/REPOSITORY_NAME_HERE/ ~/backup/svn/

Note: change the value of REPOSITORY_NAME_HERE to the ID of the SVN repository you want backed up.

You should see the following in the console:

Beginning hot backup of '/home/USERNAME/svn/REPOSITORY_NAME_HERE/'.
Youngest revision is REVISION_NUMBER
Backing up repository to '/home/USERNAME/backup/svn/REPOSITORY_NAME_HERE-REVISION_NUMBER'...
Done.
Archiving backup to '/home/USERNAME/backup/svn/REPOSITORY_NAME_HERE-REVISION_NUMBER.zip'...
Archive created, removing backup '/home/USERNAME/backup/svn/REPOSITORY_NAME_HERE-REVISION_NUMBER'...

If you see output like the above, the backup was a success! You can even check on the file by changing into the backup/svn directory!
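
If you’d rather not just eyeball the directory, a few lines of Python 3 can confirm the archive actually opens. This is a sketch of my own (check_archive is a made-up name); the real work is done by the standard library’s zipfile.ZipFile.testzip, which reads every member and checks its CRC.

```python
import zipfile


def check_archive(path):
    """Return True if the zip at path opens and every member passes its CRC check."""
    try:
        with zipfile.ZipFile(path) as archive:
            # testzip() returns the name of the first corrupt member, or None
            return archive.testzip() is None
    except zipfile.BadZipFile:
        return False
```

A passing check only tells you the zip is intact, not that the repository inside is restorable, so an occasional test restore is still worth doing.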

Voila! (But there’s more)

Automating the backups

Now that you actually have the script backing up your SVN repository, let’s automate it! To do so, we’ll use the handy cron daemon. Cron is similar to the Windows task scheduler in that it lets a user execute commands at a specified date/time or at set intervals. To tell cron the tasks you want to execute, you’ll need to load a configuration file called a crontab. You can read more about it here and here. In any case, here’s what my crontab configuration file looks like.

MAILTO=ryankanno@CHANGE_TO_YOUR_EMAIL.com
# minute (0-59),
# |      hour (0-23),
# |      |       day of the month (1-31),
# |      |       |       month of the year (1-12),
# |      |       |       |       day of the week (0-6 with 0=Sunday).
# |      |       |       |       |       commands
  0      0       *       *       *      /usr/bin/python2.4 /home/USERNAME/scripts/svn-hot-backup.py --archive-type=zip --num-backups=10 /home/USERNAME/svn/REPOSITORY_NAME/ /home/USERNAME/backup/svn/

Create a file in your scripts directory called svn_backup_once_a_day.cron and copy the contents above into your file. I’ve set up my crontab to back up my SVN repository once a day, at midnight.

Note: change the value of ryankanno@CHANGE_TO_YOUR_EMAIL.com to your email address (or comment the line out with a # if you don’t want emails sent to you), USERNAME to your Dreamhost username, and REPOSITORY_NAME to your Subversion repository.

Once you have this file called svn_backup_once_a_day.cron in your scripts directory, load the file into your crontab by issuing the following command:

$ crontab svn_backup_once_a_day.cron

As an FYI, this will replace your old crontab. If you have other items already running on cron, it’s a good idea to list them via the crontab -l command first and copy them into the new file. If you want to make sure that your cron job will run, you can test it by temporarily setting the minute and hour values in the crontab to a few minutes from now. I’ll leave this as an exercise to the reader. :)
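
To avoid clobbering existing entries, you can merge instead of replace. Here’s a hedged Python 3 sketch of that idea: merge_crontab and install are hypothetical names of mine, and install assumes your crontab command accepts - to read from standard input, which Vixie-derived versions do but which you should check on your own host.

```python
import subprocess


def merge_crontab(existing, new_entry):
    """Append new_entry to the existing crontab text unless it is already there."""
    lines = existing.splitlines()
    if new_entry.strip() not in (line.strip() for line in lines):
        lines.append(new_entry.strip())
    return "\n".join(lines) + "\n"


def install(new_entry):
    """Read the current crontab, merge in new_entry, and load the result."""
    current = subprocess.run(["crontab", "-l"], capture_output=True, text=True)
    # crontab -l exits non-zero when no crontab exists yet; treat that as empty
    existing = current.stdout if current.returncode == 0 else ""
    subprocess.run(["crontab", "-"], input=merge_crontab(existing, new_entry),
                   text=True, check=True)
```

Because merge_crontab is a plain string function, you can print its result and look it over before letting install anywhere near your real crontab.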

Storing your backups

Though out of scope of this blog, you’ll still have to store your backups somewhere. Please just don’t leave them in your Dreamhost account. Your best bet is probably to get an Amazon S3 account and store your backups there. Personally, I like to run another script immediately after the hotcopy finishes that pushes the backup to my S3 account. Other options include scp/sftp’ing the backups to your home machine. Here’s a link to read more about that option.
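
I won’t reproduce my S3 script here, but the first half of any “push it somewhere else” job is finding the archive that was just written. A minimal Python 3 sketch, with invented names (newest_backup, scp_command) and a placeholder destination, might look like this:

```python
import glob
import os


def newest_backup(backup_dir, pattern="*.zip"):
    """Return the most recently modified archive in backup_dir, or None."""
    candidates = glob.glob(os.path.join(backup_dir, pattern))
    return max(candidates, key=os.path.getmtime) if candidates else None


def scp_command(archive, destination):
    """Build (but do not run) an scp command that copies archive offsite."""
    return ["scp", archive, destination]
```

You could then hand the result to subprocess.run, for example with a destination such as USER@YOUR_HOME_MACHINE:backups/, assuming key-based SSH login is already set up so cron isn’t left waiting at a password prompt.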

Voila! Enjoy!



Backing up your Wordpress installation.

Wordpress

I’ve found that many bloggers with Wordpress installations seem to overlook the simple task of backing up their data. Rather, they put their blogging fate into the hands of their trustworthy web-hosting provider. Not that I don’t trust my provider, but I like to sleep knowing full well that my data is resting in a few safe places – rather than putting my eggs all in one basket. So you can either learn the hard way like here, here, and here, or you can follow the simple backup recipe below.

Backing up your Wordpress blog calls for the installation of two very simple Wordpress plugins, WP-DB Backup and WP-Cron, and a dummy Gmail account. After going through the painstaking task of downloading, installing, and configuring each plugin on my hosting provider, I came to find out that other users have reported that these plugins stopped working with Wordpress 2.0.4, the exact version I have installed on my hosting provider. After searching around a bit (how did people live without Googs?), I found this blog posting that outlined the exact versions of the plugins you’ll need to get backups working with Wordpress 2.0.4. In any case, I’ll walk you through the procedure I followed to get my Wordpress installation backed up.

The setup

If you’ve somehow managed to SSH/telnet into your hosting account, you can issue the wget command to retrieve the two Wordpress plugins. I highly suggest creating a temp directory, then cd’ing into the directory to execute these commands.

wget URL/TO/LATEST/WP-DB-BACKUP.zip
wget URL/TO/LATEST/WP-CRON.zip

Note: I didn’t include the URLs since they are likely to change depending on when this post is read.

The install

After downloading these two files, you’ll need to move them into your Wordpress plugins directory using the following command:

mv *.zip WP_INSTALL_DIRECTORY/wp-content/plugins

Cd into the directory and extract the two zip files using the following commands:

unzip LATEST_WP-CRON.zip
unzip LATEST_WP-DB-BACKUP.zip

Once these files are unzipped, click here to find the files you need to overwrite. This can be done by cutting and pasting the file contents from the blog over to the same files on your hosted account. You can even delete the files on your hosted site and create a new file with the contents from the aforementioned blog.

You pick your poison.

Once installed, log in to your Wordpress admin site and find the Plugins link atop the administration panel. You’ll have to ‘activate’ your plugins via the administration panel. Once you’ve activated both WP-Cron and WP-DB Backup, click on the Manage link atop the administration panel. In the sub panel, you’ll find a new link that reads ‘Backup’.

The config

I highly, highly recommend that before setting up your backup to execute nightly, you immediately test whether you can back up your current Wordpress installation. This can be done by selecting the ‘Email backup to:’ option (fill in an email address) and pressing the Backup button. Once you know that the backups are working (check your email address)… fill in the Scheduled backup section by selecting Daily and filling in an email address. Make sure to add any extra tables (if there are any) that you would like to back up.

You’re probably wondering what the Gmail account was for. Since Google mail offers a sweet 2 GBs of free storage, create an appropriately named dummy Gmail account such as ‘my-blog-backup@gmail.com’ and send all your backups to this address. Periodically, download the backups to your home machine by accessing Gmail via POP, but also leave a few backups sitting in the actual account. If you ever approach the 2 GB storage limit, log into your account via the web and delete the extraneous backups.
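
The periodic POP download can be scripted too. Below is a Python 3 sketch under some loud assumptions: the function names are mine, pop.gmail.com is Gmail’s POP host, POP access has to be enabled on the account, and whatever password Google currently requires for POP clients is something you’ll need to sort out yourself. The attachment handling is the part worth borrowing; the download loop is just the obvious wrapper around poplib.

```python
import email
import os
import poplib
from email import policy


def save_attachments(raw_message, dest_dir):
    """Write every attachment in raw_message (bytes) to dest_dir; return the paths."""
    msg = email.message_from_bytes(raw_message, policy=policy.default)
    saved = []
    for part in msg.iter_attachments():
        # basename() guards against a filename that tries to escape dest_dir
        name = os.path.basename(part.get_filename() or "backup.bin")
        path = os.path.join(dest_dir, name)
        with open(path, "wb") as fh:
            fh.write(part.get_payload(decode=True))
        saved.append(path)
    return saved


def download_all(user, password, dest_dir, host="pop.gmail.com"):
    """Fetch every message over POP3-over-SSL and save its attachments."""
    box = poplib.POP3_SSL(host)
    try:
        box.user(user)
        box.pass_(password)
        count = len(box.list()[1])
        for number in range(1, count + 1):
            lines = box.retr(number)[1]
            save_attachments(b"\r\n".join(lines), dest_dir)
    finally:
        box.quit()
```

Note that two backups with the same attachment filename would overwrite each other in dest_dir, so you may want to prefix the name with the message date if your plugin doesn’t already do that.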

Voila! Blog on!

Now you can rest assured that Googs, your hosting provider, and your local machine will each have a copy of your current blog.


