A while ago I wrote about my initial attempts with Python. I started with a backup script for my home computer because that’s what I needed at the time. Now I’m reviewing and tweaking it.

My considerations at the time of writing the initial script were:

  • It should check if the directories actually exist.
  • It should ask for confirmation before doing anything.
  • It should back up the specified folder to the specified location.
  • It should delete unnecessary files before backing up.
  • Since I used rsync in my Bash script, I wanted the same functionality again.

They are still the same for the new script, but I also had some improvements I wanted to make:

  • Logging
  • Automation
  • Exclusions

Adding Automation

I don’t want my script to be interactive anymore. Instead, I want to pass all the information the script needs as arguments when I run it, e.g.: ./home_backup.py --destination --source --delete-stuff

To do this, I use argparse, a library that lets you pass arguments to your script. I will try to explain the steps I took, but it helps to be somewhat familiar with argparse.

First, import argparse and create an ArgumentParser, assigning it to a variable.

import argparse

parser = argparse.ArgumentParser(
    description=__doc__)

When you assign __doc__ to the description parameter, the docstring you gave your Python script is shown when you display the usage of the script:

$  python2.7 home_backup.py -h
usage: home_backup.py [-h] [-t] [-e EXCLUDE] [-l LOGFILE] [-q]
                      BACKUPDIR DESTINATIONDIR

This script backups everything from BACKUPDIR to DESTINATIONDIR using rsync.
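
That description is nothing more than the docstring at the top of the script, so the file starts roughly like this:

#!/usr/bin/env python
"""This script backups everything from BACKUPDIR to DESTINATIONDIR using rsync."""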

Next, we have to define the arguments that our little script will accept on the command line.

The add_argument method takes at least one parameter: the name of the argument. To define a mandatory argument, do not put a dash before the name.

parser.add_argument("BACKUPDIR")

To make it optional, add a dash.

parser.add_argument("-l")

If you also want a long form of the argument, add two dashes and the long name as a second parameter, separated by a comma.

parser.add_argument("-l", "--logfile")

There’s also the help parameter, where you define the description of the argument that is shown in the usage message of your script.

parser.add_argument("BACKUPDIR", help="Specify the directory to backup.")

You can also store Boolean values with arguments: if the argument is given on the command line, it is set to True. You achieve this with the following option:

parser.add_argument("-d", action="store_true")

Here are all arguments together.

parser.add_argument("BACKUPDIR", help="Specify the directory to backup.")
parser.add_argument("DESTINATIONDIR", help="Specify the directory where the backup is stored.")
parser.add_argument("-t", "--trash", help="Delete unnecessary files and empty the trash.", action="store_true")
parser.add_argument("-e", "--exclude", help="Exlude the following directories from backup.", action="append")
parser.add_argument("-l", "--logfile", help="Specify the logfile to monitor.")
parser.add_argument("-q", "--quiet", help="Do not print to stdout.", action="store_true")

And finally, parse the arguments:

args = parser.parse_args()

Defining Paths

The old script handled two things inside the script itself:

  • Confirming actions
  • Defining the backup paths

They’re gone, because argparse does the job now. If you want, you can read about it in the old blog post.

Now, all we have to do is define the path to back up and the destination path on the command line.

parser.add_argument("BACKUPDIR", help="Specify the directory to backup.")
parser.add_argument("DESTINATIONDIR", help="Specify the directory where the backup is stored.")

I assigned these arguments to two variables to make working with them easier later on:

backupdir = args.BACKUPDIR
destinationdir = args.DESTINATIONDIR

Logging

I need some simple logging functionality in case anything goes wrong or I want to know what got deleted or saved. For that I use Python’s built-in logging library. First, I define the format of the log output: "asctime" is the date and time in the default format, "levelname" is the log level (INFO, DEBUG, ERROR, …) and "message" is the message to be logged.

import logging
logFormatter = logging.Formatter("%(asctime)s %(levelname)s  %(message)s")
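
With this format, a logged message looks roughly like this (the timestamp is just an example):

2016-05-14 10:23:45,123 INFO  Starting rsync.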

I assign this configuration to the variable "logFormatter" so I can reuse it with different handlers. Handlers define where the log messages are sent to, e.g. a log file or the command line. That is exactly what I want, so I need two handlers:

rootLogger = logging.getLogger()
logfile = args.logfile

if logfile:
    fileHandler = logging.FileHandler(logfile)
    fileHandler.setFormatter(logFormatter)
    rootLogger.addHandler(fileHandler)

The first line initializes the root logger and the second stores the logfile argument. The file handler is only set up when a logfile was actually given: inside the if-block, the first line tells the file handler which file to log to, the second applies the "logFormatter" I defined earlier, and the third adds the file handler to the root logger.

if not args.quiet:
    consoleHandler = logging.StreamHandler()
    consoleHandler.setFormatter(logFormatter)
    rootLogger.addHandler(consoleHandler)

The if-statement checks the "quiet" argument. If it is not set, a console handler is added. The three lines inside do the same as for the file handler, except that the output goes to the command line instead of the log file.
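
One detail that is easy to miss: the root logger filters at the WARNING level by default, so the logging.info() calls further down would not show up at all. That’s why the finished script raises the level right after getting the root logger:

rootLogger.setLevel(logging.INFO)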

Check if directory exists

Python already has a function that checks whether a path exists: os.path.exists

So all I have to do is wrap this built-in function in a little function of my own that performs the check, logs an error message, and exits if the directory does not exist.

# directory exist-check
def check_dir_exist(os_dir):
    if not os.path.exists(os_dir):
        logging.error("{} does not exist.".format(os_dir))
        exit(1)
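
In the script, the check is called right after the arguments are parsed, at least for the backup directory:

# abort early if the source directory is missing
check_dir_exist(backupdir)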

Delete files

I want to delete unnecessary files like temporary or backup files before doing the backup. For this I created a delete function and a step that empties the trash (shown further down). The function takes a file ending as input, searches the backup path for files ending with it, and deletes them. If a file cannot be deleted, a warning message is logged but the script continues. All of this should only happen if I want it to, so I added the argument "--trash" with the action "store_true". That means args.trash is a Boolean that is True when I run the script with the trash argument, so I can check it later and only delete files if "trash" is True.

# delete function
def delete_files(ending, indirectory):
    for r, d, f in os.walk(indirectory):
        for files in f:
            if files.endswith("." + ending):
                try:
                    os.remove(os.path.join(r, files))
                    logging.info("Deleting {}/{}".format(r, files))
                except OSError:
                    logging.warning("Could not delete {}/{}".format(r, files))
                    pass

I use this function in a for-loop that iterates through three file endings, but only if the argument "trash" was given on the command line.

if args.trash:
    file_types = ["tmp", "bak", "dmp"]
    for file_type in file_types:
        delete_files(file_type, backupdir)

Finally, the trash can of the user executing the script gets emptied. I use shutil.rmtree for this, which deletes the whole directory tree. Don’t worry: it is recreated as soon as a file is moved to the trash again. os.path.expanduser just expands the "~" to the user’s home directory.

# Delete actual files first
if args.trash:
    file_types = ["tmp", "bak", "dmp"]
    for file_type in file_types:
        delete_files(file_type, backupdir)
    # Empty trash can
    try:
        rmtree(os.path.expanduser("~/.local/share/Trash/files"))
    except OSError:
        logging.warning("Could not empty the trash or trash already empty.")
        pass

Again, if the trash cannot be emptied, a warning message is logged but the script continues.

Excluding files

I want to exclude certain files and directories from the backup. rsync has an exclude option, and all I need to do is pass the files and directories to it. Again I use an argparse argument for this. Its action is "append", so if I specify multiple exclusions, they are collected in a list. I then read this list and turn every entry into an --exclude option, stored in another variable. That variable is later used in the rsync call.

# handle exclusions
exclusions = []
if args.exclude:
    for argument in args.exclude:
        exclusions.append("--exclude={}".format(argument))
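
To illustrate (the directory names are just placeholders): running the script with -e .cache -e Downloads makes args.exclude a list of those two names, and exclusions ends up as:

exclusions = ["--exclude=.cache", "--exclude=Downloads"]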

Backing up the files

This is the most important part of the script. I tried to find an rsync alternative in Python and searched for a way to mimic rsync’s behavior, but I didn’t find anything that was easy to understand, capable of what it should do, and not outdated. That is, until I found Rsyncbackup, a Python script "to perform automatic backups using the rsync command". It does exactly what it says it does, and what it does is rely on the rsync command itself.

So I do the same in my script and use rsync itself in combination with sh. sh lets me execute any program as if it were native Python.
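
As a small illustration of what sh does (this snippet is not part of the backup script): you import a command from sh and call it like a function, and the return value behaves like the command’s output.

from sh import ls

# runs "ls -l /tmp" and prints the command's output
print(ls("-l", "/tmp"))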

Here, I differentiated five possibilities:

  • Use logging and exclusions, but be quiet on the command line
  • Use logging and exclusions
  • Use exclusions and be quiet on the command line
  • Use logging and be quiet on the command line
  • Use none of the above

This is covered by several if-statements that check whether the variables are set or empty. That’s quite bulky, but currently I don’t know any other way. If you know one, write it in the comments!

# Do the actual backup
logging.info("Starting rsync.")
if logfile and exclusions and args.quiet:
    rsync("-auhv", exclusions, "--log-file={}".format(logfile), backupdir, destinationdir)
elif logfile and exclusions:
    print(rsync("-auhv", exclusions, "--log-file={}".format(logfile), backupdir, destinationdir))
elif args.quiet and exclusions:
    rsync("-av", exclusions, backupdir, destinationdir)
elif logfile and args.quiet:
    rsync("-av", "--log-file={}".format(logfile), backupdir, destinationdir)
else:
    rsync("-av", backupdir, destinationdir)

The result

That’s the whole script put together:

#!/usr/bin/env python
"""This script backups everything from BACKUPDIR to DESTINATIONDIR using rsync."""

import os
from shutil import rmtree
import argparse
import logging
from sh import rsync


#Parse arguments
parser = argparse.ArgumentParser(
    description=__doc__)

parser.add_argument("BACKUPDIR", help="Specify the directory to backup.")
parser.add_argument("DESTINATIONDIR", help="Specify the directory where the backup is stored.")
parser.add_argument("-t", "--trash", help="Delete unnecessary files and empty the trash.", action="store_true")
parser.add_argument("-e", "--exclude", help="Exlude the following directories from backup.", action="append")
parser.add_argument("-l", "--logfile", help="Specify the logfile to monitor.")
parser.add_argument("-q", "--quiet", help="Do not print to stdout.", action="store_true")

args = parser.parse_args()

# Define variables
backupdir = args.BACKUPDIR
destinationdir = args.DESTINATIONDIR
logfile = args.logfile

#Logging
rootLogger = logging.getLogger()
logFormatter = logging.Formatter("%(asctime)s %(levelname)s  %(message)s")
rootLogger.setLevel(logging.INFO)
if logfile:
    fileHandler = logging.FileHandler(logfile)
    fileHandler.setFormatter(logFormatter)
    rootLogger.addHandler(fileHandler)

if not args.quiet:
    consoleHandler = logging.StreamHandler()
    consoleHandler.setFormatter(logFormatter)
    rootLogger.addHandler(consoleHandler)


# directory exist-check
def check_dir_exist(os_dir):
    if not os.path.exists(os_dir):
        logging.error("{} does not exist.".format(os_dir))
        exit(1)

check_dir_exist(backupdir)

# delete function
def delete_files(ending, indirectory):
    for r, d, f in os.walk(indirectory):
        for files in f:
            if files.endswith("." + ending):
                try:
                    os.remove(os.path.join(r, files))
                    logging.info("Deleting {}/{}".format(r, files))
                except OSError:
                    logging.warning("Could not delete {}/{}".format(r, files))
                    pass


# Delete actual files first
if args.trash:
    file_types = ["tmp", "bak", "dmp"]
    for file_type in file_types:
        delete_files(file_type, backupdir)
    # Empty trash can
    try:
        rmtree(os.path.expanduser("~/.local/share/Trash/files"))
    except OSError:
        logging.warning("Could not empty the trash or trash already empty.")
        pass

# handle exclusions
exclusions = []
if args.exclude:
    for argument in args.exclude:
        exclusions.append("--exclude={}".format(argument))


# Do the actual backup
logging.info("Starting rsync.")
if logfile and exclusions and args.quiet:
    rsync("-auhv", exclusions, "--log-file={}".format(logfile), backupdir, destinationdir)
elif logfile and exclusions:
    print(rsync("-auhv", exclusions, "--log-file={}".format(logfile), backupdir, destinationdir))
elif args.quiet and exclusions:
    rsync("-av", exclusions, backupdir, destinationdir)
elif logfile and args.quiet:
    rsync("-av", "--log-file={}".format(logfile), backupdir, destinationdir)
else:
    rsync("-av", backupdir, destinationdir)

logging.info("done.")

Also on GitHub.


