Automatically Back Up Any Postgres Database Table to a GoogleDrive Folder


Motivation and use case

In this post I will explain how to automate the process of backing up a Postgres database table to a GoogleDrive cloud storage location. I encountered this problem when helping my dad out with an IT-related problem, which included the backup process of a PostgreSQL database table. The solution I provide here can, however, easily be adapted to back up pretty much anything and upload it to some GoogleDrive folder.

In my approach, we use two scripts to accomplish this: one script that produces the backup files and a second script that takes care of the uploading process. If you wish to upload something else, simply adapt the shell script which, in this case, generates the .sql dumps of the desired postgres database table.

For now, clone the repository and we are good to go:

git clone https://github.com/frietz58/postgres_googledrive_backup.git

If you just want to get started, the README of this project on GitHub contains all the relevant steps as well ;)

Note that the installation process requires a desktop environment and a browser, so straight up installing this on a headless Linux distro won’t work, for reasons explained below.

Dumping a postgres table

First, we will take a look at how we can back up a postgres database table (if you wish to back up something else, you should start here). For this, we use the command pg_dump [dbname], which can create script or archive dumps of any given database. I’ve chosen to use script dumps, which are

"plain-text files containing the SQL commands required to reconstruct the database to the state it was in at the time it was saved." postgres documentation

Specifically, in the cron_backup.sh script, the line that produces the .sql dump file is:

  pg_dump $target_db > $save_dir/$timestamp"_dump_"$ip4_addr".sql"

This creates the .sql dump of $target_db in the folder $save_dir, where the name of the dump file contains the current timestamp $timestamp and the IPv4 address $ip4_addr of the machine on which the backup was created. Regarding the postgres database, you need to consider that, by default, not every user is allowed to access the database tables. If you just want to create a single backup right now, you could use the following command:

sudo -i -u postgres pg_dump $target_db > dump.sql

Here, -i runs the command as a login shell and -u provides the target user. When we automate this process, we use the crontab of the postgres user to take care of this, as explained below.

The rest of the cron_backup.sh script parses the .yaml configuration file, sets variables like $ip4_addr or $timestamp and echoes some log messages. If you don’t need dynamic file names and can ensure that $save_dir always exists, you could drastically shorten the script :)
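
If you prefer Python over shell, the core of the backup step boils down to something like the following sketch. This is not the repository’s actual script (which is bash); the database name and save directory here are placeholders:

import socket
import subprocess
from datetime import datetime
from pathlib import Path

# Placeholders -- the real script reads these values from config.yaml.
target_db = "my_database"
save_dir = Path("/var/backups/postgres")
save_dir.mkdir(parents=True, exist_ok=True)

timestamp = datetime.now().strftime("%Y-%m-%d_%H-%M-%S")
ip4_addr = socket.gethostbyname(socket.gethostname())
dump_file = save_dir / f"{timestamp}_dump_{ip4_addr}.sql"

# pg_dump writes the plain-text SQL script to stdout; redirect it into the file.
with open(dump_file, "w") as f:
    subprocess.run(["pg_dump", target_db], stdout=f, check=True)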

Uploading to GoogleDrive

The second script, drive_upload.py, takes care of uploading the files into a GoogleDrive cloud folder. It uses the Google Drive v3 API. Before that works, you will have to install the requirements, preferably in a virtualenv. So create and activate the virtualenv, then install the requirements:

virtualenv -p python3 postgres_venv
source postgres_venv/bin/activate
pip install -r postgres_googledrive_backup/requirements.txt

Once you’ve installed the requirements, you need to enable the Drive API for your Google account. I recommend that you create a new Google account specifically for storing the backups. Enabling the API is what we will do in the next section.

Enabling the Drive v3 API

Before we can automate the upload process, we must enable the Google Drive v3 API and, in the process, acquire the two files credentials.json and token.pickle. Those files are needed by the drive_upload.py script to authenticate the connected Google account using OAuth 2.0.

Acquiring the credentials.json file

Enabling the Drive API (under step one) on this page will download the file credentials.json. For now you can save this file wherever you like, but make sure to set the path in the config.yaml at credtials_path: /path/to/credentials.json, so that the drive_upload.py script knows where the file is located. Once that is done, we can finally obtain the token.pickle file, which is the last missing piece.

Acquiring the token.pickle file

Even though I am not very happy about it, this last step requires you to have a desktop environment with a working browser. I am not sure why Google made it this way, but I assume they want you to be able to grant a script or app access to your drive storage only directly, in an interactive manner.

Given that you have set the path to the credentials.json file in the config.yaml file, execute the drive_upload.py script:

cd postgres_googledrive_backup
python drive_upload.py -cf /path/to/config.yaml

This will open your default browser and ask you to allow “Quickstart” access to that Google account’s drive storage:

[Screenshots: Google OAuth 2.0 consent screen and confirmation message]

If everything worked, this will create the file token.pickle in your current working directory and output Token file has been acquired, exiting... to your console. Make sure to adjust the path of the token.pickle file in the config.yaml file, so that drive_upload.py finds the token independent of your working directory.
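
Under the hood, this token acquisition typically follows the standard Drive v3 quickstart pattern, roughly like the sketch below. The scope and file paths are assumptions on my part; the actual drive_upload.py may differ in the details:

import os
import pickle

from google.auth.transport.requests import Request
from google_auth_oauthlib.flow import InstalledAppFlow

# Scope and file names are assumptions; adjust them to your config.yaml.
SCOPES = ["https://www.googleapis.com/auth/drive.file"]

creds = None
if os.path.exists("token.pickle"):
    with open("token.pickle", "rb") as f:
        creds = pickle.load(f)

if not creds or not creds.valid:
    if creds and creds.expired and creds.refresh_token:
        # An expired token can be refreshed silently, no browser needed.
        creds.refresh(Request())
    else:
        # First run: opens the browser and shows the OAuth consent screen.
        flow = InstalledAppFlow.from_client_secrets_file("credentials.json", SCOPES)
        creds = flow.run_local_server(port=0)
    with open("token.pickle", "wb") as f:
        pickle.dump(creds, f)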

And that’s it. You could now run both scripts manually and they would upload the .sql dumps into the GoogleDrive folder set in the config.yaml file. But we are developers, there is no fun in doing things manually. So let’s automate the entire workflow using the crontab in the next and final section of this post.
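
Before we do, for reference, the upload step itself boils down to roughly the following sketch. The folder ID, file name and path are placeholders; the actual drive_upload.py reads these values from config.yaml:

import pickle

from googleapiclient.discovery import build
from googleapiclient.http import MediaFileUpload

# Reuse the credentials acquired above.
with open("token.pickle", "rb") as f:
    creds = pickle.load(f)

service = build("drive", "v3", credentials=creds)

# Placeholders -- use the ID of the GoogleDrive folder the backups should go to.
file_metadata = {
    "name": "2021-01-01_dump_192.168.0.1.sql",
    "parents": ["<drive-folder-id>"],
}
media = MediaFileUpload("/path/to/dump.sql", mimetype="text/plain")
uploaded = service.files().create(
    body=file_metadata, media_body=media, fields="id"
).execute()
print("Uploaded file with id:", uploaded.get("id"))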

Automatic execution via Crontab

Now that the files credentials.json and token.pickle have been acquired, we can also use this on any headless Linux distro. Simply scp the entire postgres_googledrive_backup folder to your headless Linux distro, create a virtualenv there and install the requirements once again. The important thing is to make sure that the paths in the config.yaml file are correct.
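
If you want, a quick sanity check along the following lines can catch wrong paths early. This is just a sketch: the key names in your config.yaml may differ, and it assumes PyYAML is available in the virtualenv:

import yaml
from pathlib import Path

# Path to your config file is a placeholder; adjust it to your setup.
with open("config.yaml") as f:
    cfg = yaml.safe_load(f)

for key, value in cfg.items():
    # Flag any value that looks like an absolute path but does not exist.
    if isinstance(value, str) and value.startswith("/") and not Path(value).exists():
        print(f"Warning: {key} points to a missing path: {value}")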

To automatically create backups using the cron_backup.sh script and automatically upload them into the cloud with the drive_upload.py script, we will use the crontab, which allows us to execute arbitrary commands on a regular schedule.

Because only the postgres user has access to the postgres database tables that we wish to back up, we must use the postgres user’s crontab to schedule the backups. I assume that you have postgres installed, given that this post deals with backing up a postgres database table… We can use the su command to become the postgres user like this:

sudo su postgres

Once we are the postgres user, we can edit the crontab using the following command:

crontab -e

and add two entries along these lines (adjust the paths to your setup):

50 23 * * * /home/pcadmin/automatic_backup/cron_backup.sh /home/pcadmin/automatic_backup/config.yaml > /tmp/backup.log
55 23 * * * /home/pcadmin/backup_venv/bin/python3 /home/pcadmin/automatic_backup/drive_upload.py -cf /home/pcadmin/automatic_backup/config.yaml >> /tmp/backup.log

If you wish to run the backup process daily at a given time, the cron prefix is 50 23 * * * for every day at 23:50 (minute first, then hour). Take a look at the crontab.md file, which is pretty close to what I am running on my Linux server. Both scripts expect one parameter, which is the path to the config.yaml file. So make sure to adjust that path and also make sure that the paths inside the config.yaml are correct.

When working with the crontab, your usual environment variables are not set, meaning that something like python example.py won’t work; instead, you need to use absolute paths: /usr/bin/python example.py.

Further, we need to make sure that the postgres user, who will execute the commands in the postgres crontab, has read, write and execute permissions on the directory containing our scripts, so run the following commands:

sudo chown postgres postgres_googledrive_backup
sudo chmod 700 postgres_googledrive_backup

This makes the postgres user the owner of that directory and gives only the owner read, write and execute permissions.

And that’s it. You can now, at an arbitrary interval, create backups of a postgres database table (or anything else, if you adapt the backup-generating script) and upload those backups into a GoogleDrive cloud folder. Working with the crontab for the first time can be a bit cumbersome, because commands that work in your shell don’t necessarily work in the crontab. Make sure that all paths are correct and that the user whose crontab you are using (here the postgres user) has sufficient permissions on the folders it operates on, and you should be good to go.

Cheers,
Finn.