Backing up Docker volume data to Digital Ocean Spaces with encryption

Backups are a must for pretty much anything digital. And automating those backups makes life so much easier, should you ever need to restore your data.

My use case

My own use case is backing up the data on my home server, since it stores my music collection and my family's photos and documents.

All of the services on my home server are installed with Docker, with all of their data in separate Docker volumes. This means I should only need to back up the directories that get mounted into the containers, since the services themselves can easily be re-deployed.

I also want this data to be encrypted, since I will be keeping both an offline local copy and a copy with a third-party cloud provider (Digital Ocean Spaces).

Setting up s3cmd

S3cmd is a command line utility for interacting with S3-compatible storage systems.

It will enable me to send a copy of my data to my Digital Ocean Spaces account, encrypting it beforehand.

Install s3cmd

The official installation instructions for s3cmd can be found on the GitHub repository.

For Arch Linux I used:

sudo pacman -S s3cmd

And for my home server, which is running Ubuntu Server, I installed it via Python’s package manager, “pip”:

sudo pip install s3cmd

Configuring s3cmd

Once installed, the first step is to run through the configuration steps with this command:

s3cmd --configure

Then answer the questions that it asks you.

You’ll need these items to complete the steps:

  • Access Key (for the Digital Ocean API)
  • Secret Key (for the Digital Ocean API)
  • S3 endpoint (e.g. lon1.digitaloceanspaces.com)
  • DNS-style bucket template (I use %(bucket)s.ams3.digitaloceanspaces.com)
  • Encryption password (remember this, as you'll need it whenever you decrypt your data)

The other options should be fine as their default values.

Your configuration will be stored as a plain text file at ~/.s3cfg. This includes that encryption password.
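
For reference, the relevant lines in that file end up looking something like this. The values below are placeholders, and the exact option names can vary between s3cmd versions:

# Excerpt from ~/.s3cfg (placeholder values)
access_key = YOUR_SPACES_ACCESS_KEY
secret_key = YOUR_SPACES_SECRET_KEY
host_base = ams3.digitaloceanspaces.com
host_bucket = %(bucket)s.ams3.digitaloceanspaces.com
gpg_passphrase = your-encryption-password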

Automation script for backing up docker volume data

Since all of the data I actually care about on my server will be in directories that get mounted into docker containers, I only need to compress and encrypt those directories for backing up.

If I ever need to re-install my server, I can just start all of the fresh Docker containers, then move my latest backups to the correct paths on the new server.

Here is my bash script that will archive, compress and encrypt my data (via GPG) before pushing it over to Digital Ocean Spaces.

I have added comments above each section to make it clearer what each step is doing:

#!/usr/bin/bash

## Root directory where all my backups are kept.
basepath="/home/david/backups"

## Variables for use below.
appname="nextcloud"
volume_from="nextcloud-aio-nextcloud"
container_path="/mnt/ncdata"

## Ensure the backup folder for the service exists.
mkdir -p "$basepath"/"$appname"

## Get current timestamp for backup naming.
datetime=$(date +"%Y-%m-%d-%H-%M-%S")

## Start a new ubuntu container, mounting all the volumes from my nextcloud container 
## (I use Nextcloud All in One, so my Nextcloud service is called "nextcloud-aio-nextcloud")
## Also mount the local "$basepath"/"$appname" to the ubuntu container's "/backups" path.
## Once the ubuntu container starts it will run the tar command, creating the tar archive from 
## the contents of the "$container_path", which is from the Nextcloud volume I mounted with 
## the --volumes-from flag.
docker run \
--rm \
--volumes-from "$volume_from" \
-v "$basepath"/"$appname":/backups \
ubuntu \
tar cvzf /backups/"$appname"-data-"$datetime".tar.gz "$container_path"

## Now I use the s3cmd command to move that newly-created 
## backup tar archive to my Digital Ocean spaces.
s3cmd -e put \
  "$basepath"/"$appname"/"$appname"-data-"$datetime".tar.gz \
  s3://scottie/"$appname"/
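
The cron job further down calls this script directly, so it needs to be saved at that path (/home/david/backup-nextcloud in my case) and made executable. A single manual run is also a good way to check everything works before scheduling it:

chmod +x /home/david/backup-nextcloud

## Run it once by hand and watch the output
/home/david/backup-nextcloud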

Automating the backup with a cronjob

Cron jobs are a way to automate any tasks you want to on a Linux system.

You can have fine-grained control over how often you want to run a task.

Although working with Linux's cron scheduler is outside the scope of this guide, I will share the setting I have for my Nextcloud backup, along with a brief explanation of its configuration.

The command to edit what cron jobs are running on a Linux system, Ubuntu in my case, is:

crontab -e

This will open up a temporary file to edit, which will get written to the actual cron file when saved — provided it is syntactically correct.

This is the setting I have in mine for my Nextcloud backup (it should all be on a single line):

10 3 * * 1,4 /home/david/backup-nextcloud >> /home/david/backups/backup-nextcloud.log

The numbers and asterisks are telling cron when the given command should run:

  • 10: the minute (10 minutes past the hour)
  • 3: the hour (3am)
  • *: the day of the month (any)
  • *: the month (any)
  • 1,4: the days of the week (Monday and Thursday)

So my configuration there says it will run the /home/david/backup-nextcloud command every Monday and Thursday at 3:10am. It will then append the command's output to my log file for my Nextcloud backups.

Decrypting your backups

Download the file from your Digital Ocean Spaces account.

Go into the directory it was downloaded to and run the file command on the archive:

# For example
file nextcloud-data-2023-11-17-03-10-01.tar.gz

# You should get something like the following feedback:
nextcloud-data-2023-11-17-03-10-01.tar.gz: GPG symmetrically encrypted data (AES256 cipher)

You can decrypt the archive with the following command:

gpg --decrypt nextcloud-data-2023-11-17-03-10-01.tar.gz > nextcloud-backup.tar.gz

When you are prompted for a passphrase, enter the one you set when configuring s3cmd earlier.

You can now extract the archive and see your data:

tar -xzvf nextcloud-backup.tar.gz

The archive will be extracted into the current directory.
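
To get that data back into a fresh Docker volume, you could reverse the backup step: start a throwaway container with the new service's volumes mounted, plus the directory holding the decrypted archive, and extract into it. This is a rough sketch based on my backup script above rather than a command from my own workflow, so adjust the container name and paths for your setup:

## Mount the volumes from the freshly-created Nextcloud container,
## plus the current directory (holding nextcloud-backup.tar.gz) at /backups.
## tar strips the leading "/" when archiving, so extracting with -C /
## puts the files back under /mnt/ncdata inside the volume.
docker run \
--rm \
--volumes-from nextcloud-aio-nextcloud \
-v "$(pwd)":/backups \
ubuntu \
tar xvzf /backups/nextcloud-backup.tar.gz -C /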

Setting up a Digital Ocean droplet for a Lupo website with Terraform

Overview of this guide

My Terraform Repository used in this guide

Terraform is a program that enables you to set up all of your cloud-based infrastructure with configuration files, as opposed to the traditional way of logging into a cloud provider's dashboard and manually clicking buttons and setting things up yourself.

This is known as “Infrastructure as Code”.

It can be intimidating to get started, but my aim with this guide is to get you to the point of being able to deploy a single server on Digital Ocean, along with some surrounding items like a DNS A record and an ssh key for remote access.

This guide assumes that you have a Digital Ocean account, and that your domain and nameservers are already set up to point to Digital Ocean.

You can then build upon those foundations and work on building out your own desired infrastructures.

The Terraform Flow

As a brief outline, here is what will happen when working with Terraform. Hopefully this gives you a broad picture, which I can then fill in below.

  • Firstly we write a configuration file that defines the infrastructure that we want.
  • Then we need to set up any access tokens, ssh keys and terraform variables. Basically anything that our Terraform configuration needs to be able to complete its task.
  • Finally we run the terraform plan command to preview the changes Terraform will make, and then terraform apply to make it all live.

Installing the Terraform program

Terraform has installation instructions, but you may be able to find it with your package manager.

Here I am installing it on Arch Linux with pacman:

Bash
sudo pacman -S terraform
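
Whichever way you install it, you can confirm it is available with:

Bash
terraform version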

Setting the required variables

The configuration file for the infrastructure I am using requires only a single variable from outside: the do_token.

This is created manually in the API section of the Digital Ocean dashboard. Create yours and keep its value to hand for usage later.

Terraform accepts variables in a number of ways. I opt to save my tokens in my local password manager, and then supply them when prompted by the terraform command. This is slightly more long-winded than just setting a Terraform-specific environment variable in your bashrc; however, I recently learned from rwxrob what a bad idea that is.
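
Two ways of supplying it are to let terraform plan prompt you for the value interactively (it will, since the variable has no default), or to export it with Terraform's TF_VAR_ prefix so it only lives in your current shell session. The token value below is a placeholder:

Bash
# Option 1: just run plan/apply and paste the token when prompted
terraform plan

# Option 2: set it for the current shell session only
export TF_VAR_do_token="paste-your-token-here"
terraform plan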

Creating an ssh key

In the main.tf file, I could have set the ssh public key path to my existing one. However, I thought I’d create a key pair specific for my website deployment.

Bash
ssh-keygen -t rsa

I give it a different name so as to not overwrite my standard id_rsa one. I call it id_rsa.davidpeachme so that I know at a glance which one is for my website server.
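
If you'd rather give the key its name up front instead of at the interactive prompt, ssh-keygen's -f flag takes the output path directly:

Bash
ssh-keygen -t rsa -f ~/.ssh/id_rsa.davidpeachme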

Describing your desired infrastructure with code

Terraform uses a declarative language, as opposed to an imperative one.

What this means for you is that you write configuration files describing the state you want your infrastructure to be in. For example, if you want a single server, you just add the server spec to your configuration and Terraform will work out how best to create it for you.

You don't need to be concerned with the nitty gritty of how it is achieved.

I have a real-life example that will show you exactly what a minimal configuration can look like.

Clone / fork the repository for my website server.

Explanation of my terraform repository

HCL
terraform {
  required_providers {
    digitalocean = {
      source = "digitalocean/digitalocean"
      version = "~> 2.0"
    }
  }
}

variable "do_token" {}

# Variables whose values are defined in ./terraform.tfvars
variable "domain_name" {}
variable "droplet_image" {}
variable "droplet_name" {}
variable "droplet_region" {}
variable "droplet_size" {}
variable "ssh_key_name" {}
variable "ssh_local_path" {}

provider "digitalocean" {
  token = var.do_token
}

The first block tells terraform which providers I want to use. Providers are essentially the third-party APIs that I am going to interact with.

Since I’m only creating a Digital Ocean droplet, and a couple of surrounding resources, I only need the digitalocean/digitalocean provider.

The second block above tells terraform that it should expect – and require – a single variable to be able to run. This is the Digital Ocean Access Token that was obtained above in the previous section, from the Digital Ocean dashboard.

Following that are the variables that I have defined myself in the ./terraform.tfvars file. That tfvars file would normally be kept out of a public repository; however, I kept it in so that you can hopefully just fork my repo and change those values for your own usage.
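
As an illustration, the terraform.tfvars file for this configuration would look something like the following. The values shown here are placeholders (the image, region and size slugs are just examples), so swap in your own domain, region and key path:

HCL
domain_name    = "example.com"
droplet_image  = "ubuntu-22-04-x64"
droplet_name   = "website"
droplet_region = "lon1"
droplet_size   = "s-1vcpu-1gb"
ssh_key_name   = "website-deploy-key"
ssh_local_path = "/home/david/.ssh/id_rsa.davidpeachme.pub"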

The bottom block is the setting up of the provider. Basically just passing the access token into the provider so that it can perform the necessary API calls it needs to.

HCL
resource "digitalocean_ssh_key" "ssh_key" {
  name       = var.ssh_key_name
  public_key = file(var.ssh_local_path)
}

Here is the first resource that I am telling Terraform to create. It takes a public key from my local filesystem and sends it to Digital Ocean.

This is needed for ssh access to the server once it is ready. However, it is added to the root account on the server.

I use Ansible for setting up the server with the required programs once Terraform has built it. So this ssh key is actually used by Ansible to gain access to do its thing.

I will have a separate guide soon on how I use ansible to set my server up ready to host my static website.

HCL
resource "digitalocean_droplet" "droplet" {
  image    = var.droplet_image
  name     = var.droplet_name
  region   = var.droplet_region
  size     = var.droplet_size
  ssh_keys = [digitalocean_ssh_key.ssh_key.fingerprint]
}

Here is the meat of the infrastructure – the droplet itself. I am telling it which operating system image I want to use, what size and region I want, and to make use of the ssh key I added in the previous block.

HCL
data "digitalocean_domain" "domain" {
  name = var.domain_name
}

This block is a little different. Here I am using a data source to grab information about something that already exists in my Digital Ocean account.

I have already set up my domain in Digital Ocean’s networking area.

This is the overarching domain itself – not the specific A record that will point to the server.

The reason I'm doing it this way is that I already have mailbox settings and TXT records that are working, so I don't want them to be potentially torn down and re-created with the rest of my infrastructure if I ever run terraform destroy.

HCL
resource "digitalocean_record" "record" {
  domain = data.digitalocean_domain.domain.id
  type   = "A"
  name   = "@"
  ttl    = 60
  value  = "${digitalocean_droplet.droplet.ipv4_address}"
}

The final block creates the actual A record with my existing domain settings.

It uses the domain id given back by the data block I defined above, and the IP address of the created droplet as the A record value.
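
One optional extra, not part of the blocks above, is an output that prints the new droplet's IP address after an apply; handy when you come to point Ansible (or your ssh client) at the server:

HCL
output "droplet_ip" {
  value = digitalocean_droplet.droplet.ipv4_address
}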

Testing and Running the config to create the infrastructure

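If this is a freshly cloned copy of the repository, run terraform init from the project root first, so that Terraform can download the digitalocean provider (this only needs doing once per working copy):

Bash
terraform init
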
With the provider initialised, run the following command and you should see a write-up of what Terraform intends to create:

Bash
terraform plan

If the output looks okay to you, then type the following command and enter “yes” when it asks you:

Bash
terraform apply

This should create the three items of infrastructure we have defined.
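
To confirm the droplet is reachable before moving on, you can grab its IP address (from the Digital Ocean dashboard, or from the output block suggested earlier if you added it) and ssh in as root using the key created earlier; the address below is a placeholder:

Bash
ssh -i ~/.ssh/id_rsa.davidpeachme root@your-droplet-ip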

Next Step

Next we need to set that server up with the required software needed to run a static html website.

I will be doing this with a program called Ansible.

I’ll be writing up those steps in a zet very soon.