Backing Up Running Databases Without Stopping Them

Once you self-host a few services with live databases, the backup question stops being theoretical. A Postgres or SQLite file half-written when tar reads it goes into the archive in a state nothing on Earth will replay; you just don’t find out until the restore. Two years in, with multiple incidents I had to actually recover from (including the photos behind the e-ink frame), I trust this stack precisely because the correctness argument is short: BTRFS gives me an atomic snapshot, and everything above it can be a shell script. One Alpine container, ~75 lines of Bash, pushes that snapshot to one or more Borg repositories on a fixed interval. Multi-target is numeric env vars (BORG_REPO_0, BORG_REPO_1, …). No config format, no DSL; the env file is the configuration.

The problem the snapshot solves

I self-host several databases that are mid-write at every moment of the day. tar | borg create against the live volume is a race: a Postgres or SQLite file that’s half-written when borg reads it goes into the archive in a state nothing on Earth can replay. The “right” answer is to coordinate a quiesce with every database: a fan-out of pg_dump, SQLite .backup, Redis BGSAVE, and so on, all with retry, timeouts, and per-app credentials.

The cheaper answer, if you’ve put everything on one BTRFS volume, is btrfs subvolume snapshot. It returns instantly with a copy-on-write fork of the entire filesystem. Every file is now atomically consistent at exactly the same instant. Run borg against the snapshot, not against the live volume.

btrfs subvolume snapshot /btrfs-root /snapshot
cd "/snapshot/btrfs-root${BACKUP_RELATIVE_PATH:-}"
borg create ... ::"{hostname}-{now:%Y-%m-%dT%H:%M:%S}" .

The snapshot lives only for the duration of the borg run. A trap cleanup EXIT deletes the subvolume whether the backup succeeded, failed, or was killed. The next run snapshots fresh.

This shifts the entire correctness argument from “did I quiesce every database in time” to “does BTRFS give me a consistent snapshot.” It does. That’s why everything below it can be a shell script.

Multi-target as numeric env vars

The 3-2-1 backup rule wants three copies, two media, one offsite. My answer is a remote (rsync.net) and a local HDD, both fed from the same snapshot. The wire format for “multiple targets” is just numbered env vars:

BORG_PASSPHRASE_0=...
BORG_REMOTE_PATH_0=borg1
BORG_REPO_0=username@username.rsync.net:~/backup

BORG_PASSPHRASE_1=...
BORG_REPO_1=/local-backup

backup-wrapper.sh loops index=0 upward, exports BORG_PASSPHRASE / BORG_REPO / BORG_REMOTE_PATH from the indexed copies, runs backup.sh, unsets them, increments. Stops the first time the next index has no passphrase.

There’s also a no-index fallback (BORG_REPO=... with no number) for the single-target case. Same script, no extra config plane.

I keep coming back to this pattern for small-system orchestration. The env file is the data structure. There’s no YAML parsing, no JSON schema, no config-validation layer between you and the variable that actually matters.

The scheduler is a sleep, not cron

while true; do
    /src/backup-wrapper.sh 2>&1 | log_message
    sleep "$SLEEP_TIME"
done

A comment in the file says it out loud: “Using a simple sleep loop to schedule backups instead of cron to avoid concurrency issues.” Cron with a one-hour cadence and a backup that occasionally takes 70 minutes will eventually overlap itself. The sleep-loop can’t: the next run starts when the previous one is done, plus the interval. One process, one snapshot, one borg invocation. Concurrency bugs you can’t have are concurrency bugs you don’t have.

Healthcheck is a file mtime

borg create succeeded? Write date > /health/backup_completion_time.log. The Docker healthcheck shells out every 10 seconds and compares that mtime against MAX_BACKUP_AGE_SECONDS (default 86400). Older than that, container is unhealthy and whatever’s watching containers (in my case a notification hook) finds out.

Two subtleties worth naming:

First-boot grace period. If backup_completion_time.log doesn’t exist yet (fresh container, first backup still running), fall back to container_start_time.log so the container isn’t reported unhealthy during the first scheduled run.
Partial success is not success. In multi-target mode, the completion log is only written if every target succeeded. One repo failing means the healthcheck stays red even if the other two are fine. Stale-but-quiet was the failure mode I wanted to make impossible.

Smaller calls

borg break-lock at the start of every run. If the previous container was killed mid-backup, the repo is locked and the next borg create will hang. Just break it. There’s only ever one writer because of the sleep loop.
set -e after borg init, not before. The init line is the only one allowed to fail (first run on a fresh repo). Everything after halts on error.
BORG_RSH='ssh -oBatchMode=yes'. Fail fast if SSH would have prompted, instead of hanging forever inside a detached container.
ServerAliveInterval 30 in ssh_config. Long borg transfers across home-ISP NAT get killed if nothing flows for a few minutes. Keepalives keep the tunnel open.
--files-cache=ctime,size,inode. The default mtime,size,inode re-hashes files when their mtime changes; on BTRFS, ctime is the more honest signal of “this content actually changed.”
compression=zstd,12. The sweet spot for backup data on my hardware: substantially better than zlib, not so slow it dominates the run.
borg compact --threshold=5 --cleanup-commits. Reclaims space from pruned archives whenever the segment-file fragmentation crosses 5%.
IGNORE_GIT_UNTRACKED=true. Optional. Walks every .git dir under the snapshot, runs git ls-files --others --exclude-standard, and feeds the result into --exclude-from. Skips target/, node_modules/, build caches; anything the repo already knows isn’t worth keeping.
SYS_ADMIN capability on the container. Needed for btrfs subvolume snapshot and delete from inside the namespace. The narrower capability set didn’t have a way through.

What I’d change

A test rig that restores into an empty volume on a schedule. “Backups exist” is not the property I care about. “Backups restore” is. I have anecdotal evidence after every incident; I don’t have a green checkmark before one.
A failure notifier separate from the healthcheck. Docker healthcheck-unhealthy is one signal; I’d also want an explicit push (ntfy, email, Telegram) on first failure of a run, so I don’t have to be watching the container state.
Parallel targets when network and disk don’t compete. The current loop is strictly sequential: rsync.net then local HDD. They share neither bandwidth nor spindles; they could run in parallel and halve the wall-clock. Sequential made the wrapper trivial; the trade was knowable and I made it.

Two years in, the part I’d defend hardest is the snapshot. Everything above it is a wrapper that could be rewritten in an afternoon. The snapshot is what makes the wrapper allowed to be one.

The problem the snapshot solves

Multi-target as numeric env vars

The scheduler is a sleep, not cron

Healthcheck is a file mtime

Smaller calls

What I’d change

Related articles

An Obsidian Sync Built Around the Merger I Already Had

25 Million UK Property Rows in a Single Rust Process

An E-Ink Photo Frame That Sleeps When the House Is Empty