Backups for AI agents: what to actually protect
Backing up an agent isn't 'tar /home and call it done.' The interesting parts are which directories matter, which ones must stay out, and how you actually restore.
Most servers can be re-provisioned from scratch in an hour. Stateless web apps, in particular: the database is the only thing you have to protect, and everything else is just code in Git.
An AI agent is different. Its long-term value is the state it accumulates: memory it's built up, sessions it remembers, platform pairings it's established, custom configuration it's accreted over months. Lose that and you don't lose "a weekend re-provisioning" — you lose the version of the agent that knows your business.
So backup design is a real design question, not boilerplate. Here's how we think about it.
What you have to keep
Inside the agent's home directory there's a small number of things that genuinely matter:
- sessions/: the agent's conversation history. Replayable; sometimes the only record of a decision.
- memories/: what the agent has chosen to remember long-term. This is the "personality" that's grown over time.
- state.db: the SQLite store backing the agent's runtime state. Needs a WAL checkpoint before copying to avoid a torn read; we issue that before each backup.
- SOUL.md: the agent's persistent self-description. Tiny file, irreplaceable.
- Other small per-platform pairing records, allowlists, etc.
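The checkpoint-then-copy step for state.db can be sketched as follows. This is a minimal illustration using Python's standard sqlite3 module, not the exact procedure we run; the function name and the destination path are hypothetical, and it assumes no other process writes to the database between the checkpoint and the copy.

```python
import shutil
import sqlite3
from pathlib import Path


def checkpoint_and_copy(db_path: Path, dest_path: Path) -> None:
    """Flush the WAL into the main database file, then copy that file."""
    conn = sqlite3.connect(db_path)
    try:
        # TRUNCATE writes the whole WAL back into the main file and
        # resets the WAL to zero bytes. The pragma returns one row:
        # (busy, frames in WAL, frames checkpointed).
        busy, _wal_frames, _checkpointed = conn.execute(
            "PRAGMA wal_checkpoint(TRUNCATE)"
        ).fetchone()
        if busy:
            raise RuntimeError("checkpoint was blocked by another connection")
        # Assumption: nothing writes between the checkpoint and this copy.
        shutil.copy2(db_path, dest_path)
    finally:
        conn.close()
```

If the agent may keep writing during the backup window, sqlite3's `Connection.backup()` is an alternative worth considering, since it copies a consistent snapshot without relying on a quiet period.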
What you must not keep
The directories that should never enter the backup set:
- The agent's git checkout and venv. Multi-gigabyte, fully reproducible from upstream. Backing it up is pure dead weight.
- Caches and runtime lock files. They're ephemeral. If anything they reproduce a bad state.
- .env and the OAuth token. This is the one that bites people. Secrets in your backup set means a leaked backup is a leaked credential set. The secrets stay in the encrypted control vault; if you restore, you re-pair OAuth as part of the recovery procedure. It's a deliberate trade: a slightly more annoying restore for a far better security posture.
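One way to encode the keep and exclude rules is a deny-by-default filter: a path enters the backup set only if it sits under a known-precious entry, and the exclusions act as a second guard. The sketch below is illustrative only. The directory names `venv`, `.venv`, `cache`, and `__pycache__` and the `oauth_token*` pattern are assumptions, since the actual file names aren't given here, and the per-platform pairing records would need to be added to the keep list by name.

```python
from fnmatch import fnmatch
from pathlib import PurePosixPath

# Entries named in the text as worth keeping.
KEEP_TOP_LEVEL = {"sessions", "memories", "state.db", "SOUL.md"}
# Assumed directory names for the checkout, venv, and caches.
EXCLUDE_PARTS = {".git", "venv", ".venv", "cache", "__pycache__"}
# ".env" is from the text; "oauth_token*" is a hypothetical file name.
EXCLUDE_GLOBS = ["*.lock", ".env", "oauth_token*"]


def should_back_up(relative_path: str) -> bool:
    """Return True if a path (relative to the agent home) may be archived."""
    path = PurePosixPath(relative_path)
    parts = path.parts
    # Deny by default: anything outside the keep list stays out.
    if not parts or parts[0] not in KEEP_TOP_LEVEL:
        return False
    # Second guard: reproducible or ephemeral directories anywhere in the path.
    if any(part in EXCLUDE_PARTS for part in parts):
        return False
    # Third guard: secrets and lock files, matched on the file name.
    return not any(fnmatch(path.name, pattern) for pattern in EXCLUDE_GLOBS)
```

The allowlist-first design means a new secret file dropped into the home directory stays out of the archive unless someone deliberately adds it.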
Where the backups live
Backing up to the same provider you're running on is asking for trouble: if the provider has a region-wide incident, your data and your backups both go dark at once. Our default is to ship the archives to a separate provider, with snapshot-based immutability turned on at the storage layer.
That gives you two layers of protection:
- The active borg repo, which holds your nightly archives with retention (7 daily, 4 weekly, 6 monthly). The box has push access to it.
- A read-only snapshot of that repo taken daily by the storage provider. The box cannot delete those snapshots, so even a fully compromised agent can't destroy the backup history.
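The retention policy above maps directly onto borg's prune options. Here is a small sketch that builds the command line, assuming borg 1.x syntax (borg 2 takes the repository via a separate option); the repository URL in the usage example is a placeholder, and the function name is hypothetical.

```python
# Retention from the text: 7 daily, 4 weekly, 6 monthly.
RETENTION = {"daily": 7, "weekly": 4, "monthly": 6}


def borg_prune_command(repo: str, retention: dict[str, int] = RETENTION) -> list[str]:
    """Build the argv for `borg prune` (borg 1.x syntax assumed)."""
    command = ["borg", "prune"]
    for period, count in retention.items():
        # Produces --keep-daily 7, --keep-weekly 4, --keep-monthly 6.
        command += [f"--keep-{period}", str(count)]
    command.append(repo)
    return command


# Usage with a placeholder repository URL:
print(" ".join(borg_prune_command("ssh://backup@example.invalid/./agent-repo")))
```

Passing the list to `subprocess.run(..., check=True)` would execute it. On borg 1.2 and later, disk space is only reclaimed after a separate `borg compact` run.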
Restore is the part people skip
A backup that's never been restored isn't a backup; it's a hope. The restore procedure has to be:
- Scripted. Not "here's a runbook" but actually one command, with clearly-named arguments.
- Tested on a disposable box. Not the production tenant, ever. We provision a sandbox box, restore into it, and verify the agent comes up healthy. Drift in either direction shows up here.
- Documented end-to-end. Including the OAuth re-pairing step, which is the one most likely to be forgotten under pressure.
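To make "one command, with clearly-named arguments" concrete, here is a rough sketch of what the entry point of such a restore script might look like. The program name, argument names, and step wording are all hypothetical; the sketch only plans the steps and refuses a production target, and leaves the actual extraction to whatever backup tool is in use.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        prog="restore-agent",  # hypothetical command name
        description="Restore an agent archive onto a disposable sandbox box.",
    )
    parser.add_argument("--archive", required=True,
                        help="name of the archive to restore")
    parser.add_argument("--target-host", required=True,
                        help="disposable sandbox box to restore into")
    parser.add_argument("--production-host", required=True,
                        help="host that must never be a restore target")
    return parser


def restore_plan(argv: list[str]) -> list[str]:
    """Parse arguments and return the ordered restore steps."""
    args = build_parser().parse_args(argv)
    if args.target_host == args.production_host:
        # Drills never touch the production tenant.
        raise SystemExit("refusing to restore onto the production tenant")
    return [
        f"extract archive {args.archive} onto {args.target_host}",
        "re-pair OAuth (secrets are not in the backup set)",
        "verify the agent comes up healthy",
    ]


if __name__ == "__main__":
    for step in restore_plan(["--archive", "nightly-01",
                              "--target-host", "sandbox-1",
                              "--production-host", "prod-1"]):
        print(step)
```

Putting the OAuth re-pairing step in the plan itself, rather than only in prose, is one way to keep it from being skipped under pressure.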
The boring lesson
The interesting part of backup design isn't the tool you pick (borg is great, restic is great, rsync-to-S3 can be made to work). It's the discipline of: knowing which bytes are precious, which bytes are dead weight, which bytes are dangerous to keep, where the copies live, and whether the restore path actually runs end-to-end under load.
For a managed deployment we treat that whole loop as table stakes. Nightly archives, immutable snapshots, monthly restore drills on sandbox boxes. The day you need it, that's the only thing that matters.