Backup and Restore Strategy

Backup and restore: recovery as insurance against data loss

A backup and restore strategy is the discipline that defines which data is backed up, how often, onto which media and for how long, and that proves a working system can be rebuilt from it. A backup only becomes one once the restore has been tested.

Almost every organisation backs up its data. Far fewer know whether a functioning system can actually be rebuilt from those backups when it matters, in what time and with how much data loss. That is exactly where a tool ends and a strategy begins. A backup program runs overnight and reports success; the strategy decides whether that success is worth anything in a crisis. This page describes the four pillars of the discipline, why only a tested restore makes them count, and where the practice regularly breaks.

The four pillars

A backup strategy rests on four decisions, each of them measurable on its own:

The 3-2-1 rule. Three copies of the data, on two different media types, one of them at an external, physically separate location. The rule is deliberately simple because it protects against three independent failure modes at once: a corrupt original, a failed medium and a local disaster.
RPO, the tolerable data loss. The Recovery Point Objective defines how much work may be lost in the worst case, for example one hour. The backup frequency follows directly: an RPO of one hour demands at least hourly backups.
RTO, the tolerable downtime. The Recovery Time Objective defines how quickly a system must be running again. It dictates how quickly the backup must be readable and how automated the recovery path must be.
Retention. How long is each backup tier kept? Retention rules, often a generational scheme with daily, weekly and monthly states, decide how far back a restore can reach. With data corruption or ransomware discovered late, that is the difference between available and lost.
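
The first two pillars can be checked mechanically. The following is a minimal sketch, not a feature of any particular backup tool: the `Copy` model, its field names and the example copies are hypothetical, and the RPO check assumes that the worst-case loss is roughly one full backup interval (the time the backup itself takes is ignored).

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class Copy:
    """One copy of the data (hypothetical model, not a real tool's schema)."""
    name: str
    media_type: str   # e.g. "disk", "tape", "object-storage"
    offsite: bool     # stored at a physically separate location


def satisfies_3_2_1(copies: list[Copy]) -> bool:
    """Three copies, two media types, at least one copy offsite."""
    media_types = {c.media_type for c in copies}
    return (
        len(copies) >= 3
        and len(media_types) >= 2
        and any(c.offsite for c in copies)
    )


def interval_meets_rpo(backup_interval: timedelta, rpo: timedelta) -> bool:
    """Assumption: worst-case loss is about one interval, so it must not exceed the RPO."""
    return backup_interval <= rpo


# Example data, invented for illustration. The original counts as one of the three copies.
copies = [
    Copy("production volume", "disk", offsite=False),
    Copy("local NAS snapshot", "disk", offsite=False),
    Copy("cloud archive", "object-storage", offsite=True),
]

print(satisfies_3_2_1(copies))
print(interval_meets_rpo(timedelta(hours=1), rpo=timedelta(hours=1)))
print(interval_meets_rpo(timedelta(hours=24), rpo=timedelta(hours=1)))
```

A nightly job fails the one-hour RPO from the example above, however reliably it runs.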

These four decisions are not a technical detail but a business call. What an hour of downtime costs versus what a faster recovery costs is a trade-off that belongs to the broader Disaster Recovery discipline. Backup supplies the data; disaster recovery supplies the plan that turns it back into a whole system.

The tested restore

The core of the discipline is a single statement: a backup that has never been successfully restored effectively does not exist. Backup jobs report success when bytes have been written, not when those bytes later yield a booting system. Between the two sit silent failure sources: an inconsistent database snapshot, a missing encryption key, a backup format today's software no longer reads, or simply a dependency no one included. Only a trial restore exposes these gaps before the real incident does.

The trial is meaningful only when it restores into an isolated environment and checks the result against measurable criteria: does the system boot, is the data consistent, and does the measured recovery time fall within the RTO? The cadence of these tests depends on criticality, regulatory requirements and the system's rate of change; for core systems a quarterly trial is common. Automating that proof brings the strategy close to the extended 3-2-1-1-0 variant, which adds one immutable copy and zero recovery errors.
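
The three criteria translate directly into a pass/fail check. This is a sketch under assumptions: the `RestoreTrial` record, its fields and the system name `erp-db` are hypothetical, and how the boot and consistency results are obtained is left to the test environment.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class RestoreTrial:
    """Outcome of one trial restore into an isolated environment (hypothetical fields)."""
    system: str
    booted: bool
    data_consistent: bool
    duration: timedelta


def failed_criteria(trial: RestoreTrial, rto: timedelta) -> list[str]:
    """Return the criteria the trial missed; an empty list means it passed."""
    failures = []
    if not trial.booted:
        failures.append("system did not boot")
    if not trial.data_consistent:
        failures.append("data inconsistent")
    if trial.duration > rto:
        failures.append(f"restore took {trial.duration}, RTO is {rto}")
    return failures


# Invented example: the restore works, but takes longer than the target allows.
trial = RestoreTrial("erp-db", booted=True, data_consistent=True,
                     duration=timedelta(hours=5))
for problem in failed_criteria(trial, rto=timedelta(hours=4)):
    print(problem)
```

A trial that boots with consistent data can still fail: here the measured five hours miss a four-hour RTO, which is a finding about the strategy rather than about the data.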

Protection against ransomware

Modern ransomware deliberately hunts the backups, because a deleted backup forces the ransom. The classic 3-2-1 rule alone is not enough when all three copies are online and writable. Two additions close the gap:

Immutable backups. A copy configured as immutable cannot be deleted or overwritten within its retention window, even by a compromised administrator or backup account. That guarantee holds only if immutability is correctly configured; read-only storage alone is not sufficient.
Air gap. A physically or logically separated copy that the running attack cannot reach, whether a disconnected tape or a separate account with its own credentials.

Both complement, but do not replace, the tested restore: an immutable backup that cannot be restored protects against deletion, not against data loss. This line of defence belongs to the broader Security Strategy and to the Zero Trust assumption that internal accounts can be compromised too.
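
How many copies would actually survive a compromised account can be counted from an inventory. The sketch below models only the policy; it does not verify that a storage system really enforces immutability. The `BackupCopy` fields, copy names and dates are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class BackupCopy:
    """Inventory entry for one backup copy (hypothetical fields)."""
    name: str
    air_gapped: bool                        # not reachable with production credentials
    immutable_until: Optional[date] = None  # end of the immutability window, if any


def survives_compromise(copy: BackupCopy, today: date) -> bool:
    """True if a compromised admin or backup account could not delete this copy today."""
    locked = copy.immutable_until is not None and today <= copy.immutable_until
    return copy.air_gapped or locked


def resilient_copies(copies: list[BackupCopy], today: date) -> list[str]:
    """Names of the copies that stay out of the attacker's reach."""
    return [c.name for c in copies if survives_compromise(c, today)]


# Invented inventory for illustration.
inventory = [
    BackupCopy("online NAS", air_gapped=False),
    BackupCopy("object storage", air_gapped=False, immutable_until=date(2024, 7, 1)),
    BackupCopy("offline tape", air_gapped=True),
]

print(resilient_copies(inventory, today=date(2024, 6, 1)))
print(resilient_copies(inventory, today=date(2024, 8, 1)))
```

The second call shows why the immutability window matters: once it has expired, the copy counts as writable again and only the air-gapped tape remains.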

The lifecycle of a backup

Backup is not a one-off job but a cycle of defining, securing, verifying and, when it matters, restoring. Only the feedback from test and restore into the next definition turns backups into a strategy:

flowchart TD
    A["Define<br/>RPO, RTO, retention"] --> B["Secure<br/>3-2-1, immutable"]
    B --> C["Verify<br/>restore in isolation"]
    C --> D["Assess<br/>RTO met? data consistent?"]
    D --> E["Restore<br/>in a real incident, by priority"]
    D --> A
    E --> A

The decisive arrow runs from the assessment back to the definition. If the test misses the RTO or shows inconsistencies, the data is not the problem, the strategy is, and it gets adjusted before the real incident tests it. In the incident itself, recovery proceeds by business priority, not alphabetically: first the systems whose outage is most expensive.
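
Ordering recovery by business priority is a one-line sort once the cost of an outage is known. A minimal sketch, assuming a per-hour outage cost has been estimated for each system; the system names and figures are invented, and real recovery plans also have to respect technical dependencies, which this sketch ignores.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class System:
    name: str
    outage_cost_per_hour: float  # hypothetical business estimate


def restore_order(systems: list[System]) -> list[System]:
    """Most expensive outage first."""
    return sorted(systems, key=lambda s: s.outage_cost_per_hour, reverse=True)


# Invented example systems.
systems = [
    System("archive", 50.0),
    System("billing", 2_000.0),
    System("webshop", 8_000.0),
]

for s in restore_order(systems):
    print(s.name)
```

An alphabetical list would start with the archive, the system whose outage costs least in this example.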

Where the practice breaks

Untested backups. The job runs green, no one has ever restored. This is the most common and most expensive mistake, because it only surfaces in a crisis.
All copies online. Three copies on the same reachable storage are, against ransomware, effectively one copy. Only the separate, immutable tier closes the gap.
RPO and RTO never defined. Without target values any backup frequency is arbitrary and any recovery is rudderless. The targets come from the business, not from the IT department.
Retention too short. When corruption or encryption is discovered only weeks later, backups that were kept too briefly have long since been overwritten.
Restore not observed. Without telemetry on the success, duration and completeness of backups and restores, the strategy stays a claim. That view comes from the Observability of the backup pipeline.
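
The retention failure above can be made concrete with a generational scheme. The following sketch assumes one backup per day and keeps the newest backup of each ISO week and calendar month; the function names, tier sizes and dates are hypothetical, and real backup software may define its tiers differently.

```python
from datetime import date, timedelta


def gfs_keep(backup_days: list[date], daily: int, weekly: int, monthly: int) -> set[date]:
    """Pick the backups a daily/weekly/monthly generational scheme retains."""
    newest_first = sorted(backup_days, reverse=True)
    keep = set(newest_first[:daily])

    # Weekly tier: newest backup of each ISO week, for the most recent weeks.
    weeks: dict[tuple[int, int], date] = {}
    for d in newest_first:
        iso = d.isocalendar()
        weeks.setdefault((iso[0], iso[1]), d)
    keep.update(list(weeks.values())[:weekly])

    # Monthly tier: newest backup of each calendar month.
    months: dict[tuple[int, int], date] = {}
    for d in newest_first:
        months.setdefault((d.year, d.month), d)
    keep.update(list(months.values())[:monthly])
    return keep


def has_clean_backup(kept: set[date], corrupted_on: date) -> bool:
    """True if at least one retained backup predates the corruption."""
    return any(d < corrupted_on for d in kept)


# Invented scenario: 120 daily backups, corruption on 5 June, discovered on 30 June.
today = date(2024, 6, 30)
backups = [today - timedelta(days=n) for n in range(120)]

generational = gfs_keep(backups, daily=7, weekly=4, monthly=3)
dailies_only = gfs_keep(backups, daily=7, weekly=0, monthly=0)

print(has_clean_backup(generational, corrupted_on=date(2024, 6, 5)))
print(has_clean_backup(dailies_only, corrupted_on=date(2024, 6, 5)))
```

With only seven daily states, every retained backup already contains the corruption by the time it is noticed; the monthly tier is what still reaches behind it.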

Tools are interchangeable, the discipline is not

Which program writes the bytes is secondary. A simple file sync with rsync, a folder mirror over Syncthing, or full backup software such as Duplicati with versioning and encryption each solve part of the task. None of them replaces the strategy behind it. Syncthing, for instance, mirrors changes immediately, including an accidental deletion or an encryption by ransomware, and is therefore not a backup in the sense of this discipline without a versioned, separated tier. The tool question only arises once RPO, RTO, retention and the restore test are settled.
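
The difference between a mirror and a versioned backup fits into a few lines. This is a toy model with in-memory dictionaries standing in for folders; it does not reproduce the behaviour or configuration of any specific tool, and the class and function names are hypothetical.

```python
def mirror(source: dict[str, str]) -> dict[str, str]:
    """A pure mirror's target simply follows the source, deletions included."""
    return dict(source)


class VersionedBackup:
    """Keeps every snapshot, so earlier states stay restorable."""

    def __init__(self) -> None:
        self.snapshots: list[dict[str, str]] = []

    def snapshot(self, source: dict[str, str]) -> None:
        self.snapshots.append(dict(source))

    def restore(self, index: int) -> dict[str, str]:
        return dict(self.snapshots[index])


files = {"report.txt": "Q2 figures"}
backup = VersionedBackup()
backup.snapshot(files)
replica = mirror(files)

del files["report.txt"]    # accidental deletion (or encryption) at the source
replica = mirror(files)    # the mirror faithfully repeats the loss
backup.snapshot(files)     # the backup records the new state as one more version

print(replica)
print(backup.restore(0))
```

After the deletion the mirror is empty as well, while the first snapshot of the versioned backup still holds the file.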

In day-to-day operations, the backup belongs less to the software than to operational discipline, which Modern Service Management anchors as a regular, measured process. The end-to-end view of the pipeline's success, duration and restore capability is covered by the Observability and Telemetry service. When the incident hits, recovery feeds into the ordered flow of Incident Response.

References

Backblaze Data Backup Strategies, Why the 3-2-1 Backup Strategy is the Best. Derivation of the 3-2-1 rule and its limits with current media examples. (23.05.2024). www.backblaze.com/blog/the-3-2-1-backup-strategy/
Veeam What is the 3-2-1 Backup Rule. Fundamentals of the 3-2-1 rule and the extended 3-2-1-1-0 variant with one immutable copy and zero recovery errors. (05.02.2024). www.veeam.com/blog/321-backup-rule.html
ISO 22301:2019, Business Continuity Management Systems. International standard for business continuity that anchors recovery objectives and tested preparedness organisationally. (2019). www.iso.org/standard/75154.html
NIST SP 800-34 Rev. 1, Contingency Planning Guide for Federal Information Systems. Official guide to IT contingency planning covering backup, recovery and testing strategies. (2010). csrc.nist.gov/pubs/sp/800/34/r1/upd1/final
