rsync
rsync is the classic tool for mirroring and transferring files. It compares source and destination and, for typical transfers over the network, sends only the changed parts of each file rather than the whole file; for local copies, or when --whole-file is given, it writes files in full instead. That makes it the transfer primitive behind many backup routines. The distinction that matters: rsync on its own is not a versioned backup.
Anyone wanting to keep large amounts of data current between two machines usually copies them in full over and over, or trusts a third-party cloud service. rsync solves the other half of the problem: it keeps two directories aligned by sending only the differences over the wire after the first run. That frugality is exactly what has made it a standard building block beneath backup, mirroring and distribution scripts for decades. This page describes how the delta transfer works, where rsync is the right building block, and why it complements a thought-through backup and restore strategy but does not replace it.
The principle
rsync aligns a source with a destination so that the destination afterwards matches the state of the source. Source and destination can live on the same machine, or the destination sits on a remote host that rsync usually reaches over SSH. The traffic then runs encrypted through the same secure channel as an SSH session.
- Differences only. The first run transfers everything. On every further run rsync compares the files and sends only what has changed. That saves bandwidth and time.
- Preserved properties. With the umbrella option --archive (-a), permissions, timestamps, symlinks and the directory structure are preserved. It is shorthand for several individual options and covers almost everything, but it deliberately leaves out ACLs, extended attributes and hardlinks, which need their own switches.
- Trial run. With --dry-run, rsync shows what it would do without changing anything. For a mirror with deletions, that is the cheapest insurance against a misstep.
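The options named in the list above can be assembled into a command line programmatically. The following is a minimal sketch, not a recommended wrapper: the helper name build_rsync_argv, the host backup.example.org and both paths are placeholders invented for illustration. The flags themselves (--archive, --acls, --xattrs, --hard-links, --delete, --dry-run, --itemize-changes) are options documented in rsync(1).

```python
import shlex


def build_rsync_argv(source, destination, *, dry_run=True, delete=False,
                     acls=False, xattrs=False, hard_links=False):
    """Assemble an rsync command line as an argument list (hypothetical helper)."""
    argv = ["rsync", "--archive"]
    # --archive does not include the following three, so they are opt-in here.
    if acls:
        argv.append("--acls")
    if xattrs:
        argv.append("--xattrs")
    if hard_links:
        argv.append("--hard-links")
    if delete:
        argv.append("--delete")
    if dry_run:
        # Report what would change without touching the destination.
        argv += ["--dry-run", "--itemize-changes"]
    argv += [source, destination]
    return argv


# Placeholder paths and host; the trial run is the default in this sketch.
trial = build_rsync_argv("/srv/data/", "backup.example.org:/srv/mirror/", delete=True)
print(shlex.join(trial))
```

The sketch defaults to a trial run on purpose, so a real transfer has to be requested explicitly with dry_run=False. The resulting list could be handed to subprocess.run(trial, check=True) on a machine where rsync is installed.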
rsync is free software under the GNU General Public License Version 3 (GNU GPLv3). The code is open, the mechanism is auditable, and no vendor lock-in to a transfer service arises.
The delta transfer
The heart of it is the delta-transfer algorithm that Andrew Tridgell described in his doctoral thesis. It ensures that, even within a single, partially changed file, rsync transfers only the altered blocks instead of resending the file as a whole. Three roles share the work:
flowchart TD
A["Generator at the destination<br/>builds checksums per block"] --> B["Checksums to the sender<br/>metadata only over the wire"]
B --> C["Sender compares<br/>rolling checksum, block by block"]
C --> D["Sender sends<br/>changed blocks plus instructions"]
D --> E["Receiver reassembles<br/>matching blocks locally, new ones from the stream"]
The generator at the destination builds checksums over the blocks of the file already there and sends them to the sender. The sender reads the source file with a rolling checksum and looks for blocks the destination already knows. What matches is named only as a reference; what is new goes over the wire as raw data. The receiver rebuilds the file from the locally present blocks and the few new bytes. So only what has actually changed travels across the network.
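The interplay of the three roles can be sketched in a few lines of Python. This is a toy model under stated assumptions, not rsync's actual implementation: the block size of 4 bytes, the simple Adler-style weak checksum, the use of SHA-256 as the strong checksum and all function names are choices made for illustration, and a trailing partial block of the old file is simply never matched.

```python
import hashlib

BLOCK = 4  # tiny block size for illustration only; an assumption of this sketch


def weak_sum(data):
    """Adler-style weak checksum of one block (illustrative, not rsync's exact one)."""
    a = sum(data) % 65536
    b = sum((len(data) - i) * byte for i, byte in enumerate(data)) % 65536
    return a, b


def roll(a, b, out_byte, in_byte, size):
    """Slide the window one byte to the right without re-reading the block."""
    a = (a - out_byte + in_byte) % 65536
    b = (b - size * out_byte + a) % 65536
    return a, b


def signatures(old):
    """Generator role: weak and strong checksums for each full block of the old file."""
    table = {}
    for start in range(0, len(old) - BLOCK + 1, BLOCK):
        block = old[start:start + BLOCK]
        entry = (hashlib.sha256(block).digest(), start // BLOCK)
        table.setdefault(weak_sum(block), []).append(entry)
    return table


def delta(new, table):
    """Sender role: emit ("copy", block_no) references or ("data", bytes) literals."""
    ops, literal, pos, sums = [], bytearray(), 0, None
    while pos + BLOCK <= len(new):
        window = new[pos:pos + BLOCK]
        if sums is None:
            sums = weak_sum(window)
        match = None
        for strong, block_no in table.get(sums, []):
            # The weak checksum is only a filter; the strong one confirms the match.
            if hashlib.sha256(window).digest() == strong:
                match = block_no
                break
        if match is not None:
            if literal:
                ops.append(("data", bytes(literal)))
                literal = bytearray()
            ops.append(("copy", match))
            pos += BLOCK
            sums = None
        else:
            literal.append(new[pos])
            if pos + BLOCK < len(new):
                sums = roll(*sums, new[pos], new[pos + BLOCK], BLOCK)
            pos += 1
    literal += new[pos:]
    if literal:
        ops.append(("data", bytes(literal)))
    return ops


def patch(old, ops):
    """Receiver role: rebuild the new file from local blocks plus literal data."""
    out = bytearray()
    for kind, value in ops:
        if kind == "copy":
            out += old[value * BLOCK:(value + 1) * BLOCK]
        else:
            out += value
    return bytes(out)


old = b"the quick brown fox jumps over the lazy dog."
new = b"the quick brown cat jumps over the lazy dog."
ops = delta(new, signatures(old))
sent = sum(len(value) for kind, value in ops if kind == "data")
print(patch(old, ops) == new, sent, "of", len(new), "bytes sent as literal data")
```

The rolling update in roll is what makes the search affordable: the sender checks every byte offset of the source file against the checksum table without recomputing a full checksum each time.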
Mirroring, not versioning
This is the important distinction. In normal operation rsync mirrors the current state of the source onto the destination. With the --delete option it also removes everything at the destination that no longer exists in the source. That is precisely what a mirror is for, and at the same time dangerous: a file accidentally deleted or corrupted in the source will, on the next run, be deleted or overwritten at the destination too. A pure mirror protects against the failure of a storage medium, not against accidental deletion or corruption.
So rsync on its own keeps no version history. There is no built-in catalogue of earlier states from which yesterday's version could be restored. To build a real backup with rsync, it is combined with a second building block that supplies the timeline. A common one is the --link-dest option: it points to a previous backup run and stores unchanged files as a hardlink rather than a copy. That produces many dated snapshots which together take up only a little more space than a single copy. It is a proven but self-built pattern, not a finished product with an interface, encryption and restore logic. A dedicated backup tool such as Duplicati delivers exactly that layer out of the box.
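The hardlink idea behind --link-dest can be imitated with the Python standard library to show why dated snapshots stay small. This is a simplified sketch, not how rsync decides: the function name snapshot and the directory names are invented for illustration, and the sketch compares file contents, whereas rsync by default judges a file unchanged from its size, modification time and preserved attributes.

```python
import filecmp
import os
import shutil
import tempfile


def snapshot(source, target, previous=None):
    """Copy source into target; hardlink files that are unchanged since previous."""
    for folder, _dirs, files in os.walk(source):
        rel = os.path.relpath(folder, source)
        os.makedirs(os.path.normpath(os.path.join(target, rel)), exist_ok=True)
        for name in files:
            src = os.path.join(folder, name)
            dst = os.path.normpath(os.path.join(target, rel, name))
            old = os.path.normpath(os.path.join(previous, rel, name)) if previous else None
            if old and os.path.isfile(old) and filecmp.cmp(src, old, shallow=False):
                os.link(old, dst)       # unchanged: share the data with the older snapshot
            else:
                shutil.copy2(src, dst)  # new or changed: store a fresh copy


with tempfile.TemporaryDirectory() as root:
    data = os.path.join(root, "data")
    os.makedirs(data)
    for name, text in [("a.txt", "stable"), ("b.txt", "first draft")]:
        with open(os.path.join(data, name), "w") as handle:
            handle.write(text)

    day1 = os.path.join(root, "snap-day1")
    day2 = os.path.join(root, "snap-day2")
    snapshot(data, day1)

    # The source changes between the two runs.
    with open(os.path.join(data, "b.txt"), "w") as handle:
        handle.write("second draft")
    snapshot(data, day2, previous=day1)

    # a.txt is stored once and linked twice; b.txt keeps its older state in day1.
    shared = os.path.samefile(os.path.join(day1, "a.txt"), os.path.join(day2, "a.txt"))
    with open(os.path.join(day1, "b.txt")) as handle:
        old_b = handle.read()
    with open(os.path.join(day2, "b.txt")) as handle:
        new_b = handle.read()
    print(shared, old_b, new_b)
```

Each snapshot directory looks like a complete copy, yet unchanged files occupy disk space only once. Everything else a backup product provides, such as pruning old snapshots, encryption and a restore workflow, remains the script author's job.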
Where rsync fits and where it does not
- Fits. Keeping large amounts of data current over the network frugally, mirroring servers, transferring directories before a migration, and serving as the transfer layer beneath in-house backup scripts. Because of the delta mechanism, a repeated run over a thin link is cheap.
- Fits with care. As a mirror with --delete, rsync is sharp. Without a prior --dry-run and without a separate copy preserving an older state, a wrongly set path can delete data at the destination.
- Does not fit as a backup on its own. Without --link-dest or a tool layered on top, versioning is missing. Sync and mirroring are not the same as a recoverable, versioned backup.
- Does not fit as continuous sync. rsync runs as an invocation, not as a service. Anyone wanting to keep folders continuously in sync in both directions is better served by Syncthing; rsync usually aligns in one direction and on demand.
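The warning about --delete can be made concrete with a toy model of a one-way mirror. The sketch below does not call rsync at all: the function plan_mirror and the file names are hypothetical, and directories are modelled as plain dictionaries that map a relative path to a content label. It only computes what a mirror run would copy and remove, which is roughly the information a trial run is meant to surface.

```python
def plan_mirror(source, destination, delete=False):
    """Return (to_copy, to_delete) for a one-way mirror without changing anything.

    Both arguments map relative paths to a content label (a stand-in for
    whatever a real tool compares, such as size and modification time).
    """
    to_copy = sorted(path for path, label in source.items()
                     if destination.get(path) != label)
    to_delete = sorted(path for path in destination if path not in source) if delete else []
    return to_copy, to_delete


source = {"report.txt": "v2", "notes.txt": "v1"}
destination = {"report.txt": "v1", "notes.txt": "v1", "old-draft.txt": "v1"}

# Without deletions, the stale file at the destination survives.
plan_keep = plan_mirror(source, destination)

# Someone removes notes.txt from the source by accident ...
del source["notes.txt"]

# ... and the next mirror run with deletions would remove it at the destination too.
plan_delete = plan_mirror(source, destination, delete=True)
print(plan_keep)
print(plan_delete)
```

In the second plan the accidentally deleted file appears in the removal list next to the intentionally stale one. The mirror cannot tell the two apart, which is why an older state has to be preserved somewhere else.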
The building block in the backup picture
rsync is rarely the whole solution and almost always a part of it. The clean separation of mirroring, synchronisation and a real, versioned backup is handled by the backup and restore strategy: mirroring keeps a second copy current, sync aligns devices, and only a versioned backup allows restoring a state from the day before yesterday. rsync occupies the first place and often supplies the transport layer for the third. A continuous device sync, for example with Syncthing, and an encrypted, versioned backup, for example with Duplicati, cover the other places.
Whether such a data path is run in-house or bought as a service, and whether the data stays under the organisation's own control and lands on foreign machines only encrypted, are the trade-offs that decide how the path is built. Run on in-house infrastructure, rsync is a building block of sovereign infrastructure, because the data never leaves the organisation's own domain.
References
- rsync Version 3.4.4. Current stable version, a regression-fix release on top of the 3.4.3 security release, free software under the GNU GPLv3. (08.06.2026). rsync.samba.org/
- rsync How rsync works, a practical overview. Description of the generator, sender and receiver roles and the rolling-checksum mechanism. (2026). rsync.samba.org/how-rsync-works.html
- rsync Manual page rsync(1). Reference for the options, including --archive, --delete, --link-dest and --dry-run. (2026). download.samba.org/pub/rsync/rsync.1
- Andrew Tridgell Efficient Algorithms for Sorting and Synchronisation. The doctoral thesis that formally derives the rsync algorithm. (1999). www.samba.org/~tridge/phd_thesis.pdf
Related topics
- Backup and restore strategy, the separation of mirroring, sync and versioned backup.
- Digital Sovereignty, control over data and infrastructure without foreign servers.