Dropbear stores file contents in the bucket keyed by their SHA-256 hash, not by their path. The path-to-content mapping lives in the [manifest](/glossary#manifest); the [blob](/glossary#blob) itself is just bytes at `roots//blobs/`. That single design choice buys deduplication for free and gives every download a built-in integrity check.

## How it works

When `sync` decides a local file needs uploading, it hashes the file, checks whether `blobs/` already exists in the bucket, and skips the upload if it does. Two devices that both hold the same 4 GB ISO upload it exactly once.

Rename a 100 MB video from `vacation/clip.mov` to `archive/2025/clip.mov` and the next sync rewrites the manifest entry but does not re-upload the bytes: only the manifest (small JSON) and the [head](/glossary#head) (tiny pointer) change.

Same story across files: ten copies of the same PDF scattered across a root share one blob. Dedup stops at the root boundary, though. Each root lives under its own `roots//blobs/` namespace, so identical contents in two **roots** are currently stored once per root. Within a root, dedup is total.

## What it gets you

- **Renames and moves are free.** They're a manifest edit, nothing more.
- **Restore is cheap when devices overlap.** Bootstrapping a second laptop from the same bucket skips any blob whose hash matches content you've already staged on local disk, and never pulls the same blob twice in one restore.
- **Integrity on download is automatic.** The key *is* the hash. After fetching `blobs/`, Dropbear re-hashes the bytes and refuses to write the file out if they don't match. Bit-rot in the bucket, a truncated transfer, or a malicious mid-flight swap all fail loudly.
- **Backups are obvious.** Copying `roots//blobs/` to a second bucket gives you a complete, verifiable archive of the content. Manifests reference it by hash; nothing else is needed.
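
The upload-skip, free-rename, and verify-on-download behaviour described above can be sketched in a few lines. This is an illustration under stated assumptions, not Dropbear's code: the bucket and manifest are plain dicts, the key layout (`"blobs/"` plus the hex digest) is made up for the example, and `blob_key`, `upload_if_missing`, `download_verified`, and `IntegrityError` are hypothetical names.

```python
import hashlib
from typing import Dict, Tuple


class IntegrityError(Exception):
    """Raised when fetched bytes do not hash to the key they were stored under."""


def blob_key(data: bytes) -> str:
    # Assumed key layout for illustration only: "blobs/" + hex SHA-256 of the content.
    return "blobs/" + hashlib.sha256(data).hexdigest()


def upload_if_missing(bucket: Dict[str, bytes], data: bytes) -> Tuple[str, bool]:
    """Store data under its content hash. Returns (key, True if bytes were uploaded)."""
    key = blob_key(data)
    if key in bucket:
        return key, False  # identical content already stored: skip the upload
    bucket[key] = data
    return key, True


def download_verified(bucket: Dict[str, bytes], key: str) -> bytes:
    """Fetch a blob and re-hash it; refuse to return bytes that don't match the key."""
    data = bucket[key]
    if blob_key(data) != key:
        raise IntegrityError("content does not match " + key)
    return data


bucket: Dict[str, bytes] = {}    # stand-in for the object store
manifest: Dict[str, str] = {}    # stand-in for the path -> blob key mapping

key, uploaded_first = upload_if_missing(bucket, b"holiday video bytes")
manifest["vacation/clip.mov"] = key

# A rename only edits the manifest; hashing the same bytes again finds the blob.
manifest["archive/2025/clip.mov"] = manifest.pop("vacation/clip.mov")
_, uploaded_again = upload_if_missing(bucket, b"holiday video bytes")

print(uploaded_first, uploaded_again, len(bucket))
```

Because the key is derived from the bytes, the existence check doubles as the dedup check, and the download check needs no extra metadata beyond the key itself.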
## What it costs

Content addressing doesn't garbage-collect itself. When a file is deleted or modified, the old blob stays in the bucket, possibly forever, until a GC pass walks every reachable manifest, builds the live-blob set, and deletes the rest with a safety delay. Dropbear doesn't ship GC yet (see the [garbage-collection idea](https://git.tfks.net/tfks/dropbear/wiki/ideas/garbage-collection) in the wiki). For now, the bucket is a strict superset of what's reachable, and the bill follows.

Whole-file hashing also means a one-byte edit to a 10 GB file uploads 10 GB of new blob. Chunked hashing is on the roadmap but not implemented; see the [chunked-large-files idea](https://git.tfks.net/tfks/dropbear/wiki/ideas/chunked-large-files). If your workload is "tiny edits to huge files," Dropbear in its current shape is the wrong tool.

## Worth knowing

- The hash is the *content* hash, not a hash of the filename or any metadata. A file's mode, mtime, owner, and path are all manifest-side concerns.
- Symlinks aren't blobs. Their target string lives in the manifest entry directly; there's nothing to deduplicate.
- Empty files have a well-defined SHA-256 (`e3b0c44...`) and that blob exists exactly once in any non-trivial root. This is fine and expected.
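
To make the whole-file hashing points concrete, here is a minimal sketch using Python's standard `hashlib`. The helper name `content_hash` and the 1 MiB read size are assumptions for illustration, not anything Dropbear defines. It shows that identical bytes always hash the same regardless of where they live, that a one-byte edit produces an entirely different hash (and so an entirely new blob), and that the empty file has a fixed digest.

```python
import hashlib
import io


def content_hash(stream, chunk_size=1 << 20):
    """Hex SHA-256 of everything readable from a binary stream (hypothetical helper)."""
    digest = hashlib.sha256()
    # Read in chunks so a large file never has to fit in memory at once.
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


original = b"x" * 4096
edited = b"y" + original[1:]  # same length, first byte changed

hash_original = content_hash(io.BytesIO(original))
hash_copy = content_hash(io.BytesIO(bytes(original)))  # same bytes, different "file"
hash_edited = content_hash(io.BytesIO(edited))
empty_hash = content_hash(io.BytesIO(b""))

print(hash_original == hash_copy)    # identical content, identical key
print(hash_original == hash_edited)  # one byte differs, so the key differs
print(empty_hash[:7])
```

Only the bytes go into the digest; path, mode, mtime, and owner never reach `content_hash`, which is why those stay manifest-side concerns.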