Transparent Bucket Layout

Dropbear writes plain, documented objects to a plain S3-compatible bucket. Nothing is encrypted at the application layer, nothing is packed into proprietary containers, and nothing requires Dropbear to be installed before you can read it. If the project disappears tomorrow, your data is still there in formats you can parse with a Python script and a weekend.

This is a deliberate trade. Dropbear is a tool for your own bytes, and "your own bytes" includes the right to walk away from the tool.

What's actually in the bucket

Everything for a root lives under roots/<rootID>/:

roots/<rootID>/
├── blobs/<ab>/<cd>/<sha256>              file contents, content-addressed (sharded by hex prefix)
└── devices/
    ├── registry.json                     active + retired devices for this root
    └── <deviceID>/
        ├── head.json                     pointer to this device's latest manifest
        ├── manifests/<ts>.json           full snapshots, one per sync
        └── tombstones/<seq>.json         explicit delete records (per-device monotonic seq)

That's the whole protocol. One blobs prefix (raw bytes keyed by hash, sharded to keep S3 listings happy) plus one per-device subtree carrying everything attributable to that device. No sidecar databases in the bucket, no per-file metadata objects, no opaque chunk catalog.
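
Because the layout is fixed, every object key can be computed from a root id, a device id, and a digest or timestamp. The helpers below are a sketch, not Dropbear's code: the function names are hypothetical, and they assume the <ab>/<cd> shard directories are the first and second two-character slices of the lowercase sha256 hex digest. SCHEMA.md is the authority on the real scheme.

```python
import re

# Hypothetical helpers mirroring the documented bucket layout.
# Assumption: <ab> and <cd> are digest[0:2] and digest[2:4] of the lowercase hex sha256.
_HEX64 = re.compile(r"[0-9a-f]{64}")


def root_prefix(root_id: str) -> str:
    return f"roots/{root_id}/"


def blob_key(root_id: str, sha256_hex: str) -> str:
    """Key for a content-addressed blob, sharded by hex prefix."""
    h = sha256_hex.lower()
    if not _HEX64.fullmatch(h):
        raise ValueError(f"not a sha256 hex digest: {sha256_hex!r}")
    return f"{root_prefix(root_id)}blobs/{h[0:2]}/{h[2:4]}/{h}"


def device_prefix(root_id: str, device_id: str) -> str:
    return f"{root_prefix(root_id)}devices/{device_id}/"


def head_key(root_id: str, device_id: str) -> str:
    return device_prefix(root_id, device_id) + "head.json"


def manifest_key(root_id: str, device_id: str, ts: str) -> str:
    return f"{device_prefix(root_id, device_id)}manifests/{ts}.json"


def tombstone_key(root_id: str, device_id: str, seq: int) -> str:
    # Assumption: <seq> is written as a plain decimal integer, without padding.
    return f"{device_prefix(root_id, device_id)}tombstones/{seq}.json"
```

For example, the blob for a file containing the bytes hello in the photos root would live under roots/photos/blobs/2c/f2/, since its sha256 digest starts with 2cf2. Two levels of two hex characters give 65,536 leaf directories, which keeps any single listing small.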

A manifest is a JSON document listing every tracked path with its mode, size, sha256, symlink target, and so on. A head is a one-line JSON pointer naming the manifest. A tombstone is a JSON record describing what was deleted and when. You can aws s3 cp any of them to your terminal and read them with your eyes.
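
Reading these documents takes nothing beyond a JSON parser. The sketch below assumes a manifest shaped like {"entries": [{"path": ..., "mode": ..., "size": ..., "sha256": ..., "symlink": ...}]}. Those key names are illustrative guesses, not the documented schema (SCHEMA.md is). The parser keeps the fields it knows and ignores everything else.

```python
import json
from dataclasses import dataclass, fields
from typing import Optional


@dataclass(frozen=True)
class ManifestEntry:
    # Hypothetical field names; check SCHEMA.md for the real ones.
    path: str
    mode: int
    size: int
    sha256: Optional[str] = None   # assumed absent for symlinks
    symlink: Optional[str] = None  # symlink target, if the entry is a link


_KNOWN = {f.name for f in fields(ManifestEntry)}


def parse_manifest(text: str) -> list[ManifestEntry]:
    """Parse manifest JSON, silently dropping fields this reader doesn't know."""
    doc = json.loads(text)
    return [
        ManifestEntry(**{k: v for k, v in raw.items() if k in _KNOWN})
        for raw in doc["entries"]
    ]
```

A reader written this way keeps working when a later format version adds optional fields to manifest entries. Only a removed or renamed field would break it.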

What this gets you

  • Other tools work. rclone, aws s3, mc, s5cmd, the web console: any S3 client can list, download, back up, mirror, or cross-region-replicate a Dropbear bucket without knowing Dropbear exists. Want offsite backup of just the blobs? rclone sync the roots/<rootID>/blobs/ prefix to a second provider. Done.
  • Recovery without Dropbear. Given the bucket, a determined human (or a fifty-line script) can rebuild a root by reading a head, fetching the manifest it names, and for each entry pulling blobs/<ab>/<cd>/<sha256> to the recorded path. The manifest format is documented and stable; SCHEMA.md in the repo is the authoritative reference.
  • Inspection is trivial. aws s3 ls s3://<bucket>/roots/photos/devices/ tells you which devices have ever participated in the photos root. aws s3 cp s3://<bucket>/roots/photos/devices/laptop/head.json - prints that device's current manifest id. Reading that manifest tells you exactly what laptop thinks the world looks like. No daemon to query, no opaque state to interpret. Retiring a device is a single recursive delete under devices/<deviceID>/.
  • Bring-your-own-everything. Server-side encryption is the bucket's problem (set it on the bucket; Dropbear doesn't care). Lifecycle rules are the bucket's problem. Object lock, versioning, access control — all configured where you'd expect them, against a layout that doesn't fight you.
  • Multi-tenant for free. A single bucket can host many roots side-by-side under different <rootID> prefixes, each fully isolated. Share a bucket between projects, between users (with prefix-scoped IAM), or just between unrelated chunks of your own data.
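
For the prefix-scoped IAM case, a policy along these lines confines a principal to one root. This is a sketch: it uses standard AWS IAM S3 actions and the s3:prefix condition key, the bucket and root names are placeholders, other S3-compatible providers may support only a subset of IAM policy syntax, and it has not been checked against the exact set of requests Dropbear issues.

```python
import json


def root_scoped_policy(bucket: str, root_id: str) -> dict:
    """Build an IAM policy limiting a principal to roots/<root_id>/ in one bucket."""
    prefix = f"roots/{root_id}/"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Listing is a bucket-level action, so scope it with the s3:prefix key.
                "Effect": "Allow",
                "Action": "s3:ListBucket",
                "Resource": f"arn:aws:s3:::{bucket}",
                "Condition": {"StringLike": {"s3:prefix": [f"{prefix}*"]}},
            },
            {
                # Object-level actions are scoped by the object ARN itself.
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
                "Resource": f"arn:aws:s3:::{bucket}/{prefix}*",
            },
        ],
    }


if __name__ == "__main__":
    print(json.dumps(root_scoped_policy("my-dropbear-bucket", "photos"), indent=2))
```

The two statements are split because ListBucket applies to the bucket ARN, while reads and writes apply to object ARNs. Scoping the listing by prefix stops the principal from enumerating other roots.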

What it costs

Transparency means no client-side encryption today. The bucket operator can read your bytes. If you don't trust them, use bucket-level SSE with a customer-managed key, or wait for the client-side encryption idea to ship. Either way, the choice is explicit and visible — Dropbear isn't pretending to protect you with a layer it doesn't have.

It also means the layout is a public contract. We can extend it (new prefixes, new optional fields in manifest entries) but breaking changes cost a lot, because real users have real scripts pointed at the existing shape. This is fine — it's the cost of being honest about the format — but it slows down certain redesigns.

And it means deduplication stops at whole files and there is no compression: identical files share one content-addressed blob, but blobs are stored as-is. If you want zstd on top, either do it at the storage layer (some providers offer this) or layer a transformation in yourself. The second option breaks the "any S3 client can read it" promise, so think hard first.
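
Here's why a do-it-yourself transformation breaks things. Compressing a blob in place changes its bytes, so the object no longer hashes to the sha256 in its key or in any manifest that references it, and a plain download returns gzip data instead of the file. A minimal stdlib illustration, using gzip as a stand-in for zstd:

```python
import gzip
import hashlib

original = b"the same bytes every S3 client can read\n" * 100
compressed = gzip.compress(original)

# The blob key (and the manifest's sha256) is computed over the original bytes...
key_digest = hashlib.sha256(original).hexdigest()
# ...but an object compressed in place hashes to something else entirely.
stored_digest = hashlib.sha256(compressed).hexdigest()

print("original bytes:", len(original), "compressed bytes:", len(compressed))
print("stored object matches its key:", stored_digest == key_digest)
```

Any tool that verifies a blob against its key, including a recovery script, would now reject every compressed object unless it also knows to decompress first.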

The escape hatch in practice

If you decide Dropbear isn't for you anymore:

  1. Your files are already on disk, in their original layout. The bucket is the copy. You can stop using Dropbear right now and lose nothing.
  2. If you don't have a local copy and only have the bucket, read any device's head.json, fetch the manifest it names, and reconstruct the tree by pulling blobs/<ab>/<cd>/<sha256> for each entry. The format is documented; a one-evening script will do it.
  3. If you want to migrate the bucket to another tool, any S3-compatible sync tool can read your blobs as raw objects. You'll lose the manifest semantics, but you won't lose the content.
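
Step 2 can be sketched in Python against a local copy of one root, for example one fetched with aws s3 sync s3://<bucket>/roots/<rootID>/ ./mirror/. Several details here are assumptions, not the documented format. The sketch takes the head to be {"manifest": "<ts>"} and manifest entries to carry path, sha256, an integer mode, and an optional symlink target. It also assumes the shard directories are the first two pairs of hex digits. Check SCHEMA.md before trusting it with real data.

```python
import hashlib
import json
import os
import shutil
from pathlib import Path


def blob_path(mirror: Path, sha256_hex: str) -> Path:
    # Assumed sharding: blobs/<first 2 hex>/<next 2 hex>/<full digest>.
    return mirror / "blobs" / sha256_hex[0:2] / sha256_hex[2:4] / sha256_hex


def sha256_file(path: Path) -> str:
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def restore(mirror: Path, device_id: str, dest: Path) -> int:
    """Rebuild one device's latest snapshot from a mirrored roots/<rootID>/ tree."""
    device_dir = mirror / "devices" / device_id
    head = json.loads((device_dir / "head.json").read_text())
    # Hypothetical head field naming the manifest's <ts>.
    manifest_file = device_dir / "manifests" / f"{head['manifest']}.json"
    manifest = json.loads(manifest_file.read_text())

    restored = 0
    for entry in manifest["entries"]:  # hypothetical schema
        target = dest / entry["path"]
        # Refuse absolute or ../ paths that would escape the destination directory.
        if not target.resolve().is_relative_to(dest.resolve()):
            raise ValueError(f"unsafe path in manifest: {entry['path']!r}")
        target.parent.mkdir(parents=True, exist_ok=True)
        if entry.get("symlink") is not None:
            os.symlink(entry["symlink"], target)
        else:
            src = blob_path(mirror, entry["sha256"])
            # Verify content against its address before trusting it.
            if sha256_file(src) != entry["sha256"]:
                raise ValueError(f"blob does not match its hash: {src}")
            shutil.copyfile(src, target)
            if "mode" in entry:
                os.chmod(target, entry["mode"] & 0o7777)  # permission bits only
        restored += 1
    return restored
```

Usage: restore(Path("mirror"), "laptop", Path("restored")) rebuilds laptop's view of the root and returns the number of entries written. Since blobs are verified against their names, a corrupted or truncated download fails loudly instead of restoring silently wrong bytes.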

This is the inverse of the "lock you in so leaving is painful" pattern. Leaving Dropbear is the easiest possible operation: stop running it.

See also