I wrote BTRFS block-level de-duplication sometime in 2020 as a use case for a patch sent in 2015 (https://stackoverflow.com/a/34163236), and now, after 6 more years, I have created a use case for dduper via pgdedup!

How does it work? Consecutive pg_basebackup snapshots share most of their blocks. Store them uncompressed on BTRFS and let dduper deduplicate them.

Interestingly:
- gzip completely breaks block-level dedup. Two pg_basebackup -z runs of the same database produce < 1% matching blocks.
- Chunk size matters hugely. dduper's default 128KB chunks found only 19% savings. Lowering the chunk size to 8KB (PostgreSQL's page size) raised the savings to 68%.
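
To get a feel for why chunk size matters, here is a rough Python sketch. It is not how dduper works internally; it simply hashes fixed-size chunks of two byte strings and reports what fraction of the newer one already exists in the older one. The helper names (chunk_hashes, matching_fraction, demo_snapshots) and the synthetic "snapshots" are made up for illustration: 64 pages of 8KB each, with one page changed in every 128KB stretch.

```python
import hashlib

PAGE = 8 * 1024  # PostgreSQL's page size, as mentioned above


def chunk_hashes(data: bytes, chunk_size: int) -> list[bytes]:
    """Hash each fixed-size chunk of data (the last chunk may be shorter)."""
    return [
        hashlib.sha256(data[i:i + chunk_size]).digest()
        for i in range(0, len(data), chunk_size)
    ]


def matching_fraction(old: bytes, new: bytes, chunk_size: int) -> float:
    """Fraction of chunks in `new` whose hash also appears somewhere in `old`."""
    seen = set(chunk_hashes(old, chunk_size))
    new_hashes = chunk_hashes(new, chunk_size)
    if not new_hashes:
        return 0.0
    return sum(h in seen for h in new_hashes) / len(new_hashes)


def demo_snapshots() -> tuple[bytes, bytes]:
    """Build two synthetic snapshots (hypothetical data, not real backups)."""
    old_pages = [bytes([i]) * PAGE for i in range(64)]
    new_pages = list(old_pages)
    # Change one page in every group of 16 pages (16 * 8KB = 128KB).
    for i in range(0, 64, 16):
        new_pages[i] = bytes([255]) * PAGE
    return b"".join(old_pages), b"".join(new_pages)


if __name__ == "__main__":
    old, new = demo_snapshots()
    for size in (128 * 1024, PAGE):
        print(f"{size // 1024}KB chunks: {matching_fraction(old, new, size):.2%} match")
```

With this synthetic data every 128KB chunk contains one changed page, so nothing matches at 128KB, while at 8KB 60 of the 64 pages still match. Real backups will behave less neatly, but the effect points the same way as the 19% vs 68% numbers above.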