As an example, LZ4 and zstd also have a compressBound() function that calculates this.
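For anyone unfamiliar with the pattern, here's a minimal sketch in C. LZ4_compressBound() and ZSTD_compressBound() are the actual public entry points; the wrapper around them is just illustrative:

```c
#include <lz4.h>
#include <zstd.h>
#include <stdlib.h>

/* Allocate a destination buffer big enough for any compressed output
 * of an input of src_size bytes, so compression can never overflow.
 * (The wrapper is made up; the two *_compressBound calls are real API.) */
void *alloc_compress_dst(size_t src_size, int use_zstd, size_t *dst_cap) {
    *dst_cap = use_zstd ? ZSTD_compressBound(src_size)
                        : (size_t)LZ4_compressBound((int)src_size);
    return malloc(*dst_cap);
}
```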
What integer patterns does it do well on, and what patterns does it do poorly on?
How many strategies does it support? It only mentions delta, which by itself is not compression. What about Huffman, RLE, variable-length encoding ...
Does it really just "give up" at C/1024 compression if your input is a gigabyte of zeros?
It only does delta and bitpacking now.
It should do fairly well for a bunch of zeroes because it does bitpacking.
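To illustrate why (this is just the general bitpacking idea, not the library's actual code): the per-block payload cost is the bit width of the widest value, and an all-zero block has width 0, leaving only the header.

```c
#include <stdint.h>
#include <stddef.h>

/* Illustrative only: bitpacking stores each value in `width` bits,
 * where `width` is that of the block's maximum value. */
static unsigned bit_width(const uint32_t *block, size_t n) {
    uint32_t max = 0;
    for (size_t i = 0; i < n; i++)
        if (block[i] > max) max = block[i];
    unsigned w = 0;
    while (max) { w++; max >>= 1; }
    return w;               /* 0 for an all-zero block */
}

/* Packed size in bytes for one 1024-value block: a 1-byte width header
 * plus the payload; a gigabyte of zeros packs to ~1 byte per block. */
static size_t packed_block_size(const uint32_t *block) {
    return 1 + ((size_t)bit_width(block, 1024) * 1024 + 7) / 8;
}
```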
I’m working on adding RLE/FFOR, clarifying the strategy selection, and making the format flexible to modify internally so it won’t break the API.
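(FFOR here means frame-of-reference fused with bitpacking. A rough sketch of the idea, with illustrative names rather than the library's internals:)

```c
#include <stdint.h>
#include <stddef.h>

/* Illustrative FFOR sketch: subtract the per-block minimum so the
 * residuals span a smaller range, then bitpack the residuals. */
typedef struct {
    uint32_t reference;  /* per-block minimum, stored once */
    unsigned bits;       /* bit width of the residuals */
} ffor_header;

static ffor_header ffor_analyze(const uint32_t *block, size_t n) {
    uint32_t min = block[0], max = block[0];
    for (size_t i = 1; i < n; i++) {
        if (block[i] < min) min = block[i];
        if (block[i] > max) max = block[i];
    }
    uint32_t range = max - min;
    unsigned bits = 0;
    while (range) { bits++; range >>= 1; }
    return (ffor_header){ .reference = min, .bits = bits };
}
/* Encoding then stores block[i] - reference in `bits` bits each, so e.g.
 * timestamps clustered near some epoch pack almost as tightly as deltas. */
```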
Good compression algorithms use effectively the same storage for highly redundant data whether it's 1 kiloword or 1 gigaword (there may be a couple of bytes' difference, since the run length needs a longer variable-size integer). This isn't limited to all zeros, or even to repeats of a single word, though all zeros can sometimes come out a bit smaller.
And this does not require giving up on random access, if you care about that: you can separately include an "extent table" (this works for large regular repeats, which you'll have to detect anyway for other compression strategies that normally give up on random access), or, for small repeats only, use strides, or ...
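Concretely, an extent table can be as simple as a sorted array of (logical start, count, payload offset) entries; random access is then a binary search instead of a scan-and-decompress. A hypothetical sketch (all names illustrative):

```c
#include <stdint.h>
#include <stddef.h>

/* Each entry maps a run of repeating values to one stored instance,
 * so random access binary-searches the table instead of decompressing
 * everything before the target index. */
typedef struct {
    uint64_t first_index;  /* logical index where this extent starts */
    uint64_t count;        /* number of logical values it covers */
    uint64_t payload_off;  /* offset of the stored data (one repeat) */
} extent;

/* Find the extent covering logical index i; tab is sorted by first_index
 * and tab[0].first_index is 0. */
static const extent *find_extent(const extent *tab, size_t n, uint64_t i) {
    size_t lo = 0, hi = n;
    while (lo + 1 < hi) {
        size_t mid = (lo + hi) / 2;
        if (tab[mid].first_index <= i) lo = mid; else hi = mid;
    }
    return &tab[lo];
}
```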
For reference, BTRFS uses 128KiB chunks for its compression to support mmap and seeking. Of course, the caller should make sure to keep decompressed chunks in cache.
The 1024 block size is just so the delta encoding and bit packing can be vectorized.
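For illustration: with a compile-time block size, the encode loop has a fixed trip count and no loop-carried dependency, so compilers auto-vectorize it readily (a generic sketch, not the library's actual kernel; decode is the harder direction, since inverting a delta is a prefix sum):

```c
#include <stdint.h>

#define BLOCK 1024  /* fixed block size, per the comment above */

/* Generic delta-encode sketch: each output depends only on two inputs,
 * so the fixed-length loop vectorizes without any special intrinsics. */
static void delta_encode(const uint32_t *in, uint32_t *out, uint32_t base) {
    out[0] = in[0] - base;
    for (int i = 1; i < BLOCK; i++)
        out[i] = in[i] - in[i - 1];
}
```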
I am using this library to compress individual column pages in a file format, so the page size will be determined there.
I’m not using fastlanes for in-memory compressed arrays, which is what it was originally intended for. But I’ll export the fastlanes API in the next version too, so someone can implement that themselves if needed.