Re: [RFC] Breaking data compatibility with userspace bzlib

Jeff Garzik (jgarzik@pobox.com)
Fri, 20 Jun 2003 15:09:57 -0400


On Fri, Jun 20, 2003 at 08:59:15PM +0200, J?rn Engel wrote:
> Now, the cost of the underlying BWT is O(n) in memory and O(n*ln(n))
> in time. That given, I consider it odd to use a linear semantic of
> blockSizeXXXX and would prefer an exponential one, as the zlib uses
> here and there. Thus blockSizeBits would now give the blockSize as
> 1 << blockSizeBits, allowing to go well below 100k, resulting in lower
> memory consumption for some and well above 900k, giving better
> compression ratios.
>
>
> Long intro, short question: Jay O Nay?

The big question is whether the bzip2 better compression is actually
useful in a kernel context? Patches to do bzip2 for initrd, for
example, have been around for ages:

http://gtf.org/garzik/kernel/files/initrd-bzip2-2.2.13-2.patch.gz

But the compression and decompression overhead is _much_ larger
than gzip. It was so huge for maximal compression that dialing back
compression reaching a point of diminishing returns rather quickly,
when compared to gzip memory usage and compression.

I talked a bit with the bzip2 author a while ago about memory usage.
He eventually added the capability to only require small blocks
for decompression (64K IIRC?), but there was a significant loss in
compression factor.

So... even in 2003, I really don't know of many (any?) tasks which
would benefit from bzip2, considering the additional memory and
cpu overhead.

Jeff

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/