Re: Swap Compression

Jörn Engel (joern@wohnheim.fh-wedel.de)
Sun, 27 Apr 2003 19:51:47 +0200


On Sun, 27 April 2003 13:24:37 -0400, rmoser wrote:
> >int fox_compress(unsigned char *input, unsigned char *output,
> > uint32_t inlen, uint32_t *outlen);
> >
> >int fox_decompress(unsigned char *input, unsigned char *output,
> > uint32_t inlen, uint32_t *outlen);
>
> Ey? uint32_t*? I assume that's a mistake....

Nope. outlen is changed by the function, so you need a pointer here.
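
To illustrate why the pointer is needed, here is a minimal sketch with the
same shape as the prototypes above. demo_compress is a hypothetical
stand-in, not the fox code, and the in/out convention (capacity on entry,
bytes written on return) is an assumption of mine. The "compression" is a
trivial run-length encoding.

```c
#include <stdint.h>

/*
 * Hypothetical stand-in with the same shape as the prototypes above.
 * Assumption: on entry *outlen holds the capacity of output; on success
 * it is overwritten with the number of bytes actually produced.
 * The encoding is a plain sequence of (run length, byte) pairs.
 */
int demo_compress(const unsigned char *input, unsigned char *output,
		  uint32_t inlen, uint32_t *outlen)
{
	uint32_t in = 0, out = 0;

	while (in < inlen) {
		unsigned char byte = input[in];
		uint32_t run = 1;

		/* Extend the run while the byte repeats (max 255 per pair). */
		while (in + run < inlen && input[in + run] == byte && run < 255)
			run++;
		if (out + 2 > *outlen)
			return -1;	/* output buffer too small */
		output[out++] = (unsigned char)run;
		output[out++] = byte;
		in += run;
	}
	*outlen = out;	/* this write is why outlen has to be a pointer */
	return 0;
}
```

The caller sets outlen to the size of its buffer, passes &outlen, and
reads the compressed length back from the same variable afterwards.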

> Anyhow, this wasn't what
> I was asking. What I was asking was about how to determine how much
> data to send to compress it. Read the message again, the whole thing
> this time.

I did. But modularity is the key here. The whole idea may be great or
plain bullshit, depending on the benchmarks. Which one it is depends
on the compression algorithm used, among other things. Maybe your
compression algo is better for some machines, zlib for others, etc.

Why should someone decide on an algorithm before measuring?
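
One way to keep the algorithm swappable is a small table of function
pointers, so the caller can try each candidate on sample data before
committing to one. The sketch below is only an illustration: struct
compressor, pick_smallest and both toy algorithms are hypothetical names,
and it compares output size only. A real benchmark would also have to
measure speed.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Same calling shape as the prototypes discussed earlier in the thread. */
typedef int (*compress_fn)(const unsigned char *input, unsigned char *output,
			   uint32_t inlen, uint32_t *outlen);

struct compressor {		/* hypothetical plug-in descriptor */
	const char *name;
	compress_fn compress;
};

/* Toy algorithm 1: no compression, just copy the input. */
static int store_compress(const unsigned char *input, unsigned char *output,
			  uint32_t inlen, uint32_t *outlen)
{
	if (inlen > *outlen)
		return -1;
	memcpy(output, input, inlen);
	*outlen = inlen;
	return 0;
}

/* Toy algorithm 2: run-length encoding as (run length, byte) pairs. */
static int rle_compress(const unsigned char *input, unsigned char *output,
			uint32_t inlen, uint32_t *outlen)
{
	uint32_t in = 0, out = 0;

	while (in < inlen) {
		unsigned char byte = input[in];
		uint32_t run = 1;

		while (in + run < inlen && input[in + run] == byte && run < 255)
			run++;
		if (out + 2 > *outlen)
			return -1;
		output[out++] = (unsigned char)run;
		output[out++] = byte;
		in += run;
	}
	*outlen = out;
	return 0;
}

static const struct compressor compressors[] = {
	{ "store", store_compress },
	{ "rle",   rle_compress },
};

/* Run every registered algorithm on the sample; return the one with the
 * smallest output, or NULL if none of them fits into scratch. */
const struct compressor *pick_smallest(const unsigned char *sample,
				       uint32_t len, unsigned char *scratch,
				       uint32_t cap)
{
	const struct compressor *best = NULL;
	uint32_t best_len = 0;
	size_t i;

	for (i = 0; i < sizeof(compressors) / sizeof(compressors[0]); i++) {
		uint32_t outlen = cap;

		if (compressors[i].compress(sample, scratch, len, &outlen) != 0)
			continue;
		if (!best || outlen < best_len) {
			best = &compressors[i];
			best_len = outlen;
		}
	}
	return best;
}
```

Adding zlib or any other algorithm would then mean one more table entry,
and the calling code stays the same.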

> >Then the mm code can pick any useful size for compression.
>
> Eh? I rather the code alloc space itself and do all its own handling. That
> way you don't have to second-guess the buffer size for decompressed
> data if you don't do all-at-once decompression (i.e. decompressing x86
> segments, all-at-once would be to decompress the whole compressed
> block of N size to 64k, where partial would be to pull in N/n at a time and
> decompress in n sets of N/n, which would give varying sized output).

Segments are old, stupid and x86 only. What you want is a number of
pages, maybe just one at a time. Always compress chunks of the same
size and you don't have to guess the decompressed size.
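
A rough sketch of the fixed-chunk idea, assuming one 4k page per chunk.
CHUNK_SIZE, chunk_compress and chunk_decompress are hypothetical names,
and the toy run-length encoding merely stands in for a real algorithm.
The point is that the decompressor needs no length argument for its
output: it always produces exactly CHUNK_SIZE bytes or fails.

```c
#include <stdint.h>

#define CHUNK_SIZE 4096u	/* assumption: one 4k page per chunk */

/* Compress exactly one chunk. On entry *outlen is the capacity of output,
 * on success it holds the number of bytes written. */
int chunk_compress(const unsigned char *page, unsigned char *output,
		   uint32_t *outlen)
{
	uint32_t in = 0, out = 0;

	while (in < CHUNK_SIZE) {
		unsigned char byte = page[in];
		uint32_t run = 1;

		while (in + run < CHUNK_SIZE && page[in + run] == byte &&
		       run < 255)
			run++;
		if (out + 2 > *outlen)
			return -1;	/* output buffer too small */
		output[out++] = (unsigned char)run;
		output[out++] = byte;
		in += run;
	}
	*outlen = out;
	return 0;
}

/* Decompress one chunk. The output size is known in advance, so the
 * caller always hands in a CHUNK_SIZE buffer and nothing is guessed. */
int chunk_decompress(const unsigned char *input, uint32_t inlen,
		     unsigned char *page)
{
	uint32_t in = 0, out = 0;

	while (in + 1 < inlen) {
		uint32_t run = input[in];
		unsigned char byte = input[in + 1];

		/* Reject empty runs and runs that would overflow the page. */
		if (run == 0 || run > CHUNK_SIZE - out)
			return -1;
		while (run--)
			page[out++] = byte;
		in += 2;
	}
	/* Valid only if all input was used and the page is exactly full. */
	return (in == inlen && out == CHUNK_SIZE) ? 0 : -1;
}
```

With this toy encoding the worst case output is 2 * CHUNK_SIZE bytes, so
a caller that wants compression to never fail would size its output
buffer accordingly.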

> Another thing is that the code isn't made to strictly stick to compressing
> or decompressing a whole input all at once; you may break down the
> input into smaller pieces. Therefore it does need to think about how much
> it's gonna actually tell you to pull off when you inquire about the size to
> remove from the stream (for compression at least), because you might
> break it if you pull too much data off midway through compression. The
> advantage of this method is in when you're A) Compressing files, and
> B) trying to do this kind of thing on EXTREMELY low RAM systems,
> where you can't afford to pass whole buffers back and forth. (Think 4 meg)

a) Even with 4M, two pages of 4k each don't hurt that much. If they
do, the whole compression trick doesn't pay off at all.
b) Compression ratios usually suck with small buffers.

> You do actually understand the code, right? I have this weird habit of
> making code and doing such obfuscating comments that people go
> "WTF is this?" when they see it. Then again, you're probably about
> 12 classes past me in C programming, so maybe it's just my logic that's
> flawed. :)

I didn't look that far yet. What you could do is read through
/usr/src/linux/Documentation/CodingStyle. Code that follows it is just
so much nicer (and faster) to read, and sticking to it usually produces
better code.

Jörn

-- 
Beware of bugs in the above code; I have only proved it correct, but
not tried it.
-- Donald Knuth