Re: Painlessly shrinking kernel messages (Re: kernel support for

Timothy Miller (miller@techsource.com)
Thu, 10 Apr 2003 19:58:46 -0400


Alan Cox wrote:

>Not a totally crazy idea. You could also do 5pack and some of the other
>string tricks people have used in time. You also don't need to do word
>boundaries.
>
My Google search for '5pack' turned up nothing relevant.
Things that come to mind include converting to a character set which
requires fewer than 8 bits per character and then packing them into
bytes. Or perhaps making a list of every quintuplet of characters that
ever occurs and assigning each one a code.
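To make the first guess concrete, here is a minimal sketch of packing text into 5 bits per character. This is only one reading of what '5pack' might mean, not a known definition. The 32-symbol alphabet, the names pack5 and unpack5, and the choice to store the length separately are all assumptions made for illustration.

```python
# Hypothetical 32-symbol alphabet: space, a-z, and five extra symbols.
# Digits and upper case are deliberately left out to fit in 5 bits.
ALPHABET = " abcdefghijklmnopqrstuvwxyz.:%-\n"
assert len(ALPHABET) == 32
INDEX = {ch: i for i, ch in enumerate(ALPHABET)}


def pack5(text):
    """Pack each character into 5 bits, most significant bit first."""
    acc = 0      # bit accumulator
    nbits = 0    # number of valid bits currently in acc
    out = bytearray()
    for ch in text:
        acc = (acc << 5) | INDEX[ch]
        nbits += 5
        while nbits >= 8:
            nbits -= 8
            out.append((acc >> nbits) & 0xFF)
        acc &= (1 << nbits) - 1   # drop bits already written
    if nbits:
        # Pad the final partial byte with zero bits.
        out.append((acc << (8 - nbits)) & 0xFF)
    return bytes(out)


def unpack5(data, length):
    """Recover `length` characters from bytes produced by pack5."""
    acc = 0
    nbits = 0
    chars = []
    for byte in data:
        acc = (acc << 8) | byte
        nbits += 8
        while nbits >= 5 and len(chars) < length:
            nbits -= 5
            chars.append(ALPHABET[(acc >> nbits) & 0x1F])
        acc &= (1 << nbits) - 1
    return "".join(chars)
```

With this scheme a 15-character message takes 10 bytes instead of 15, but any message containing a character outside the alphabet cannot be packed at all, which is the obvious cost of the trick.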

I initially considered the idea of ignoring word boundaries. I rejected
it because part of the "painless" factor would be that it could be done
manually without a lot of thinking. But I will run a test which ignores
word boundaries and see what kinds of results I get. Of course, if we
want to do something that involves some post-compile magic or whatnot,
then we can do all sorts of gnarly tricks. But that doesn't differ (in
complexity) much from the idea someone else mentioned which was to
completely remove all messages from the kernel by magically converting
them to numbers or hashes and then decoding them outside of the kernel.
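For comparison, the "numbers or hashes" idea could look roughly like the sketch below. It assumes CRC32 as the hash and a plain dictionary as the out-of-kernel catalog; neither choice comes from the thread, and message_id, build_catalog and decode are made-up names.

```python
import zlib


def message_id(fmt):
    """Hypothetical message ID: the CRC32 of the format string."""
    return zlib.crc32(fmt.encode("ascii"))


def build_catalog(formats):
    """Map each ID back to its format string; refuse hash collisions."""
    catalog = {}
    for fmt in formats:
        mid = message_id(fmt)
        if mid in catalog and catalog[mid] != fmt:
            raise ValueError("hash collision for %r" % fmt)
        catalog[mid] = fmt
    return catalog


def decode(catalog, mid, args):
    """Userspace side: look up the format and fill in the arguments."""
    return catalog[mid] % args
```

In this picture the kernel would emit only the ID and the arguments, and a tool outside the kernel would call something like decode() with a catalog generated at build time.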

Someone raised the valid point that boot messages need to be handled
properly by the kernel before any services are up. Separating the boot
messages from the non-boot messages would require manual intervention
that goes against the painless factor. And is the pie slice containing
only non-boot messages large enough to be worth it? There seem to
be quite a lot of boot messages that could benefit from some sort of
completely-in-kernel compression.

>
>For embedded at least this is far from ludicrous as a concept. The
>tricky piece for all of these is working out how to grab each printk
>format string and do things to it. That lets you do compression,
>removal, internationalisation, cataloguing ..
>
>

Hmmm...
- Make gcc produce assembler output
- Find all calls to printk
- Cross-reference those against all static strings
- Compress the strings
- Run through gas, etc.

The problem with this approach is that we have to deal with different
architectures. The plus is that any unsupported arch just doesn't run
the compression tool and uses regular printk.

How about:
- Use perl or yacc or something to parse the kernel source for strings
- Compress them
- Make the substitutions inline in the source as part of the
  pre-processing stage
- Compile

Heck, we could just embed this functionality directly into the
preprocessor. Unfortunately, this one is somewhat beyond my current
knowledge of the tools that would make it convenient.
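A rough sketch of the "parse the source for strings" step, using Python in place of perl or yacc. The regular expression is an assumption and is nowhere near a real C parser: it only catches a single string literal directly inside printk(...), optionally preceded by a KERN_* level, and it misses concatenated literals, macros that wrap printk, and calls split in unusual ways. find_printk_formats is a made-up name.

```python
import re

# Matches printk( [KERN_LEVEL] "literal" ... and captures the literal body.
# Escaped characters such as \" and \n inside the literal are kept as-is.
PRINTK_RE = re.compile(
    r'printk\s*\(\s*(?:KERN_[A-Z]+\s*)?"((?:[^"\\]|\\.)*)"'
)


def find_printk_formats(source):
    """Return the format strings of simple printk calls in C source text."""
    return [m.group(1) for m in PRINTK_RE.finditer(source)]
```

The returned strings are the raw source text of each literal, so a C escape like \n is still two characters at this point. A compression step would have to interpret the escapes before counting bytes.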

Just as a note, I worked on my test program to make it more accurate.
For 128 codes, the actual reduction is 38946 bytes. For this
algorithm, I look to see if any of the shorter words are contained in
any of the larger ones; in the case where the shorter word's
substitution would shrink the kernel more than the larger, I add the
larger word's count to the smaller and delete the larger.
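The selection step described above could be sketched as follows. This is not the actual test program. The cost model in saving() (a 1-byte code per occurrence, plus one table entry with a terminator) and the names saving and pick_codes are assumptions, so the byte counts it produces will not match the figures quoted here.

```python
import re
from collections import Counter


def saving(word, count):
    """Assumed cost model: every occurrence shrinks from len(word) bytes to
    a 1-byte code, and the word is stored once in a table with a terminator."""
    return count * (len(word) - 1) - (len(word) + 1)


def pick_codes(messages, ncodes=128):
    """Choose up to ncodes words whose substitution saves the most bytes."""
    counts = Counter(
        w for msg in messages for w in re.findall(r"[A-Za-z]+", msg)
    )
    # Merge step: if a shorter word is contained in a longer one and the
    # shorter word saves more, fold the longer word's count into it.
    for long_word in sorted(counts, key=len, reverse=True):
        for short in list(counts):
            if short != long_word and short in long_word:
                if saving(short, counts[short]) > saving(long_word, counts[long_word]):
                    counts[short] += counts[long_word]
                    del counts[long_word]
                    break
    ranked = sorted(counts, key=lambda w: saving(w, counts[w]), reverse=True)
    # Keep only words that actually shrink the total.
    return [w for w in ranked[:ncodes] if saving(w, counts[w]) > 0]
```

Note that folding a longer word into a shorter one implies the shorter word is later substituted inside the longer one, so the merge step already bends the word-boundary rule a little.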

If we were to outlaw some of the lower characters, such as most
non-printing characters and all lower-case, then that brings us up to
having 184 codes to work with. That lets us save 42692 bytes. If we
were to go to two-character codes, where the first one is 128-255 and
the second is 1-255, that brings the number of codes up to 32640. It
turns out that, with my current algorithm, it doesn't buy anything, and
it also violates the painless factor by giving people a huge list of
words they have to pick from when writing kernel messages. Also, it
turns out that there are only just over 500 different words which would
save more than 2 bytes by being encoded.
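The code counts above can be checked with a few lines of arithmetic. The helper names are made up, and the per-use saving formula is an assumption about how the savings are counted.

```python
def code_space(first_bytes, second_bytes=None):
    """Number of distinct codes for a one-byte or two-byte scheme."""
    if second_bytes is None:
        return len(first_bytes)
    return len(first_bytes) * len(second_bytes)


def saving_per_use(word_len, code_len):
    """Bytes saved each time a word is replaced by a code (assumed model)."""
    return word_len - code_len


one_byte = code_space(range(128, 256))                  # 128 codes
two_byte = code_space(range(128, 256), range(1, 256))   # 32640 codes
freed = 184 - one_byte                                  # 56 reclaimed low values
```

Under this model a two-byte code saves one byte less per occurrence than a one-byte code, and a five-letter word drops from saving 4 bytes per use to 3. That may be part of why the larger code space buys nothing here, though that is a guess rather than something measured.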

I need to get a LOT more clever about this before it's worth doing.
I'll try the no-word-boundaries approach. And we'll see how interested
other people are in having to DEAL with it.

BTW, should I faint or something because THE Alan Cox responded to my
first post to lkml? :)
You hate it when people say that sort of thing, don't you. :)
