3 Concepts: Information | Projects

Project II: De Compression Competition

Deadline(s)

1) February 21, 12:00
2) March 7, 12:00
3) March 14, 12:00
Final) March 24, 12:00

Results

R O U N D    1
  Decompressor  curve   genes   income    letters    pretty  TOTAL
1. Team Lyytinen 11'955 110'315 107'186 4'853 132'773 101'497 468'579
2. Team Heilimo 19'520 99'872 109'980 4'852 155'445 80'470 470'139
3. Team Halmetoja 11'746 102'907 108'005 4'851 148'550 112'740 488'799
4. Team Ahosola 13'044 112'234 107'734 4'422 148'565 125'017 511'016
5. Team Konttori 1'385 129'812 127'969 7'167 148'669 125'362 540'364
6. Team Lampi 482 288'894 439'885 281'370 251'502 295'299 1'557'432
7. Team Etelävuori 5'741 - - - - - -

No clear winner so far. Teams Lyytinen and Heilimo both start strong but with different strengths. Team Lyytinen manages to snag the victory of round 1, but the game is only beginning and I am sure team Lyytinen must continue to work hard to maintain their lead. Interestingly teams' decompression programs are much larger than in the previous competition, but this is understandable because larger datafiles give more leeway. Team Etelävuori did not return a working program for this round, and team Lukk did not return anything.

There will be no new data for round 2. I will study the results and your solutions and perhaps add more data to compress for round 3.

R O U N D    2
  Decompressor  curve   genes   income    letters    pretty  TOTAL
1. Team Ahosola 4'833 88'901 107'745 4'421 116'489 78'237 400'626
2. Team Heilimo 23'706 91'551 107'752 4'853 137'951 67'740 433'553
3. Team Lyytinen 15'552 110'315 107'186 4'853 132'773 72'953 443'632
4. Team Konttori 4'706 91'809 108'007 4'850 115'217 125'255 449'844
5. Team Halmetoja 14'054 102'907 108'005 4'851 148'550 87'162 465'529
6. Team Etelävuori 7'372 101'074 109'972 6'106 150'809 94'958 470'291

Our epic battle is clearly heating up. Most teams perform better this round than the best result of the previous round. Will the trend continue? Teams Heilimo and Lyytinen continue close each other but team Ahosola surprises them from behind with impressive overall performance and becomes the undisputed champion of round 2.

I must confess that I am a bit surprised by the great differences in the sizes of the decompression programs. Luckily it is time to add more data and more data of course means that decompressor size becomes less important. In the tar below you will find new datafiles markov and herring. After this there will probably be no more data. Will these new datafiles prove to be decisive? Time will tell.

R O U N D    3
  Dec.  curve   genes   herring   income    letters     markov    pretty  TOTAL
1. Team Lyytinen 17'857 110'315 107'186 70'634 4'853 132'773 53'906 72'953 570'477
2. Team Ahosola 5'357 88'901 107'745 102'579 4'421 116'489 167'050 77'181 669'723
3. Team Konttori 6'969 88'563 107'446 138'004 4'079 110'160 162'408 85'227 702'856
4. Team Heilimo 19'679 91'551 107'752 148'564 4'853 105'616 165'703 64'985 708'703
5. Team Halmetoja 21'546 91'812 108'005 138'008 4'851 110'343 156'732 87'162 718'459
6. Team Etelävuori 8'330 101'074 109'972 - 6'106 150'809 - 94'958 -

Team Lyytinen returns to lead, this time with much better result than all the rest of the teams. Team Lyytinen's position is far from secure however, as other teams make good progress on other fronts and if they manage to duplicate Team Lyytinen's success with the new datafiles they may easily surpass Team Lyytinen, or at least Team Lyytinen's current results. Maybe other teams just plan to give each other false sense of security before their final lunge. Tune in next week to find out the winner!

No completely working solution from Team Etelävuori this round.

F I N A L    R E S U L T S
  Dec.  curve   genes   herring   income    letters     markov    pretty  TOTAL
1. Team Ahosola 18'978 88'744 107'745 38'119 3'937 96'560 50'230 74'287 478'600
2. Team Halmetoja 15'156 91'607 108'005 71'396 4'851 110'343 54'682 69'816 525'855
3. Team Heilimo 14'106 91'551 107'752 93'795 4'853 104'840 53'814 63'209 533'920
4. Team Konttori 5'796 88'563 107'446 94'534 4'079 110'160 53'416 85'227 549'221
5. Team Lyytinen 17'851 110'315 107'186 70'634 4'853 132'773 53'906 72'953 570'471
6. Team Etelävuori 10'780 101'074 109'972 140'424 6'106 134'694 64'643 94'959 662'652

And the winner is... Team Ahosola! Team Ahosola has the best compression rate in most cases so there is no question of which team is the best this time. The results of the other teams are not too shabby either, as again almost everybody does better than the best result of the previous round. A honorable mention goes to Team Heilimo for the nice tricks they used in compressing pretty.dat.

Congratulations to everybody and particularly to the winners!

The task

Your task is to design, document, and implement a compressor and a decompressor program. All imaginable lossless compression methods are allowed. The compression rate of the program will be evaluated with a given set of data files. However, scoring will not be based solely on compression rate. Instead the size of the decompressor program is added to the size of the compressed files. Using too complex a decompressor program (e.g., a la style ''print "data...";'') will not generally be a good idea.

As a special feature, in some cases both the compressor and the decompressor programs have access to side information related to the file to be compressed. Using the side information is not obligatory, but it makes compression significantly easier, thus reducing the size of the compressed file.

Deliverables

Your documentation, due March 24, should describe the ideas/observations/principles underlying the compressor and the decompressor algorithms. A complete project is a tar-file (all links should be relative and all files in the directory returned) including

- high-level description (in English) of the algorithms,
- program source for both compressor and decompressor programs.

The documentation can be delivered as a separate tar-file that can be extracted in the same directory as the programs.

In order to make the sizes of the programs comparable, programming language is Java for all groups (sorry about that). The programs must compile in Linux with the Makefile given in the project template. The usage of the programs must conform with the test script also found in the template. In particular, input is read from the standard input stream and a side information file specified on the command line (e.g., ''java mydecompress sideinfo.sdat <compressed.data >decompressed.data''). See the README file.

The source you return must be the real source of the program. This means the source you work with when modifying the program yourself. That real source must compile without extra steps to other people. You may modify Makefile if you wish.

The form of the report is a WEB-page (the HTML template is included in the project template).

If you run into problems, email questions to Tomi.

Evaluation

The performance of your algorithm will be evaluated with the given set of data files. The loss-function is given by:

Loss := LD + LE_1 + ... + LE_N,




where N is the number of data files, LD is the size of the decompressor program, and LE_i is the size of the compressed version of the ith file.

The competition is iterative, meaning that there are several deadlines. After each deadline your success is evaluated and scores are published. Project points will be based only on the final round scores though.

Frequently Asked Questions

1. Can the decompression program use other input than stdin and the side information file given on the command line?

No.

2. Can I use Java standard library java.___.___?

Yes.

3. Do I have use to Java standard library java.___.___? My class file gets too big.

No.

4. Making the class files small is a question of good/bad/ugly Java programming skills, not compression!

Unfortunately this is more or less the case. However, we expect the differences due to compression rate to be more significant than the differences due to programming style. If this turns out to be a problem, we will increase the sizes of the data files or think of something else. Besides, think of the bright side of things: We first thought of comparing the sizes of the source files! (See IOCCC.)

5. Can the compressor and decompressor programs create and access temporary files in /tmp or elsewhere?

Yes.

6. Do the programs have to be general purpose applications, or does it suffice to be able to handle the given files?

Just the given files, with the given script (runtest.sh).

 

 3 Concepts: Information
2002