
Data Sets

There are several data-sets included in the challenge. Some are real transcribed medieval manuscripts; others are artificial (but hand-copied). The data-sets are released in (at least) two phases. Correct answers to some of the first-phase data-sets will be announced before the final submission deadline so that participants can assess their own methods. These data-sets will not affect the final results.

Submissions are made in groups of one or more persons. Each group may submit at most two solutions; if a group submits more than two, the last two submitted before the deadline are accepted. A person can belong to at most two groups, so a person can contribute to at most four solutions.
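
The submission rules above amount to a simple filter plus a membership check. The following is a minimal sketch, not the organizers' actual tooling: it assumes each submission is recorded as a hypothetical `(group, timestamp, solution)` tuple and that "before the deadline" means strictly earlier than the deadline. The helper names `accepted_submissions` and `persons_over_limit` are illustrative only.

```python
from collections import Counter
from datetime import datetime

# Hypothetical record format: (group name, submission time, solution identifier).
Submission = tuple[str, datetime, str]


def accepted_submissions(
    submissions: list[Submission], deadline: datetime, limit: int = 2
) -> dict[str, list[str]]:
    """For each group, keep the last `limit` solutions submitted strictly before the deadline."""
    by_group: dict[str, list[str]] = {}
    # Process submissions in chronological order so the latest ones end up last in each list.
    for group, submitted_at, solution in sorted(submissions, key=lambda s: s[1]):
        if submitted_at < deadline:  # assumption: a submission at the deadline is too late
            by_group.setdefault(group, []).append(solution)
    return {group: sols[-limit:] for group, sols in by_group.items()}


def persons_over_limit(groups: dict[str, list[str]], max_groups: int = 2) -> list[str]:
    """Return the persons who belong to more than `max_groups` groups."""
    # set() so a person listed twice in the same group is counted once for that group.
    counts = Counter(p for members in groups.values() for p in set(members))
    return sorted(p for p, n in counts.items() if n > max_groups)
```

For a group that submitted three solutions before the deadline and one after it, `accepted_submissions` keeps the second and third, and the late one is ignored.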

For the artificial data-sets, for which a known 'ground-truth' solution exists, the difference between the proposed and the correct solution is evaluated. The exact criterion is open for discussion among participants (and other interested parties). The current scoring function is based on distance orders among triplets (for details, see Example).
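
The exact scoring function is defined on the Example page; the sketch below only illustrates the general idea of comparing distance orders among triplets, under assumptions that may differ from the official criterion. It assumes that each stemma is given as a list of undirected, unweighted edges (possibly including hypothetical lost intermediate manuscripts), that the distance between two manuscripts is the number of edges on the path between them, and that for every triplet of extant manuscripts and every choice of one of them as the anchor, we check whether both stemmata agree on which of the other two is closer to the anchor (or that both are equally close). The score is the fraction of such comparisons on which the two stemmata agree. The names `path_lengths` and `triplet_score` are hypothetical.

```python
from collections import deque
from itertools import combinations


def path_lengths(edges: list[tuple[str, str]]) -> dict[str, dict[str, int]]:
    """All-pairs path lengths (in edges) for an undirected, unweighted tree, via BFS from each node."""
    adj: dict[str, set[str]] = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    dist: dict[str, dict[str, int]] = {}
    for src in adj:
        seen = {src: 0}
        queue = deque([src])
        while queue:
            node = queue.popleft()
            for nxt in adj[node]:
                if nxt not in seen:
                    seen[nxt] = seen[node] + 1
                    queue.append(nxt)
        dist[src] = seen
    return dist


def sign(x: int) -> int:
    return (x > 0) - (x < 0)


def triplet_score(true_edges, proposed_edges, manuscripts: list[str]) -> float:
    """Fraction of anchored triplet comparisons on which both stemmata order distances alike.

    Every manuscript in `manuscripts` must appear in both edge lists.
    """
    d_true = path_lengths(true_edges)
    d_prop = path_lengths(proposed_edges)
    agree = total = 0
    for triplet in combinations(manuscripts, 3):
        for anchor in triplet:
            x, y = (m for m in triplet if m != anchor)
            # -1: x is closer to the anchor, 0: equally close, 1: y is closer.
            s_true = sign(d_true[anchor][x] - d_true[anchor][y])
            s_prop = sign(d_prop[anchor][x] - d_prop[anchor][y])
            agree += s_true == s_prop
            total += 1
    return agree / total if total else 1.0
```

For example, with the true stemma given as the chain A-B-C-D and the proposed stemma as the chain A-C-B-D, 4 of the 12 comparisons agree, giving a score of 1/3; a stemma compared with itself scores 1.0. The cost grows cubically with the number of manuscripts, which is acceptable for data-sets of modest size.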

For the real data-sets, no correct solution is known. Therefore, EBS¹ will be used: the 'owner' of the data-set will be asked to judge whether the proposed solution is plausible and useful from his or her point of view.

¹ EBS: endorphin-based scoring

Ranking Schemes

There are two ranking schemes.

  • Primary ranking is based on performance on the primary data-set (see Data Sets).
  • Secondary ranking is based on all the other data-sets, both artificial and real, except those whose correct solutions are announced during the challenge. For each of these data-sets, 'thumbs-up' marks will be awarded to the best submissions, and the total number of thumbs-up marks determines the secondary ranking.
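
The secondary ranking reduces to counting marks. A minimal sketch, assuming the awarded marks are available as a hypothetical mapping from data-set name to the submissions that earned a thumbs-up on it; the rules do not say how ties are resolved, so the alphabetical tie-break below is an arbitrary assumption made only for deterministic output. The name `secondary_ranking` is illustrative.

```python
from collections import Counter


def secondary_ranking(
    thumbs_up: dict[str, list[str]], excluded: set[str]
) -> list[tuple[str, int]]:
    """Rank submissions by their total number of thumbs-up marks.

    `thumbs_up` maps each data-set name to the submissions awarded a thumbs-up on it.
    Data-sets in `excluded` (the primary data-set and those whose solutions were
    announced) do not count.
    """
    tally: Counter[str] = Counter()
    for dataset, winners in thumbs_up.items():
        if dataset in excluded:
            continue
        tally.update(set(winners))  # at most one mark per submission per data-set
    # Most thumbs-up first; ties broken alphabetically (an assumption, not a rule).
    return sorted(tally.items(), key=lambda kv: (-kv[1], kv[0]))
```

Submissions that received no thumbs-up at all simply do not appear in the returned list.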

The organizers reserve the right to alter the rules of the challenge at any point.

Anyone who has been in contact with any of the data-sets, or has otherwise obtained inside information about the correct solution, should obviously not enter the challenge for the data-set(s) in question. However, participation is allowed for the other data-sets.

The data-sets are provided for use in the challenge only. Any further use of the data requires explicit permission from the organizers and the original providers of the data.