1) a) and b) are clear from the lectures. For c), the common part is the seeds (k-mers versus approximate k-mers). For d), again see the lectures (it helps to bridge repeats, or it helps in scaffolding). For e), either (heuristic) progressive alignment or the speed-up that fills only part of the matrix (see lectures).
Grading: each part gave at most 3 points, depending on the accuracy of the answer.

2) This was quite clear, except for the running time: the maximum of O(t^2 * L^2), where t is the maximum number of transcripts for a gene and L is the maximum length of a transcript, and the time for searching transcript names in the large cDNA file. The latter was not required in the answer.
Grading: 12 points for a correct description, 3 points for the correct running time.

3) a) Either the DP taking the exon/intron structure into account, or exon chaining (see lectures): 7.5 points. Some points were given for other approaches, such as learning statistics of exons from already annotated genomes. Some points were also given for mentioning approximate string matching (although this alone is not sufficient here because of introns).
b) Correct sketch of the HMM visualization: 4.5 points. Correct explanation of what Viterbi is doing: 3 points.

4) a) Correct definitions: 4.5 + 4.5 points. Each correct connection with an explanation(!): 2 points (6 points max, although one can see more than 3 connections).
b) Let A[1,m] be the normal sequence and B[1,n] the SOLiD colour-code sequence. Let M[a,k]=b denote the coding from nucleotide a and colour k to nucleotide b. Let us fill a matrix: S[i,j]= max_{i'
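The coding M[a,k]=b from 4 b) can be made concrete with a short sketch. It assumes the standard SOLiD two-base encoding, where A, C, G, T are mapped to 0, 1, 2, 3 and the colour of an adjacent nucleotide pair is the XOR of their codes, so that M[a,k] is simply a XOR k. The function names are hypothetical, and this is not the matrix S itself, only the coding M defined before it.

```python
# Hypothetical helpers; assumes the standard SOLiD 2-base encoding
# A=0, C=1, G=2, T=3, where the colour of pair (x, y) is code(x) XOR code(y).
BASES = "ACGT"
CODE = {b: i for i, b in enumerate(BASES)}


def colour_of(a: str, b: str) -> int:
    """Colour emitted for the adjacent nucleotide pair (a, b)."""
    return CODE[a] ^ CODE[b]


def next_base(a: str, k: int) -> str:
    """M[a, k]: the nucleotide that follows a when colour k is read."""
    return BASES[CODE[a] ^ k]


def encode(seq: str) -> list[int]:
    """Nucleotide string -> colour sequence (one colour per adjacent pair)."""
    return [colour_of(x, y) for x, y in zip(seq, seq[1:])]


def decode(first: str, colours: list[int]) -> str:
    """Colour sequence -> nucleotides, given the known first base."""
    out = [first]
    for k in colours:
        out.append(next_base(out[-1], k))
    return "".join(out)


print(encode("ATGGCA"))               # [3, 1, 0, 3, 1]
print(decode("A", [3, 1, 0, 3, 1]))   # ATGGCA
print(decode("A", [3, 2, 0, 3, 1]))   # ATCCGT (one wrong colour)
```

The last line shows why naive translation is risky: a single wrong colour changes every base decoded after it ("ATCCGT" instead of "ATGGCA"). This is one common reason to compare SOLiD reads against the normal sequence in colour space rather than translating them first.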