Biological Sequence Analysis

582483
5
Bioinformatics
Advanced studies
The course covers selected high-throughput methods for the analysis of biological sequences, including advanced alignment methods, Hidden Markov Models, and next-generation sequencing data analysis methods. Prerequisites: Basics of bioinformatics and algorithms.

Exam

28.02.2013 09.00 B123
Year Semester Date Period Language In charge
2013 spring 14.01-21.02. 3-3 English Veli Mäkinen

Lectures

Time Room Lecturer Date
Mon 12-14 C222 Veli Mäkinen 14.01.2013-21.02.2013
Thu 10-12 C222 Veli Mäkinen 14.01.2013-21.02.2013

Exercise groups

Group: 1
Time Room Instructor Date Observe
Mon 10-12 C222 Veli Mäkinen 14.01.2013-22.02.2013

General

Content

  • Mon 14.1 10-12: Background check: we'll solve some simple exercises online to check prerequisites.
  • Mon 14.1 12-14: Lecture 1, Alignments revisited + shortest detour, sections 3.1, 3.3, 3.4.1, and 3.4.2.
  • Thu 17.1 10-12: Lecture 2, Invariant techniques, sparse dynamic programming, sections 3.2, 3.4.4, and 3.4.5.
  • Mon 21.1 10-12: Exercise 1: [PDF], [solutions]
  • Mon 21.1 12-14: Lecture 3, Markov chains, score schemes, log-odds, BLOSUM, GC-content, GC-skew, CAI [slides.pptx] [slides.pdf]
  • Thu 24.1 10-12: Lecture 4, Hidden Markov Models, forward, backward, sections 4-4.7
  • Mon 28.1 10-12: Exercise 2 [PDF]
  • Mon 28.1 12-14: Lecture 5, Baum-Welch, More complex HMMs, pair HMMs, sections 4.8-4.10.2
  • Thu 31.1 10-12: Lecture 6, Sampling, profile HMMs, sections 4.10.3-4.11.2
  • Mon 4.2 10-12: Exercise 3 [PDF]
  • Mon 4.2 12-14: Lecture 7, Advanced pseudocounts, Jumping alignments, Multiple alignment, Carrillo-Lipman MSA, sections 4.11.2, 4.12, 3.6
  • Thu 7.2 10-12: Lecture 8, High-throughput sequencing (HTS) overview, variant calling, Burrows-Wheeler transform, FM-index, search space pruning [slides.pptx] [slides.pdf] + sections 7.1 and 7.3.1
  • Mon 11.2 10-12: Exercise 4 [PDF]
  • Mon 11.2 12-14: Lecture 9, RNA-sequencing, splice variants, transcriptomics, co-linear chaining, sections 7.1.4, 3.5, 7.3.3, 7.3.4
  • Thu 14.2 10-12: Lecture 10, ChIP-seq, PSSMs, motif discovery, projection method [slides.pptx] [slides.pdf] + section 7.3.2
  • Mon 18.2 10-12: Exercise 5 [PDF] [solutions]
  • Mon 18.2 12-14: Lecture 11, Whole genome alignment, MEMs and MUMs on suffix tree and on FM-index variants [slides.pptx] [slides.pdf] + sections 7.2-7.2.2 (just definitions, not the two-way BWT algorithms), 7.3.5 + suffix tree algorithms on blackboard for maximal repeats / exact matches / unique matches (see Gusfield's book) + dot plots, k-mer counting, FASTA, BLAST ideas.
  • Thu 21.2 10-12: Lecture 12, Algorithms for fragment assembly, section 7.2.3
  • Thu 28.2 9:00-, B123 EXAM; exam is now graded, see the 2nd floor announcement board.
  • Feedback session on Thursday 14.3 at 13.00 in room A239b. See my own thoughts about the course.
  • Check renewal / separate exam dates here: http://www.cs.helsinki.fi/exams

Completing the course

The course consists of lectures and exercises. The exam gives a maximum of 40 points and the weekly exercises a maximum of 20 points. Grading is based on the total points gathered. However, to pass the course, one must pass the exam with at least 20 points.
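
The point rules above can be sketched as a small calculation. This is only an illustration: the function name `course_points` and the constant names are hypothetical, and the grade boundaries are not stated here, so the sketch reports only the total and whether the exam minimum is met.

```python
# Point limits as stated in the text above.
EXAM_MAX = 40        # maximum exam points
EXERCISE_MAX = 20    # maximum weekly exercise points
EXAM_PASS_MIN = 20   # minimum exam points required to pass the course


def course_points(exam_points, exercise_points):
    """Return (total points, whether the exam minimum is met).

    Hypothetical helper: the actual grade boundaries are not given,
    so no grade is computed.
    """
    if not 0 <= exam_points <= EXAM_MAX:
        raise ValueError("exam points must be between 0 and 40")
    if not 0 <= exercise_points <= EXERCISE_MAX:
        raise ValueError("exercise points must be between 0 and 20")
    total = exam_points + exercise_points
    exam_minimum_met = exam_points >= EXAM_PASS_MIN
    return total, exam_minimum_met


if __name__ == "__main__":
    print(course_points(25, 15))
    print(course_points(19, 20))
```

Note that a high total does not compensate for an exam score below 20: in the second call the exercises are at full points, but the exam minimum is not met.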

Literature and material

This course script covers most of the material, and there will be a couple of PowerPoint slide sets in addition. Most lectures will be given on the blackboard.

  • Script updated 21.1 (floor to ceil in shortest detour, a -1 added and initialization at 0,0 corrected in the affine gap cost computation formulae)
  • Script updated 24.1 (definition of P(c) added to generation probability for HMMs, condition "|" converted into joint prob. "," in matching probability as that is what is actually computed)
  • Script updated 1.2 (in pair HMM sampling, one v (Viterbi) replaced by f (forward), to count the prob. of a transition correctly)
  • Script updated 4.2 (multiple alignment dynamic programming formula corrected)
  • Script updated 14.2 (ChIP-seq peak detection HMM emission probability corrected to binomial distribution... now it sums to 1)
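
The last correction above notes that a binomial emission distribution sums to 1. A minimal sketch of that sanity check follows; it does not reproduce the script's actual peak detection model. The names `binomial_pmf` and `emission_total`, and the reading of `n` as a number of trials and `p` as a per-state success probability, are assumptions made for illustration.

```python
from math import comb


def binomial_pmf(k, n, p):
    """Probability of k successes in n independent trials with success prob. p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def emission_total(n, p):
    """Sum the binomial probabilities over all possible outcomes k = 0..n.

    For a valid emission distribution this should equal 1
    (up to floating-point rounding).
    """
    return sum(binomial_pmf(k, n, p) for k in range(n + 1))


if __name__ == "__main__":
    # Hypothetical parameters, chosen only to exercise the check.
    print(abs(emission_total(20, 0.3) - 1.0) < 1e-9)
```

The same check can be applied to any candidate emission formula: if the sum over all outcomes differs from 1, the formula is not a probability distribution.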