Seminar on High-throughput Sequencing Data Analysis

Advanced studies
Year Semester Date Period Language Person in charge
2016 Spring 18.01-02.05. 3-4 English Alexandru Tomescu


Time Room Lecturer Date
Mon 14-16 B119 Alexandru Tomescu 18.01.2016-29.02.2016
Mon 14-16 C220 Alexandru Tomescu 14.03.2016-02.05.2016



  • 18 Jan 2016: The list of proposed topics is available below. You are also free to choose your own topic
  • 19 Jan 2016: Instructions for the final report have been added (See Structure of the final report below)
  • 19 Jan 2016: Topics have been assigned


High-throughput sequencing (HTS), developed over the last ten years, can cheaply produce large quantities of short fragments of DNA or RNA. It has revolutionized some key analyses in computational biology, and it has stimulated the development of new data structures and algorithms for dealing with this data. HTS is currently migrating from research applications to hospitals, with the hope of providing better diagnostics (e.g., for cancer) and treatment.

In this seminar you will learn about recent research on analysing HTS data. Each student will choose a pair of papers: one describing an algorithmic development related to HTS data, and one describing a real biological application where this development is useful.

However, the main goal of this seminar is to develop your written and oral scientific communication skills, and the seminar organization will reflect that (see Learning Objectives). You must hold three presentations on your topic (5 minutes, 15 minutes, and 30 minutes), and hand in three versions of your final report. The length of the final report depends on the topic, but you can aim for 10-12 pages if possible. You will be asked to comment on one draft of your colleagues' reports (that is, to write a scientific review) via an online conference management system.

The seminar spans weeks 3-8 and 11-18. Below is a draft of the seminar structure (this may vary depending on the number of participants).

Week  Activity
  • Lecturer explains the organization and briefly introduces the paper topics.
  • Students vote for their topic preferences via an online system, and will be assigned a pair of papers.
  • Each student hands in a 1-page summary of the assigned two papers.
  • Each student holds a 5-minute (3 slides) oral presentation about the topic. 
10 Semester break
  • Students submit a draft of the report. The other students will be requested to anonymously review the draft and give comments for improving it.
  • Students hold a 10-minute (6 slides) oral presentation about the topic. The other students (and lecturer) give written improvement comments after each presentation.
13 Easter holidays (24.3.2016-30.3.2016)


  • Students submit the final version of the report, taking into account the reviews (including the lecturer's comments). You also have to submit an additional document explaining how you addressed the comments, or a rebuttal of each comment that you didn't address.
  • Students hold a final 25-minute (10-15 slides) oral presentation. The other students (and lecturer) give written improvement comments after each presentation.


Structure of the final report

Your final report should include the following information extracted from the two papers that you are assigned. Do not copy/paste, but explain things in your own words (some copying is accepted for technical definitions, but be sure to reference it appropriately). About 3/4 of the report can be devoted to the algorithmic paper, and 1/4 to the biological paper.

From the algorithmic paper:

  • Practical biological motivation
  • Algorithmic state-of-the-art methods
  • Biological input and output
  • Algorithmic problem formulation (input, output)
  • Algorithmic solution for the problem formulation
  • How the approach was tested experimentally: on simulated data (and how was it simulated)? on real data? what performance metric was used?
  • Strong points of the paper and its presentation, according to you (game-changing approach? potential for other applications?)
  • Weak points of the paper and its presentation, according to you (missing details? unconvincing experiments? inefficient algorithms?)
  • Ways in which the method can be improved, according to you

From the biological paper:

  • What is the biological problem?
  • At which points are computational tools used, and how?
  • How does the algorithmic paper relate to the biological one (if it does not, what are the differences)?
  • Strong or weak points

Guidelines for writing the review

Follow the instructions from this webpage, and address Sections 1-4 of "What goes into a review". Also keep in mind the above points from "Structure of the final report". Be fair and constructive! The review can be approx. 1-2 pages long (depending on your page margins).

Completing the course

It is compulsory to submit all three versions of your report:

  • 1-page summary (week 5)
  • report draft (week 11)
  • final report (week 16)

It is compulsory to present all three versions orally.


The grade is made up of:

  • 40% = final report (the quality of the final report, and how well you addressed, or rebutted, the reviewers' comments)
  • 40% = three oral presentations
  • 15% = quality of your reviews of the other students' drafts (weeks 11-12)
  • 5% = attending most of the presentation days (this requirement will be adjusted depending on the number of participants)

Literature and material

Proposed topics:

Topics have been assigned. 

1. Genome assembly: [Available]

  • Algorithmic paper: P. Medvedev et al., A Novel Approach for Incorporating Mate Pair Information into Genome Assemblers, Journal of Computational Biology, 18(11): 1625-1634 (2011)
  • Optional algorithmic paper: A. Bankevich et al., SPAdes: A New Genome Assembly Algorithm and Its Applications to Single-Cell Sequencing, Journal of Computational Biology 19(5): 455-477 (2012)
  • Biological paper: H. Banani et al., Genome sequencing and secondary metabolism of the postharvest pathogen Penicillium griseofulvum, BMC Genomics (2016) 17:19


2. Transcriptome assembly: [Assigned]

  • Algorithmic paper: M. Pertea et al., StringTie enables improved reconstruction of a transcriptome from RNA-seq reads, Nature Biotechnology 33(3): 290-295 (2015)
  • Biological paper: H. Tilgner et al., Comprehensive transcriptome analysis using synthetic long-read sequencing reveals molecular co-association of distant splicing events, Nature Biotechnology 33(7): 736-742 (2015)


3. Virus assembly: [Assigned]

  • Algorithmic paper: A. Topfer et al., Viral Quasispecies Assembly via Maximal Clique Enumeration, PLOS Computational Biology 10(3): e1003515 (2014)
  • Biological paper: C. Schlotterer et al., Sequencing pools of individuals — mining genome-wide polymorphism data without big funding, Nature Reviews Genetics 15, 749-763 (2014)


4. Pangenomes: [Assigned]

  • Algorithmic paper: R. Rahn et al., Journaled string tree—a scalable data structure for analyzing thousands of similar genomes on your laptop, Bioinformatics 30(24): 3499-3505 (2014)
  • Biological paper: The 1000 Genomes Project Consortium, An integrated map of genetic variation from 1,092 human genomes, Nature 491, 56-65 (2012)


5. Finding genetic variations: [Assigned]

  • Algorithmic paper: T. Marschall et al., MATE-CLEVER: Mendelian-inheritance-aware discovery and genotyping of midsize and long indels, Bioinformatics 29(24): 3143-3150 (2013)
  • Biological paper: The Genome of the Netherlands Consortium, Whole-genome sequence variation, population structure and demographic history of the Dutch population, Nature Genetics 46(8), 818-825 (2014)


6. Reconstructing cancer genomes: [Assigned]

  • Algorithmic paper: L. Oesper et al., Reconstructing cancer genomes from paired-end sequencing data, BMC Bioinformatics 13(Suppl 6):S10 (2012)
  • Biological paper: C. D. Greenman et al., Estimation of rearrangement phylogeny for cancer genomes, Genome Research 22:346-361 (2012)


7. Variations from multiple cancer samples: [Assigned]

  • Algo-bio paper: F. Hormozdiari et al., Simultaneous structural variation discovery among multiple paired-end sequenced genomes, Genome Research 21:2203-2212 (2011)
  • Algo-bio paper: V. Popic et al., Fast and scalable inference of multi-sample cancer lineages, Genome Biology 16:91 (2015)