Exercise session 1

Introduction to bioinformatics, Autumn 2009

Group 1: Thursday 17.9 12-14 Exactum BK106
Group 2: Tuesday 17.9 16-18 Exactum BK106

General instructions:

Problems for each exercise session will be distributed approximately one week before the session. You are expected to be prepared to present your solutions in the exercise session.

In addition, you need to send notes on the assignments you are going to mark to Laura Langohr by email before the exercise session (Thursdays 12.15).

These exercise notes should contain a brief description of the steps you took to solve the assignment, as well as the results. Important: When sending email, use a subject of the form "ITB exercise X", where X is the exercise session number (1/2). Send your notes in the body of the email. If you need to include a figure, send it as an attachment.


  1. In PubMed, search for review articles published in the last year discussing the gene HbA1 in humans.

  2. Search for gene HbA1 in OMIM.

  3. Search for HbA1 in NCBI RefSeq using Entrez. Hint: Choose the Nucleotide option from the Search list and set the options in the Limits tab accordingly.

    Access the entry for human HBA1 in NCBI RefSeq and answer the following questions.

  4. Find entries related to gene HbA in UniProt.

  5. Download genome sequences of Escherichia coli (GenBank ID NC_000913) and Thermoplasma volcanium (NC_002689) from NCBI.

    1. Find out, using your favourite programming language (notes on programming languages below) or another method, the nucleotide, dinucleotide and trinucleotide frequencies.
    2. What is the G-C content of the sequences?
    3. Draw a diagram of 2-word and 3-word distributions in both sequences (you can use any software available).

  6. Write a program in your favourite language that tries to find gene coding regions with the following method.

    Test your program with this DNA sequence.

Programming languages

You can solve the programming assignments (problems 5 and 6) with any suitable language. However, I suggest using a relatively high-level language such as Python, Perl, R, Matlab or Octave, because they make the implementation easier. Python and Perl are scripting languages that are probably the easiest to learn if you are new to programming. Good tutorials are provided for both. Furthermore, the languages mentioned above are available on the CS computers (try typing "python", "perl", "R", "matlab" or "octave" in a shell).
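
To give an idea of how little code a high-level language needs for the counting part of problem 5, here is a minimal Python sketch. It is only one possible approach, not a model solution. The function names (read_fasta, kmer_frequencies, gc_content) are made up for this illustration. The sketch assumes that the genome has been saved as a FASTA file containing a single sequence, that words are counted with overlaps, and that every symbol in the sequence is counted as it appears (including any ambiguity codes such as N).

```python
from collections import Counter


def read_fasta(path):
    """Return the sequence of a single-record FASTA file as one uppercase string.

    Assumption: the file holds exactly one sequence; header lines start with '>'.
    """
    parts = []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line and not line.startswith(">"):
                parts.append(line.upper())
    return "".join(parts)


def kmer_frequencies(seq, k):
    """Return relative frequencies of all overlapping words of length k.

    k=1 gives nucleotide, k=2 dinucleotide and k=3 trinucleotide frequencies.
    """
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}


def gc_content(seq):
    """Return the fraction of positions in seq that are G or C."""
    return (seq.count("G") + seq.count("C")) / len(seq)


if __name__ == "__main__":
    # Toy sequence for a quick check; replace with read_fasta("your_file.fasta").
    toy = "ATGCGC"
    for k in (1, 2, 3):
        print(k, kmer_frequencies(toy, k))
    print("GC content:", gc_content(toy))
```

The same functions can be applied to both genomes once the FASTA files have been downloaded, and the resulting dictionaries can be passed to any plotting software for the diagrams.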