Practical course in biodatabases, exam 24.2.2009

    1. What is the difference between the EMBL and EnsEMBL databases? (1.25 p)
    2. What are the differences between the GenBank and RefSeq databases? (1.25 p)
    3. When is it advisable to search for data in RefSeq rather than in GenBank? (1.25 p)
    4. If you had a DNA sequence for the human gene XRCC4, how would you go about searching for annotations for this gene? And how would you get the protein sequence coded by the gene? (1.25 p)

    1. A database contains information on all known human genetic disorders, covers over 12,000 genes and focuses on the relationship between phenotype and genotype.
      1. On the basis of this database, what is the number of genes with known sequence and phenotype on chromosome 22? (1.25 p)
      2. One of the genes is MIAT. How many SNPs are known for this gene? (1.25 p)
    2. Perform a BLAST search using the sequence of the fifth exon of the MIAT gene. For which animal(s) is the corresponding sequence known (almost) completely? (2.5 p)

  1. Give the SQL queries that answer the following questions. (5 p, 1 p each)

    1. What is the total number of transcripts?
    2. How many of these transcripts have a corresponding protein translation?
    3. What is the stable ID and length of the longest transcript?
    4. There is a SNP at position 534 066 (bp) on chromosome 17. What is the stable_id of the gene covering this location?
    5. How many genes have at least one protein translation?

  2. Give the SQL queries that answer the following questions. (5 p, 1 p each)

    1. How many markers are there in the marker map "Genethon"?
    2. Which marker map has the largest number of markers?
    3. What is the average length of a marker?
    4. What is the marker_id of the marker with the largest number of synonyms?
    5. What is the display name of this marker?

  3. Give the SQL queries that answer the following questions. (5 p, 1 p each)

    1. What is the total number of cytogenetic bands (stored in the karyotype table)?
    2. What is the number of cytogenetic bands on each chromosome?
    3. What is the average length of cytogenetic bands?
    4. What is the starting position (in base pairs) of the cytogenetic band 17q21.31?
    5. A gene starts at position 31 787 617 (bp) on chromosome 13. Which cytogenetic band covers this location?

Instructions

Return your answers by email. Use 'PCBD <yourfirstname> <yourlastname>' as the subject and send the email to astikain@cs.helsinki.fi. Write your student number or identity number at the beginning of the email. You can type your answers in the email body or in an attachment, provided the attachment is in .txt, .doc or .pdf format. You are not allowed to discuss the questions with others during the exam, but you can use any material that can be found on the internet.

For the SQL queries, use the homo_sapiens_core_52_36n database on the db.cs.helsinki.fi server or on Ensembl's server. You do not need to solve each exercise in one step; you can use multiple queries. You are also allowed to use results from previous questions where possible.
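
As an illustration of solving an exercise with multiple queries, the sketch below first computes a value and then reuses it in a second query. It runs against an in-memory SQLite database, not the course server, and everything in it is an assumption made for the example: the table toy_feature, its columns name, seq_start and seq_end, and the helper functions are hypothetical and do not reflect the actual Ensembl schema.

```python
import sqlite3


def build_toy_db():
    """Create an in-memory database with a hypothetical toy_feature table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE toy_feature (name TEXT, seq_start INTEGER, seq_end INTEGER)"
    )
    conn.executemany(
        "INSERT INTO toy_feature VALUES (?, ?, ?)",
        [("A", 1, 100), ("B", 50, 400), ("C", 10, 20)],
    )
    return conn


def longest_feature(conn):
    """Find the longest feature in two steps instead of one nested query."""
    # Step 1: compute the maximum length (coordinates are inclusive).
    (max_len,) = conn.execute(
        "SELECT MAX(seq_end - seq_start + 1) FROM toy_feature"
    ).fetchone()
    # Step 2: reuse the result of step 1 to look up the matching row.
    return conn.execute(
        "SELECT name, seq_end - seq_start + 1 FROM toy_feature "
        "WHERE seq_end - seq_start + 1 = ? ORDER BY name LIMIT 1",
        (max_len,),
    ).fetchone()


if __name__ == "__main__":
    print(longest_feature(build_toy_db()))
```

The same two-step pattern applies when typing queries by hand in a database client: note the result of the first query and paste it into the next one.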