Project in Biodatabases

582685
2
Bioinformatics
Advanced studies
A project on programmatic access to the Ensembl gene annotation database. Prerequisites: basics of SQL, basics of Python, and some bioinformatics studies (genome structure, alignments).
Year  Semester  Date          Period  Language  Person in charge
2012  Autumn    29.10-03.12.  2-2     English   Leena Salmela

Lectures

Time       Room  Lecturer       Date
Mon 12-14  B119  Leena Salmela  29.10.2012-03.12.2012

Registration for this course starts on Tuesday 9th of October at 9.00.

General

The course consists of a project completed as self study.

Download the assignments.

Completing the course

Schedule

  • Introductory session on Monday 29.10.2012 at 12-14
  • Guidance sessions on Mondays at 12-14 in B230 (no preplanned program; come and ask questions)
  • Deadline for submitting the project: Friday 30.11.2012

Course MySQL Server

  • host: users.cs.helsinki.fi
  • username: anonymous
  • no password
  • socket: /home/tkt_mbie/mysql/socket
  • database: homo_sapiens_core_59_37d
  • the server accepts only local connections, so the client and your scripts have to be run on this host
  • email the instructor if the server is not responding (it might have died or been killed)
  • users.cs.helsinki.fi does not accept ssh connections originating from outside the university network, so connect through an intermediate server (e.g. melkinpaasi) if needed
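
The settings above could be collected in a Python script roughly as follows. This is only a sketch: it assumes the third-party PyMySQL package is available on the host, which this page does not state, and the names SETTINGS, connect and count_genes are made up for illustration. Because the server is reached through the local socket, no host name is passed.

```python
# Connection settings copied from the list above.
SETTINGS = {
    "user": "anonymous",
    "password": "",  # the course account has no password
    "unix_socket": "/home/tkt_mbie/mysql/socket",
    "database": "homo_sapiens_core_59_37d",
}


def connect(settings=SETTINGS):
    """Open a connection through the local socket (only works on users.cs.helsinki.fi)."""
    # Assumption: PyMySQL is installed. It is imported here, not at the top,
    # so that the module can still be loaded on machines without it.
    import pymysql

    return pymysql.connect(**settings)


def count_genes(connection):
    """Return the number of rows in the gene table."""
    with connection.cursor() as cursor:
        cursor.execute("SELECT COUNT(*) FROM gene")
        return cursor.fetchone()[0]
```

On the course host, `print(count_genes(connect()))` would then print the row count of the gene table, provided the server is running.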

Browsing the database with MySQL Client

Open an ssh connection to users.cs.helsinki.fi. The home directory you see there is separate from your CS home directory. Use scp to transfer files between users.cs.helsinki.fi and your CS home directory.

Start the MySQL client:

username@users:~$ mysql -u anonymous -S /home/tkt_mbie/mysql/socket homo_sapiens_core_59_37d

List database tables:

mysql> SHOW TABLES;

Take a look at the first 10 entries in the gene table:

mysql> SELECT * FROM gene LIMIT 10;

Count the rows in the gene table:

mysql> SELECT COUNT(*) FROM gene;

That's a lot. Let's take a look at the distinct biotypes available:

mysql> SELECT DISTINCT biotype FROM gene;
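
The two queries above can be combined to see how many genes fall under each biotype. The sketch below uses Python's built-in sqlite3 module with a small in-memory stand-in for the gene table, so it runs anywhere; the rows are invented for illustration and the function name biotype_counts is hypothetical. The same SELECT should also work in the MySQL client on the course server.

```python
import sqlite3

# Toy stand-in for the gene table; these rows are invented, not Ensembl data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE gene (gene_id INTEGER PRIMARY KEY, biotype TEXT)")
conn.executemany(
    "INSERT INTO gene (gene_id, biotype) VALUES (?, ?)",
    [
        (1, "protein_coding"),
        (2, "protein_coding"),
        (3, "Mt_tRNA"),
        (4, "pseudogene"),
        (5, "Mt_tRNA"),
        (6, "protein_coding"),
    ],
)


def biotype_counts(connection):
    """Return (biotype, number of genes) pairs, most common biotype first."""
    query = (
        "SELECT biotype, COUNT(*) FROM gene "
        "GROUP BY biotype ORDER BY COUNT(*) DESC, biotype"
    )
    return connection.execute(query).fetchall()


print(biotype_counts(conn))
# prints [('protein_coding', 3), ('Mt_tRNA', 2), ('pseudogene', 1)]
```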

Let's take a look at only tRNAs encoded in the mitochondrial genome (Mt_tRNA):

mysql> SELECT gene_id, biotype FROM gene WHERE biotype='Mt_tRNA';
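
In a script, the biotype would usually be passed as a bound parameter instead of being pasted into the SQL string. A minimal sketch, again using sqlite3 and an invented toy table so that it runs without the course server; the function name gene_ids_with_biotype is hypothetical. Note that sqlite3 writes the placeholder as `?`, whereas MySQL drivers such as PyMySQL write it as `%s` and execute queries through a cursor.

```python
import sqlite3


def gene_ids_with_biotype(connection, biotype):
    """Return the gene_id values of all genes with the given biotype."""
    cursor = connection.execute(
        "SELECT gene_id FROM gene WHERE biotype = ? ORDER BY gene_id",
        (biotype,),  # bound parameter; the driver handles quoting
    )
    return [row[0] for row in cursor]


# Toy stand-in for the gene table; these rows are invented, not Ensembl data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE gene (gene_id INTEGER PRIMARY KEY, biotype TEXT)")
conn.executemany(
    "INSERT INTO gene (gene_id, biotype) VALUES (?, ?)",
    [(10, "Mt_tRNA"), (11, "protein_coding"), (12, "Mt_tRNA")],
)

print(gene_ids_with_biotype(conn, "Mt_tRNA"))
# prints [10, 12]
```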

Get the display names of the tRNAs:

mysql> SELECT display_label FROM gene JOIN xref ON display_xref_id=xref_id WHERE biotype='Mt_tRNA';
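
The join above pairs each gene row with the xref row whose xref_id equals the gene's display_xref_id, and the WHERE clause then keeps only the Mt_tRNA genes. The sketch below reproduces this on two toy tables with sqlite3; the tables contain only the columns used in the query, and the labels are placeholders, not real Ensembl display names.

```python
import sqlite3

# Toy stand-ins for the gene and xref tables; all rows are invented.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE xref (xref_id INTEGER PRIMARY KEY, display_label TEXT);
    CREATE TABLE gene (
        gene_id INTEGER PRIMARY KEY,
        biotype TEXT,
        display_xref_id INTEGER
    );
    INSERT INTO xref VALUES (100, 'label_A'), (101, 'label_B'), (102, 'label_C');
    INSERT INTO gene VALUES
        (1, 'Mt_tRNA', 100),
        (2, 'protein_coding', 101),
        (3, 'Mt_tRNA', 102);
    """
)

# Same join as in the MySQL client, with an ORDER BY added for stable output.
query = """
SELECT display_label
FROM gene JOIN xref ON display_xref_id = xref_id
WHERE biotype = 'Mt_tRNA'
ORDER BY gene_id
"""
labels = [row[0] for row in conn.execute(query)]
print(labels)
# prints ['label_A', 'label_C']
```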

Quit the client:

mysql> exit;

Running the example Python scripts

Study the SQL and Ensembl biodatabase lectures by Ilari Scheinin at http://www.cs.helsinki.fi/u/ischeini/eob/. Copy the example Python programs to users.cs.helsinki.fi and run them there.

Start working on the project

Download the assignments.

Submitting the project

Submit your project by email to Leena Salmela on Friday 30.11.2012 at the latest.