Exercises
Monday
Exercise 1
Explain briefly the steps in the information flow in cells.
Exercise 2
What are exons and introns? What happens in splicing?
Exercise 3
What are the steps in cDNA microarray analysis? (Omit the steps in the production of the arrays.)
Exercise 4
What are the differences between the analysis procedures for cDNA microarrays and for Affymetrix oligonucleotide arrays?
Tuesday
For questions 1-3, answers should be brief but to the point.
Exercise 1
Spot segmentation and quantification are important since these are the steps which provide the required numeric values for each gene for further analysis. Briefly explain:
- Whether only intensity information, only spatial information, or a combination of both should be used, and why.
- Which overall measure (e.g. total, mean, median, etc.) from a spot you would use, and why.
- Why and how to correct for background intensity.
Exercise 2
Normalisation: describe the different types of normalisation of microarray data, when they are applied, and what their strengths and weaknesses are.
Exercise 3
- Compare and contrast two types of DNA microarrays. State the advantages and disadvantages of each.
- Give some examples of situations that may give rise to missing values in DNA microarray data. What can be done to deal with missing values?
Exercise 4
Download the data file containing the background-corrected spot intensities for the two channels (i.e. dyes) for four repeats of each gene in an experiment.
- If you are asked to summarise this data for each gene and each channel what would you take as a representative value and why?
- Does this data show dye-bias?
- Does this data show intensity dependent dye-bias?
- If there is dye bias, how would you correct for it in this data? How does the normalised data appear?
- Identify a few genes that are differentially expressed between the two samples. Which method would you use to identify them and why? If your method involves multiple statistical comparisons, do you think it is essential to correct for them, and if so, how would you propose to do it?
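As a starting point for the dye-bias questions, one common way to look at two-channel data is to plot the log-ratio M against the mean log-intensity A. The sketch below only computes these two quantities; it assumes strictly positive, background-corrected intensities, and the function name `ma_values` and the three example spots are hypothetical, not taken from the course data file.

```python
import numpy as np

def ma_values(red, green):
    """Return (M, A) for strictly positive, background-corrected intensities."""
    red = np.asarray(red, dtype=float)
    green = np.asarray(green, dtype=float)
    m = np.log2(red) - np.log2(green)          # M: log-ratio of the two channels
    a = 0.5 * (np.log2(red) + np.log2(green))  # A: mean log-intensity of the spot
    return m, a

# Hypothetical intensities for three spots (illustration only).
red = [400.0, 1600.0, 100.0]
green = [100.0, 1600.0, 400.0]
m, a = ma_values(red, green)
```

Plotting `m` against `a` for the real data is then a matter of choice of plotting package; whether the cloud is centred on zero, and whether its centre drifts with `a`, is what the questions above ask you to judge.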
Wednesday
Make simplifying assumptions if you cannot solve the exercises in their most general form.
Exercise 1
Explain what the Bias-Variance tradeoff discussed in the "introduction to modeling"-lecture in the morning means, and derive the formula. (Motivation: Get familiar with computing with expectations etc., and with the notation.)
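For orientation, the decomposition is usually stated in the following form. The notation here is an assumption and may differ from the lecture: the data are generated as $y = f(x) + \varepsilon$ with $\mathrm{E}[\varepsilon] = 0$ and $\mathrm{Var}[\varepsilon] = \sigma^2$, $\hat{f}$ is the model fitted to a random training set, and the expectation is taken over training sets and noise.

```latex
\mathrm{E}\!\left[\bigl(y - \hat{f}(x)\bigr)^2\right]
  = \underbrace{\bigl(\mathrm{E}[\hat{f}(x)] - f(x)\bigr)^2}_{\text{bias}^2}
  + \underbrace{\mathrm{E}\!\left[\bigl(\hat{f}(x) - \mathrm{E}[\hat{f}(x)]\bigr)^2\right]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{noise}}
```

The derivation itself, which is the point of the exercise, is left out here.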
Exercise 2
(a) Construct a maximum likelihood classification rule for two classes that are normally distributed with the same covariance matrix. Describe the properties of the classifier. Is it familiar? What would you do if the covariance and the means of the classes were not known?
(b) Apply the kernel trick to the classifier (Hint: start by assuming that the covariance matrix is the identity matrix). Is the resulting classifier good (why/why not)?
Exercise 3
Construct a data set of two equally big normally distributed clusters, with means (0,0) and (0,1) in a two-dimensional space. The standard deviation of the clusters equals 1 and the total size of the data set is 100.
(a) Apply at least 2 clustering algorithms to the data, and report the results. Use at least 2 distance measures (by scaling the data if your program package does not allow several measures).
(b) Use external and internal validation. Are the clustering results good?
(c) Apply the same algorithms on some other data set that illustrates differences between them. Construct the data yourself or use some public domain data.
(d) Criticize your solution and this exercise.
Alternative to 3 if you do not have access to a suitable program package: Program K-means and apply it to a data set. (Hint: avoid all unnecessary details; the basic algorithm is simple.)
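The data set described in this exercise could be generated roughly as follows. This is a sketch under stated assumptions: "standard deviation 1" is read as independent coordinates with standard deviation 1 each, and the seed and variable names are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, chosen arbitrarily for reproducibility

n_per_cluster = 50  # two equally big clusters, 100 points in total
means = np.array([[0.0, 0.0], [0.0, 1.0]])

# Assumption: isotropic Gaussians, standard deviation 1 in each coordinate.
X = np.vstack([rng.normal(loc=mu, scale=1.0, size=(n_per_cluster, 2)) for mu in means])

# True cluster membership, usable as the reference for external validation.
labels = np.repeat([0, 1], n_per_cluster)
```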
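A bare-bones version of K-means (Lloyd's algorithm with squared Euclidean distance) might look like the sketch below. The function name `kmeans`, its parameters, the random initialisation from data points, and the toy data are all illustrative assumptions rather than a prescribed solution.

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Minimal K-means sketch: returns (centers, assignments)."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Initialise the centers with k distinct data points chosen at random.
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Squared Euclidean distance from every point to every center.
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        assign = d.argmin(axis=1)
        # Move each center to the mean of its points; keep it if the cluster is empty.
        new_centers = np.array([
            X[assign == j].mean(axis=0) if np.any(assign == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    # Final assignment, computed against the final centers.
    d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return centers, d.argmin(axis=1)

# Hypothetical toy data: two well-separated pairs of points.
toy = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])
centers, assign = kmeans(toy, k=2)
```

The result depends on the initialisation, so on harder data it is worth running the function with several seeds and comparing the outcomes.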
Thursday
Exercise 1
Explain how the MAP kinase signalling pathway works.
Exercise 2
How does the RNAi pathway work? What are the consequences for genes?
Exercise 3
Give a real example of a gene regulatory network (preferably other than those covered in the course). Where did you learn about this network? What type of data would you require to be able to reconstruct this network? What model would you propose for this network and why?
Exercise 4
A Boolean network is an example in which we can observe systems phenomena through a simple model. Create an example of a system with at least 5 different attractors (point and/or cyclic). Construct the model so that at least one of the basins contains multiple trajectories. Revise or extend this model to a probabilistic Boolean network. Demonstrate, if possible, an intervention in at least one of the two models.
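To check a candidate network, the attractors and basin sizes of a small synchronous Boolean network can be found by brute-force enumeration of all states. The sketch below assumes synchronous updating; the helper `find_attractors` and the 3-node rule `update` are hypothetical, and the example network has only three attractors, so it does not itself answer the exercise.

```python
from itertools import product

def find_attractors(update, n):
    """Map each attractor (tuple of states in cycle order) to its basin size."""
    basins = {}
    for state in product((0, 1), repeat=n):
        seen = {}   # state -> position on the trajectory
        path = []
        # Follow the trajectory until a state repeats.
        while state not in seen:
            seen[state] = len(path)
            path.append(state)
            state = update(state)
        cycle = path[seen[state]:]
        # Rotate the cycle so it starts at its smallest state (canonical form).
        i = cycle.index(min(cycle))
        key = tuple(cycle[i:] + cycle[:i])
        basins[key] = basins.get(key, 0) + 1
    return basins

def update(s):
    """Hypothetical 3-node rule: a' = b, b' = a, c' = a AND b."""
    a, b, c = s
    return (b, a, a & b)

basins = find_attractors(update, 3)
```

For this rule the enumeration gives two point attractors, (0,0,0) and (1,1,1), each with a basin of 2 states, and one cycle between (0,1,0) and (1,0,0) with a basin of 4 states. Since all 2^n states are enumerated, the approach is practical only for small n.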
Friday
Exercise 1
What is the principle of polymerase chain reaction (PCR)?
Exercise 2
What are tissue microarrays (TMA)?
Exercise 3
Assume that your task is to predict cancer classes. Since you have learned that kernel methods can easily incorporate various data sources, you decide to use lots of them.
(a) You realize that for a single gene the squared difference (x-y)^2 is a good kernel, and combine the data sources as described in the lectures, with fixed weights for the kernels. Is this a good idea? Why/why not?
(b) You want to utilize all possible knowledge of the patients, including their shoe size, and include it as yet another data set. Is this a good idea? Why/why not?

