Lecturer: Antti Honkela
Your name:
Name of your pair:
In this exercise you will experiment with methods for inferring regulatory interactions.
If you have any questions, please post them to the Moodle discussion area!
In this task, you will generate a sparse Erdős–Rényi random graph, form the corresponding Gaussian Markov random field and study its covariance.
Generate an undirected ER random graph for 12 genes using p=0.4. Construct a corresponding Gaussian Markov random field by forming the precision matrix which has non-zero elements corresponding to the edges of the graph. Make sure the precision matrix is positive definite by making the diagonal elements positive and sufficiently large compared to the other elements of the matrix.
%matplotlib inline
# Import some useful modules
import matplotlib.pyplot as plt
import numpy as np
import numpy.random as npr
Compute the covariance matrix among the 12 genes from the precision matrix. Visualise the matrix.
# Your code here
Can you spot the true connections?
Answer:
Try setting a threshold for absolute values of the covariance matrix elements to distinguish between correct and false connections. Use the Hamming distance between the true and estimated connectivity matrices as a measure of accuracy.
# Your code here
Answer:
Hints:
np.linalg.eigvals()
. This returns the eigenvalues of the matrix which should all be positive. Eigenvalues of a symmetric matrix are always real. If you get complex eigenvalues, make sure the matrix is symmetric. (Precision and covariance matrices of Gaussian distributions are always symmetric.)plt.spy()
function can be useful for exploring the sparsity pattern of matrices.In this task, you will simulate observations from the Gaussian Markov random field model constructed in Task 1 and attempt to infer the network structure using those observations.
Simulate three multivariate normal observation data sets of sizes [25, 50, 100] with 0 mean using the covariance from Task 1.
# Your code here
Compute and visualise the correlation matrix among the 12 genes simulated. How small Hamming distance can you obtain with the different data sets using a suitable threshold?
# Your code here
Compute and visualise the inverse of the correlation matrix. How small Hamming distance can you now obtain with the different data sets using a suitable threshold?
# Your code here
Learn the structure using graphical lasso (sklearn.covariance.graph_lasso()
). Try different values of the regularisation parameter. How good results in terms of the Hamming distance can you now obtain?
# Your code here
How good results did you obtain with the different approaches?
Answer:
pip3 install sklearn --user
" on the command line. If this does not work, more instructions are at: http://scikit-learn.org/In this task, you will simulate observations using a linear generative model: $$ \mathbf{y_i} = \mathbf{A} \mathbf{x_i} + \epsilon_i. $$ Here $\mathbf{x_i}$ are TF expression levels (assumed to be random, following a Gaussian distribution), $\mathbf{A}$ is the connectivity matrix from TFs to target genes, $\mathbf{y_i}$ are the target gene expression levels and $\epsilon_i$ is Gaussian observation noise. Based on these data, you will attempt to infer the network structure.
Generate a sparse Erdős–Rényi random connectivity graph from 3 TFs to 9 target genes and sample the random connection weights having this sparsity pattern.
# Your code here
Simulate 3 data sets following the above connectivity graph and model with 3 TFs and 9 target genes, having [25, 50, 100] samples. You can simulate $\mathbf{x_i}$ from $\mathcal{N}(0, 1)$ and $\epsilon_i$ from $\mathcal{N}(0, 0.1^2)$.
# Your code here
Form a data set by concatenating the TFs and targets. Compute and visualise the correlation matrix among the 12 genes. How small Hamming distance between the true network and the estimated network can you obtain?
# Your code here
Compute and visualise the inverse of the correlation matrix. How well can you now find the true connections in terms of the Hamming distance?
# Your code here
X
-> X.T
) if necessary.In this task, you will simulate observations using the following dynamical model: $$ \frac{d \mathbf{x}(t)}{dt} = \sigma(\mathbf{A} \mathbf{x}) - \delta \mathbf{x} + \gamma, $$ where $\mathbf{x}$ are the gene expression levels, $\mathbf{A}$ is the connectivity matrix (connecting only TFs to TFs and TFs to targets), $\sigma(x) = 1/(1+e^{-x})$ is a sigmoid function, $\delta$ are the mRNA degradation rates and $\gamma$ are constant external inputs (affecting only TFs).
Implement a function for simulating from the above system. You can use scipy.integrate.odeint()
to numerically solve the differential equation given a suitable non-negative initial state.
Pick a plausible connectivity matrix with 3 TFs and 9 target genes that includes different combinations of TFs regulating different targets and also some negative connection weights. Simulate the system with a constant non-negative external input and $\delta = 1$ and visualise the results.
# Your code here
Simulate the system with number of non-negative random external inputs until it reaches steady state. What can you see?
# Your code here
Compute the correlations and inverse covariance for the combined steady-state measurements from [10, 20, 40] simulations. Can you recover the true connections?
# Your code here
How well can you recover the true connections?
Your answer:
The use of sigmoid function in the dynamics as in this task is not very common. It has clear advantages over plain linear model (saturation of transcription rate to a fixed maximum, possibly easier stability guarantees, possibility to model repression), but the presented formulation still cannot realistically model combinatorial regulation, and it makes model inference much more difficult than linear differential equations.