{
"metadata": {
"name": ""
},
"nbformat": 3,
"nbformat_minor": 0,
"worksheets": [
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Exercise 1: Global network models (DL 5.11.)\n",
"============================================\n",
"\n",
"Your name:\n",
"\n",
"Name of your pair:\n",
"\n",
"1. An ER graph and its degree distribution (2 points)\n",
"-----------------------------------------------------\n",
"\n",
"In this exercise we will eventually create a random ER graph and plot its degree distribution.\n",
"However, we will start by introducing some basic python concepts and the iPython notebook\n",
"environment to get you started.\n",
"\n",
"a) The code below shows how to generate a random number in python. Run the code.\n",
"Modify the code to print 10 random integers.\n",
"\n",
"**Your solution:**"
]
},
{
"cell_type": "code",
"collapsed": true,
"input": [
"from random import random\n",
"\n",
"print(random())"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"b) Graphs can be represented in python using dictionaries and lists. Below is an example of python\n",
"code representing the following directed graph:\n",
"\n",
"A $\\rightarrow$ B\n",
"\n",
"A $\\rightarrow$ C\n",
"\n",
"B $\\rightarrow$ E\n",
"\n",
"C $\\rightarrow$ B\n",
"\n",
"C $\\rightarrow$ D\n",
"\n",
"D $\\rightarrow$ A"
]
},
{
"cell_type": "code",
"collapsed": true,
"input": [
"graph = {'A' : ['B', 'C'],\n",
" 'B' : ['E'],\n",
" 'C' : ['B', 'D'],\n",
" 'D' : ['A'],\n",
" 'E' : []}\n",
"print(graph)"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Create a graph with 1000 vertices labeled {1,...,1000} where each directed pair of vertices is\n",
"connected by an edge with probability 0.05. Use the random function from above and the graph \n",
"data structure described above.\n",
"\n",
"**Your solution:**"
]
},
{
"cell_type": "code",
"collapsed": true,
"input": [],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"c) The code below produces 100 random integers in range [0,9] and draws a histogram of them."
]
},
{
"cell_type": "code",
"collapsed": true,
"input": [
"from random import randint\n",
"import matplotlib.pyplot as plt\n",
"\n",
"# Make plots appear inline\n",
"%matplotlib inline\n",
"\n",
"# Generate the random numbers\n",
"numbers = [randint(0,9) for i in range(0,100)]\n",
"# Plot the histogram\n",
"plt.hist(numbers, bins=10)"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Plot the degree histogram of the random graph you created in the previous step.\n",
"\n",
"**Your solution:**"
]
},
{
"cell_type": "code",
"collapsed": true,
"input": [],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"2. Degree distribution of *E. coli* regulatory network (3 points)\n",
"-----------------------------------------------------------------\n",
"\n",
"In this exercise we will read the *E. coli* regulatory network and plot its degree distribution.\n",
"\n",
"a) The *E. coli* regulatory network can be downloaded from RegulonDB:\n",
"\n",
"http://regulondb.ccg.unam.mx/menu/download/datasets/files/network_tf_gene.txt\n",
"\n",
"The header of the file contains general information on several commented lines starting with #.\n",
"This is followed by the description of the network where each lines describes one edge of the \n",
"network: the first column contains the transcription factor (TF) that regulates the gene in the\n",
"second column thus describing an edge pointing from the TF to the gene. Note that the TF names \n",
"start with upper case letters and the gene names with lower case letters. For example accB is \n",
"the gene coding for the TF AccB. Thus when constructing the regulatory network treat accB and \n",
"AccB as the same vertex. For this exercise you can ignore the rest of the columns.\n",
"\n",
"Write code that reads the *E. coli* regulatory network into a graph data structure similar\n",
"to that used in the first exercise. You can use the gene names as vertex labels. You might find\n",
"the python string method split helpful in parsing the file.\n",
"\n",
"**Your solution:**"
]
},
{
"cell_type": "code",
"collapsed": true,
"input": [],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"b) Plot a histogram of the degree distribution of the *E. coli* regulatory network.\n",
"\n",
"**Your solution:**"
]
},
{
"cell_type": "code",
"collapsed": true,
"input": [
"import urllib.request # library handling URLs\n",
"\n",
"ecoli_network = {}\n",
"\n",
"# E. coli network file\n",
"networkfile=\"http://regulondb.ccg.unam.mx/menu/download/datasets/files/network_tf_gene.txt\"\n",
"# Open the file. This will work exactly like a regular file.\n",
"data = urllib.request.urlopen(networkfile)\n",
"for line in data:\n",
" line = line.decode('utf-8') # decode the byte array into a string\n",
" # Add code to load the data here"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"c) Plot the degree distribution also as a scatter plot where x axis is the degree and y axis gives \n",
"the number of vertices having the given degree. Use logarithmic scale on both axis. The code below\n",
"shows an example of drawing a scatter plot (of random data) and manipulating the axis of a plot. \n",
"\n",
"Does the degree distribution of the *E. coli* regulatory network follow a power law? Why? / Why not?\n",
"Try to fit a power law distribution to the data by plotting a curve \n",
"$y=a*x^{-k}$ with suitable values of $a$ and $k$."
]
},
{
"cell_type": "code",
"collapsed": true,
"input": [
"from random import randint\n",
"import matplotlib.pyplot as plt\n",
"\n",
"# Make plots appear inline\n",
"%matplotlib inline\n",
"\n",
"# Generate random data\n",
"x = [2**i for i in range(0,10)]\n",
"y = [2**randint(0,9) for i in range(0,10)]\n",
"# Get the current axis\n",
"axis = plt.gca()\n",
"# Set log scale on x axis\n",
"axis.set_xscale('log')\n",
"# Set log scale on y axis\n",
"axis.set_yscale('log')\n",
"# Plot the data ('o' makes it a scatter plot)\n",
"axis.plot(x,y,'o')"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Your solution**:"
]
},
{
"cell_type": "code",
"collapsed": true,
"input": [],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"*Write your answer to the questions here.*"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"3. Statistical testing of a network property I (3 points)\n",
"-------------------------------------------------------\n",
"\n",
"In this exercise we will investigate the clustering coefficient of the *E. coli* regulatory\n",
"network and compare it to the clustering coefficient of ER graphs with the same number of vertices\n",
"and with a similar probability for an edge as in the *E. coli* regulatory network.\n",
"\n",
"a) Define a function for computing the clustering coefficient of a graph.\n",
"For a directed graph allowing self loops the clustering coefficient of a vertex $v$ is defined as\n",
"$$\n",
"\\frac{E_v}{k_v\\cdot k_v}\n",
"$$\n",
"where $E_v$ is the number of edges between the outneighbors of $v$ and $k_v$ is the outdegree of $v$.\n",
"The global clustering coefficient is then the average of the clustering coefficients\n",
"of all vertices. Set the clustering coefficient of vertices of degree 0 to 0 in your analysis.\n",
"What is the clustering coefficient of the *E. coli* regulatory network?\n",
"\n",
"**Your solution:**"
]
},
{
"cell_type": "code",
"collapsed": true,
"input": [
"def clustering_coefficient(graph):\n",
" # Compute the clustering coefficient\n",
" \n",
" # Return the clustering coefficient\n",
" return 0.0\n",
"\n",
"print(clustering_coefficient(ecoli_network))"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"b) What is the number of vertices and edges of the *E. coli* regulatory network? For an ER graph\n",
"with the same number of vertices and edges, what would be the probability of having an edge\n",
"between two vertices?\n",
"\n",
"**Your solution:**"
]
},
{
"cell_type": "code",
"collapsed": true,
"input": [],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"c) Write a function that creates an ER graph with the given number of vertices and probability\n",
"for an edge to exist. Create an ensemble of 100 ER graphs with the same amount of vertices as \n",
"the *E. coli* regulatory network using the probability computed above for the existence of an edge.\n",
"Compute the clustering coefficient of each network and plot these as a histogram. As compared to the\n",
"ER graphs, is the *E. coli* regulatory network highly clustered? Why? / Why not?\n",
"\n",
"**Your solution:**"
]
},
{
"cell_type": "code",
"collapsed": true,
"input": [
"def ergraph(n, p):\n",
" # Create an ER graph with n vertices and probability p for an edge to exist\n",
" graph = {}\n",
" \n",
" return graph\n",
"\n",
"# Create an ensemble of ER graphs and compute their clustering coefficients\n",
"\n",
"# Plot the histogram of the clustering coefficients"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"*Write your answer to the questions here.*"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"4. Statistical testing of a network property II (2 points)\n",
"----------------------------------------------------------\n",
"\n",
"ER graphs have a very different degree distribution as compared to the *E. coli* regulatory \n",
"network. Therefore the result of the previous exercise might just be a consequence of having a \n",
"particular degree distribution. In this exercise we will investigate if the clustering coefficient\n",
"of the *E. coli* regulatory network is exceptional for graphs with the same degree distribution.\n",
"\n",
"a) First we will create random networks with the same degree distribution as the \n",
"*E. coli* regulatory network. One way to do this is to start with the *E. coli* regulatory network and\n",
"repeatedly choose two edges $a\\rightarrow b$ and $c\\rightarrow d$ and switch them by removing the original edges and inserting\n",
"the edges $a\\rightarrow d$ and $c\\rightarrow b$. Note that the degrees of the vertices stay \n",
"the same. Your first task is to write a function that takes as input a graph, creates a copy of\n",
"it and performs $m$ random switches on it where $m$ is the number of edges in the graph. Create\n",
"one randomly switched graph starting with the *E. coli* regulatory network using your new function and plot its degree distribution to make sure\n",
"it looks the same as for the original graph.\n",
"\n",
"**Your solution:**"
]
},
{
"cell_type": "code",
"collapsed": true,
"input": [
"def switchgraph(graph):\n",
" # Create a copy of the original graph\n",
" switchedgraph = {}\n",
" \n",
" # Perform m random switches\n",
"\n",
" # Return the graph\n",
" return switchedgraph\n",
"\n",
"# Create a random graph and plot its degree distribution"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"b) Create 100 random graphs with the function you wrote and compute the clustering coefficient\n",
"of each graph. Plot a histogram of the clustering coefficients. Is the clustering coefficient of\n",
"the *E. coli* regulatory network exceptional as compared to this ensemble of graphs? Why? / Why not?\n",
"\n",
"**Your solution:**"
]
},
{
"cell_type": "code",
"collapsed": true,
"input": [],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"*Write your answer to the questions here.*"
]
}
],
"metadata": {}
}
]
}