# Exercise 2: Network motifs (DL 13.11.)

In this exercise we will use the networkx python library that gives us access to some network related algorithms and data structures.

## 1. Counting motifs in a network (2 points)

In this exercise set we will investigate two network motifs: Feed Forward Loop ($A\rightarrow B, B \rightarrow C, A \rightarrow C$) and 3-cycle ($A\rightarrow B, B \rightarrow C, C \rightarrow A$). The code below defines these as networkx graphs and shows two ways of creating a graph.

In []:
import networkx as nx

# Define FFL by giving a list of edges
ffl = nx.DiGraph([('A','B'), ('B','C'), ('A','C')])
# Define 3-cycle by creating an empty graph and adding vertices and edges one by one
cycle = nx.DiGraph()

# Print the edge lists
print(ffl.edges())
print(cycle.edges())


The code below defines a network G. Write a function that counts the occurrences of the motifs FFL and 3-cycle. Use the frequency concept $\cal{F}_1$. Use the function to count the occurrences of FFL and 3-cycle in G.

Hint: Enumerate all paths of length two, i.e. ($A\rightarrow B, B \rightarrow C$), and check for the existence of the third edge of the motif. The DiGraph in networkx has methods successors and has_edge that you might find useful. Pay attention to the symmetries in the 3-cycle motif and make sure you report each occurrence of the motif only once.

In []:
G = nx.DiGraph([(1,3),(2,1),(2,3),(3,4),(4,2),(4,5),(4,6),(5,6)])


## 2. Motifs in the E. coli regulatory network (2 points)

1. In the previous exercise set (Question 2) you wrote code that reads the E. coli regulatory network. Modify that code to read the network into a networkx graph.

In []:
import urllib.request  # library handling URLs

network = nx.DiGraph()

# E. coli network file
# Open the file. This will work exactly like a regular file.
data = urllib.request.urlopen(networkfile)
for line in data:
line = line.decode('utf-8')

print(network.edges())

1. Using the function from the previous exercise, count the occurrences of FFL and 3-cycle in the E. coli reulatory network.

In []:

## 3. Statistical significance of motifs (3 points)

1. The lecture slides give a formula for the expected value and variance of the number of motif occurrences in ER networks. Using the formula compute these values for the two motifs, FFL and 3-cycle. Compare the number of occurrences of these motifs in the E. coli regulatory network to the expected number of occurrences in ER graphs by calculating the Z-score. Is the number of occurrences of the motifs statistically significant?

In []:

1. The networkx library includes the function erdos_renyi_graph for generating ER graphs. Use it to generate an ensemble of 100 random directed graphs with $n$ vertices and edge probability equal to $m/n^2$ where $m$ is the number edges and $n$ the number of vertices in the E. coli regulatory network. For each graph in the ensemble compute the number of occurrences for the two motifs, FFL and 3-cycle. Calculate the average number of occurrences for the motifs and standard deviation of the number of occurrences. Do these coincide with the theoretical results? For each of the motifs plot the number of occurrences as a histogram.

Note: The networkx function does not add self loops to the ER graph. If you want to be perfect, you can add those yourself with the given probability. However, this will not affect the results of our analysis of the occurrence of these motifs because the motifs do not have self loops.

In []:

## 4. Enumerating motifs (3 points)

Let us consider directed graphs allowing self loops (an edge from vertex v to vertex v) but disallowing multiple edges (edges with the same direction between the same pair of vertices). In this exercise we will enumerate all possible motifs of a given size.

The two motifs with a single vertex are a graph containing only a single vertex and a graph containing a single vertex and a self loop on that vertex. Write code that starts with these two single vertex motifs and expands them first to all possible two vertex motifs and then to all possible three vertex motifs. Remember that a motif has to be connected and your collection of motifs should not contain motifs that are isomorphic to each other. How many motifs with two vertices exist? How about motifs with three vertices? Can you compute the the number of motifs with four vertices?

Hint: The networkx library has a function is_isomorphic which checks if two graphs are isomorphic to each other.

# The two motifs with one vertex