582615: Overlay and P2P Networks, Spring 2013

Spoiler Package I

This spoiler package contains some answers for Exercise Package I.

Assignment 1, Spoiler

a.

Some reasons for using overlay networks include different needs regarding privacy, robustness, routing topologies, identifier spaces, mobility or scaling. Solving parts of the problem in an overlay abstracts networking problems so one does not need to deal with them in the application. Overlay networks can also be used to experiment with different Internet designs. Overlays also isolate complex designs from each other, so a bug in one overlay should not in many cases affect the operation of other overlays.

b.

There are practical, political and technological reasons why the Internet can not be rapidly changed. Updates need to be carried out incrementally while the system is in operation. Lots of effort has been invested into creating infrastructure that optimise efficiency of existing protocols. Some of the infrastructure would need to be replaced to upgrade the Internet. As Internet is federated, all of the participants would need to agree on the correct upgrades to make before they could be made. Finally supporting multiple different kinds of operation is complex, and agreeing on a simple set of base functionality is easier to carry out in general.

The upgrade from IPv4 to IPv6 can be seen as an example of an attempt to upgrade the Internet. In 2011 IANA ran out of IPv4 addresses [1]. Still full IPv6 transformation has yet to happen. Tunneling technologies such as Teredo are used to transfer IPv6 traffic over is tunneled over the IPv4 network [2]. Thus, it seems we end up building overlay networks even when we originally do not intend to.

There are on-going attempts to lower the barrier for modifying networking equipment. The research field is called "software defined networking" (SDN). Reprogramming network equipment makes it possible to run custom protocols efficiently. SDN also makes it possible to reuse the networking equipment for different purposes. How far this can be taken, remains to be seen.

[1] http://www.iana.org/assignments/ipv4-address-space/ipv4-address-space.xml
[2] http://www.ietf.org/rfc/rfc4380.txt

c.

There are several reasons why native networks can perform better than overlays can. The overlay user may not be able to take advantage of all the features provided by the underlay if the overlay abstraction hides those features. For example a wifi network supports broadcasting messages to everyone over radio, but an overlay might hide the functionality, providing only support for unicast communication between the nodes.

The overlay may also add extra information, such as alternative addressing to the nodes. The addresses need to be stored in message headers, and might not be necessary, as the underlay might already provide a functional addressing scheme.

The overlay might not understand the topology of the underlay. Thus the same traffic might pass multiple times over the same link in the underlay before it arrives to its destination. The communication from your computer to a friend sitting next to you on a couch might travel to a foreign country and back, as there might not be a connection between your computers in the overlay despite that share the same local area network.

Using an overlay may have a negative effect on security and privacy as using an overlay may increase the amount of computers that can access your data while it is in transit, and may increase your amount of open connections as well.

Overlays may be costly to users as they move the requirement for processing power and network bandwidth from the data center towards the end hosts. This may decrease the battery life of the end user mahines.

Finally, design problems in overlay protocols, may cause them to amplify user actions. Malicious users can harness broken overlay systems to perform Denial of Service attacks.

Using overlays also adds general complexity. For example networking problems may become harder to debug when parts of the interactions happen on another level.

Assignment 2, Spoiler

a.

The following stack is presented in the lecture slides.

routing
topology
network

The network layer is responsible for providing some means for communication. The topology layer takes care of parting and joining nodes and builds them into a network of some shape. The routing layer is responsible for figuring out how requests and responses should be forwarded so they would reach their intended destination.

While the course book presents something like the following.

application
services
resource management
security
routing
network

The network and routing layer are similar to the model from lecture slides. Security above routing is used to secure content that is being delivered. Security could also be below routing to secure the delivery pipes. Or both the content and the pipes could be secured, also all layers contribute in one way or another to the secure operation of a system. The resource management layer allocates resources such as network bandwidth or storage space. Finally the services layer reveals different functionalities of the overlay to the application that runs on application layer.

b.

The TCP/IP stack (I) is as follows.

I_application
I_transport
I_internet
I_link

For the sake of simplicity I will only go through the exercise with one overlay stack (O). I'll pick the smaller one, but I'll add an application layer on top.

O_application
O_routing
O_topology
O_network

One way to think about this is that the overlay stack is the application for the TCP/IP stack (I_application = O_network + O_topology + O_routing + O_application). In this case we replace the application layer of the TCP/IP stack with the overlay stack. Here the O_network layer is implemented on top of the I_transport layer.

O_application
O_routing
O_topology
O_network
I_transport
I_internet
I_link

Another way to think about it is to use the three lower layers of the TCP/IP directly as an implementation of the O_network layer (O_network = I_link + I_internet + I_transport).

O_application
O_routing
O_topology
I_transport
I_internet
I_link

A third model can be derived if we consider the stacks to be open ended. That is, there can be layers above O_application and I_application, and there can be layers below O_network and I_link. This means that we can implement O_network as an application for the TCP/IP stack (I_application = O_network). Once we put the other overlay layers on top of that stack it looks like this again.

O_application
O_routing
O_topology
O_network
I_transport
I_internet
I_link

While looking at software stack models it may be useful to forget about the boxes (implementations) and concentrate on the lines (interfaces) instead.

c.

The routing layer of the overlay network borrows ideas from forwarding service on the internet layer. Both have a forwarding service, some addressing, and some way of computing routes hop-by-hop without global knowledge of the network. The topology layer defines the topology of an overlay network, where as the link layer contains the topology restrictions in the TCP/IP stack. The network layer of the overlay network defines the hard limitations that can not be overcome on the layers above. In TCP/IP stack link layer defines similar limitations.

The Internet has many properties similar to overlay networks. For example it is an abstraction run on top of other networks. The reasoning behind Internet's design is also similar to the reasoning behind overlay networks. The existing physical networks were hard to change. It was more feasible to combine them with a shared virtual addressing scheme. The term overlay network, however is typically used for networks that can be conveniently changed without requesting permission from providers of the network – a property that the Internet does not have.

Assignment 3, Spoiler

a.

The info_hash works as a password for a certain torrent. It can be used to request information about other peers from the tracker. It can also be used to request pieces from other peers. It works as a unique identifier and can be used to limiting the results to a specific torrent when requesting torrent availability information using the scrape protocol. Some implementations use the info_hash in key exchange to negotiate keys for encrypted communications [1].

The info_hash can also be used to verify the integrity of the info structure which is included in the torrent file. The info structure contains hash values for the payload, which makes info_hash a self-certifying name for the shared content. This means that given the shared content and the info structure anyone is able to verify whether or not the content matches a given info_hash.

While the info_hash can be computed from the info structure in a torrent file, it would typically not be stored in the torrent file in a precomputed form. Leaving the precomputed form out forces the user of the torrent file to compute the hash instead of merely trusting a precomputed value. The content can alternatively be refered to using magnet links which include the info_hash, but does not include the info structure. In such case the info structure can be retrieved based on the info_hash [2]. The retrieved info structure can then be verified to match the info_hash.

[1] http://wiki.vuze.com/w/Message_Stream_Encryption
[2] http://bittorrent.org/beps/bep_0009.html

b.

By including some fields in the info structure, the bittorrent developers are saying that those details separate different torrents from each other. Modifying those fields of the torrent file will change the torrents identity, separating the torrent swarm from the swarm of the original torrent swarm.

One key design decision is that changing the filenames splits the torrent in two. An alternative approach would have excluded the filenames, enabling two seeders with different file names, but the same content to co-operate. On the otherhand including the filenames in the structure helps to ensure that everyone gets the exact same file structure. Having multiple swarms for the same content also improves the systems resilience against attacks, as a malicious user would have to attack each swarm separately.

c.

By excluding some fields from the info structure, the bittorrent developers are enabling those fields to be edited without affecting the identity of the torrent. Some examples of such metadata includes trackers used for coordination and textual description of the content. It is thus possible to add and remove trackers or to do incremental improvements to the description without introducing disruptions to the service.

Assignment 4, Spoiler

a.

Peers who contribute to efficient operation of the network are rewarded with better service. This is done through choking the connections to the peers that do not contribute and unchoking connections to the peers that provide better service.

Part of the bandwidth is allocated for serving new peers who have yet to prove their worthiness. Worthiness is decided based on the capacity to serve rare pieces. Everyone gets some service. Good behaviour is rewarded, but the amount of traded resources do not match.

b.

Most of the time peers strive to prove their worthiness to choke algorithm run by other nodes. Thus the node needs to get access to some rare pieces to improve its position in the game. In the beginning the node is more desperate and is willing to accept any pieces. It needs some pieces so it can start proving its worthiness to its peers.

The end game mode is used at the end of the download when the peer is already likely to have lots of rare pieces. Thus it already has the means for obtaining better service from its neighbours. Also, when the node is nearly finished it does not need to gain lots of extra capacity for downloading the few pieces it is still missing. So long term investment of retrieving valuable pieces first might not pay of from the nodes perspective, as there is not enough download time left to make use of the better service gained through contributing.

By making rare pieces more valuable to its neighbour each peer works towards a common goal of distributing the content more evenly. This has the benefit of making it less likely that the content would disappear from the network. Better distribution also helps to balance load among the participating nodes.

c.

Before the new seed state modification (nssm) the choke algorithm was vulnerable to an attack where some high bandwidth free riders would prevent everyone else from getting content. The free riders would not need to be malicious, they might simply be free riders because of restrictive network environment.

Since the seeders do not need to request any pieces, they have no way of telling contributing peers from the free riders. Seeders implementing the nssm divide their upload time evenly among all interested peers. Thus the distribution can no longer be limited to the high bandwidth free riders. As the leechers are able to get some pieces, the free riders can only limit the amount of time it takes to push the content out to the well behaving community, preventing the attack.

Obviously nssm improves performance in cases where leechers are able to get some data instead of getting no data at all. Pushing the content out to a larger amount of nodes also helps the performance in cases where some nodes disappear from the network. When the content is more evenly distributed it does not matter if one of the nodes leaves the network, as there are plenty of other nodes ready to step in its place.

When the seed communicates with a higher number of nodes, the knowledge about rare pieces will become more accurate, which helps to push out the pieces that are rare in the network as a whole. In the old model the seed might concentrate on pieces that are rare on the fewer peers it has made the decision to serve.

As the nssm seeds deliver pieces to large amounts of nodes they may help new peers get their first piece faster. Getting first pieces faster could improving the rate at which the nodes can enter regular operation.

Above we have discussed some aspects of the nssm that are expected to improve the performance. To find out how the nssm affects the performance, we would additionally need to take into account its negative effects. Determining how a given modification affects the performance of a system requires doing empirical experiments. Doing such experiments are out of the scope of this assignment.