582615: Overlay and P2P Networks, Spring 2013
This spoiler package contains some answers for Exercise Package I.
Some reasons for using overlay networks include different needs regarding privacy, robustness,
routing topologies, identifier spaces, mobility or scaling. Solving parts of the problem in
an overlay abstracts networking problems so one does not need to deal with them in the
application. Overlay networks can also be used to experiment with different Internet designs.
Overlays also isolate complex designs from each other, so a bug in one overlay should not in
many cases affect the operation of other overlays.
There are practical, political and technological reasons why the Internet can not be
rapidly changed. Updates need to be carried out incrementally while the system is in
operation. Lots of effort has been invested into creating infrastructure that optimise
efficiency of existing protocols. Some of the infrastructure would need to be replaced
to upgrade the Internet. As Internet is federated, all of the participants would need
to agree on the correct upgrades to make before they could be made. Finally supporting
multiple different kinds of operation is complex, and agreeing on a simple set of
base functionality is easier to carry out in general.
The upgrade from IPv4 to IPv6 can be seen as an example of an attempt to upgrade the
Internet. In 2011 IANA ran out of IPv4 addresses [1]. Still full IPv6 transformation
has yet to happen. Tunneling technologies such as Teredo are used to transfer IPv6
traffic over is tunneled over the IPv4 network [2]. Thus, it seems we end up
building overlay networks even when we originally do not intend to.
There are on-going attempts to lower the barrier for modifying networking equipment.
The research field is called "software defined networking" (SDN).
Reprogramming network equipment makes it possible to run custom protocols efficiently.
SDN also makes it possible to reuse the networking equipment for different purposes.
How far this can be taken, remains to be seen.
[1] http://www.iana.org/assignments/ipv4-address-space/ipv4-address-space.xml
There are several reasons why native networks can perform better than overlays can.
The overlay user may not be able to take advantage of all the features provided by
the underlay if the overlay abstraction hides those features. For example a wifi
network supports broadcasting messages to everyone over radio, but an overlay
might hide the functionality, providing only support for unicast communication
between the nodes.
The overlay may also add extra information, such as alternative addressing to the
nodes. The addresses need to be stored in message headers, and might not be
necessary, as the underlay might already provide a functional addressing scheme.
The overlay might not understand the topology of the underlay. Thus the same
traffic might pass multiple times over the same link in the underlay before
it arrives to its destination. The communication from your computer to a friend
sitting next to you on a couch might travel to a foreign country and back,
as there might not be a connection between your computers in the overlay
despite that share the same local area network.
Using an overlay may have a negative effect on security and privacy as using
an overlay may increase the amount of computers that can access your data
while it is in transit, and may increase your amount of open connections
as well.
Overlays may be costly to users as they move the requirement for processing
power and network bandwidth from the data center towards the end hosts.
This may decrease the battery life of the end user mahines.
Finally, design problems in overlay protocols, may cause them to amplify
user actions. Malicious users can harness broken overlay systems to perform
Denial of Service attacks.
Using overlays also adds general complexity. For example networking
problems may become harder to debug when parts of the interactions
happen on another level.
The following stack is presented in the lecture slides. The network layer is responsible for providing some means for communication. The topology layer takes care of parting and joining nodes and builds them into a network of some shape. The routing layer is responsible for figuring out how requests and responses should be forwarded so they would reach their intended destination. While the course book presents something like the following.
The network and routing layer are similar to the model from lecture slides. Security above routing is used to secure content that is being delivered. Security could also be below routing to secure the delivery pipes. Or both the content and the pipes could be secured, also all layers contribute in one way or another to the secure operation of a system. The resource management layer allocates resources such as network bandwidth or storage space. Finally the services layer reveals different functionalities of the overlay to the application that runs on application layer.
The TCP/IP stack (I) is as follows. For the sake of simplicity I will only go through the exercise with one overlay stack (O). I'll pick the smaller one, but I'll add an application layer on top. One way to think about this is that the overlay stack is the application for the TCP/IP stack (I_application = O_network + O_topology + O_routing + O_application). In this case we replace the application layer of the TCP/IP stack with the overlay stack. Here the O_network layer is implemented on top of the I_transport layer. Another way to think about it is to use the three lower layers of the TCP/IP directly as an implementation of the O_network layer (O_network = I_link + I_internet + I_transport). A third model can be derived if we consider the stacks to be open ended. That is, there can be layers above O_application and I_application, and there can be layers below O_network and I_link. This means that we can implement O_network as an application for the TCP/IP stack (I_application = O_network). Once we put the other overlay layers on top of that stack it looks like this again.
While looking at software stack models it may be useful to forget about the
boxes (implementations) and concentrate on the lines (interfaces) instead.
The routing layer of the overlay network borrows ideas from forwarding service on the internet layer. Both have a forwarding service, some addressing, and some way of computing routes hop-by-hop without global knowledge of the network. The topology layer defines the topology of an overlay network, where as the link layer contains the topology restrictions in the TCP/IP stack. The network layer of the overlay network defines the hard limitations that can not be overcome on the layers above. In TCP/IP stack link layer defines similar limitations.
The Internet has many properties similar to overlay networks. For example it is an abstraction run on top of other networks. The reasoning behind Internet's design is also similar to the reasoning behind overlay networks. The existing physical networks were hard to change. It was more feasible to combine them with a shared virtual addressing scheme. The term overlay network, however is typically used for networks that can be conveniently changed without requesting permission from providers of the network – a property that the Internet does not have.
The info_hash works as a password for a certain torrent. It can be used to request information about other
peers from the tracker. It can also be used to request pieces from other peers. It works as a unique
identifier and can be used to limiting the results to a specific torrent when requesting torrent
availability information using the scrape protocol. Some implementations use the info_hash in key
exchange to negotiate keys for encrypted communications [1].
The info_hash can also be used to verify the integrity of the info structure which is included in
the torrent file. The info structure contains
hash values for the payload, which makes info_hash a self-certifying name for the shared content. This
means that given the shared content and the info structure anyone is able to verify whether or not the
content matches a given info_hash.
While the info_hash can be computed from the info structure in a torrent file, it would typically
not be stored in the torrent file in a precomputed form. Leaving the precomputed form out forces
the user of the torrent file to compute the hash instead of merely trusting a precomputed value.
The content can alternatively be refered to using magnet links which include the info_hash, but
does not include the info structure. In such case the info structure can be retrieved based on
the info_hash [2]. The retrieved info structure can then be verified to match the info_hash.
[1] http://wiki.vuze.com/w/Message_Stream_Encryption
By including some fields in the info structure, the bittorrent developers are saying that those details
separate different torrents from each other. Modifying those fields of the torrent file will change
the torrents identity, separating the torrent swarm from the swarm of the original torrent swarm.
One key design decision is that changing the filenames splits the torrent in two. An alternative
approach would have excluded the filenames, enabling two seeders with different file names, but
the same content to co-operate. On the otherhand including the filenames in the structure helps
to ensure that everyone gets the exact same file structure. Having multiple swarms for the same
content also improves the systems resilience against attacks, as a malicious user would have to
attack each swarm separately.
By excluding some fields from the info structure, the bittorrent developers are enabling those
fields to be edited without affecting the identity of the torrent. Some examples of such
metadata includes trackers used for coordination and textual description of the content.
It is thus possible to add and remove trackers or to do incremental improvements to the
description without introducing disruptions to the service.
Peers who contribute to efficient operation of the network are rewarded with better service. This is done through choking the connections to the peers that do not contribute and unchoking connections to the peers that provide better service.
Part of the bandwidth is allocated for serving new peers who have yet to prove their worthiness. Worthiness is decided based on the capacity to serve rare pieces. Everyone gets some service. Good behaviour is rewarded, but the amount of traded resources do not match.
Most of the time peers strive to prove their worthiness to choke algorithm run by other nodes. Thus the node needs to get access to some rare pieces to improve its position in the game. In the beginning the node is more desperate and is willing to accept any pieces. It needs some pieces so it can start proving its worthiness to its peers.
The end game mode is used at the end of the download when the peer is already likely to
have lots of rare pieces. Thus it already has the means for obtaining better service
from its neighbours.
Also, when the node is nearly finished it does not need to gain lots of extra capacity for
downloading the few pieces it is still missing. So long term investment of retrieving
valuable pieces first might not pay of from the nodes perspective, as there is not
enough download time left to make use of the better service gained through contributing.
By making rare pieces more valuable to its neighbour each peer works towards a common goal of
distributing the content more evenly. This has the benefit of making it less likely that the
content would disappear from the network. Better distribution also helps to balance load
among the participating nodes.
Before the new seed state modification (nssm) the choke algorithm was vulnerable to an attack
where some high bandwidth free riders would prevent everyone else from getting content.
The free riders would not need to be malicious, they might simply be free riders
because of restrictive network environment.
Since the seeders do not need to request any pieces, they have no way of telling
contributing peers from the free riders. Seeders implementing the nssm divide their
upload time evenly among all interested peers. Thus the distribution can no longer
be limited to the high bandwidth free riders. As the leechers are able to get some
pieces, the free riders can only limit the amount of time it takes to push the
content out to the well behaving community, preventing the attack.
Obviously nssm improves performance in cases where leechers are able to get some data
instead of getting no data at all. Pushing the content out to a larger amount of nodes
also helps the performance in cases where some nodes disappear from the network.
When the content is more evenly distributed it does not matter if one of the nodes
leaves the network, as there are plenty of other nodes ready to step in its place.
When the seed communicates with a higher number of nodes, the knowledge about rare
pieces will become more accurate, which helps to push out the pieces that are rare
in the network as a whole. In the old model the seed might concentrate on
pieces that are rare on the fewer peers it has made the decision to serve.
As the nssm seeds deliver pieces to large amounts of nodes they may help new peers
get their first piece faster. Getting first pieces faster could improving the
rate at which the nodes can enter regular operation.
Above we have discussed some aspects of the nssm that are expected to
improve the performance. To find out how the nssm affects the
performance, we would additionally need to take into account its negative effects.
Determining how a given modification affects the performance of a system
requires doing empirical experiments. Doing such experiments are out of the
scope of this assignment.
Assignment 1, Spoiler
a.
b.
[2] http://www.ietf.org/rfc/rfc4380.txt
c.
Assignment 2, Spoiler
a.
routing
topology
network
application
services
resource management
security
routing
network
b.
I_application
I_transport
I_internet
I_link
O_application
O_routing
O_topology
O_network
O_application
O_routing
O_topology
O_network
I_transport
I_internet
I_link
O_application
O_routing
O_topology
I_transport
I_internet
I_link
O_application
O_routing
O_topology
O_network
I_transport
I_internet
I_link
c.
Assignment 3, Spoiler
a.
[2] http://bittorrent.org/beps/bep_0009.html
b.
c.
Assignment 4, Spoiler
a.
b.
c.