Sampling Massive Online Graphs: Challenges, Techniques, and Applications to Facebook

Event type: 
Guest lecture
Event time: 
16.11.2011 - 13:15 - 14:00
Lecturer : 
Maciej Kurant
Place: 
Exactum B222
Description: 

Sampling Massive Online Graphs: Challenges, Techniques, and Applications to Facebook

Maciej Kurant
University of California, Irvine

Abstract

Online Social Networks (OSNs) have recently emerged as a new killer application, and are of interest to a number of communities, ranging from computer science and engineering to social sciences. Because of their sheer size (Facebook alone has more than 800 million active users), OSNs are widely studied today based on *samples* collected through measurements of publicly available information. In this talk, I will give an overview of our recent work on sampling OSNs.

First, I will discuss how to efficiently crawl a friendship graph to collect a representative sample of its *nodes*. To this end, I will introduce two novel techniques: (i) "Multigraph Sampling" that exploits different relations between nodes, and (ii) "Stratified Weighted Random Walk" that preferentially crawls nodes more relevant to our measurement. I will evaluate these techniques on real-life OSNs, such as Facebook and LastFM. This will also allow me to study some basic characteristics of these OSNs, e.g., the privacy awareness in Facebook.

Second, I will focus on *topology* rather than on nodes alone. Breadth First Search (BFS) is a natural and widely used sampling technique in this context. Unfortunately, BFS is subject to nontrivial and often very significant biases, which I quantify analytically. As a viable alternative, I propose to study a "coarse-grained" version of the underlying topology, and I show how to estimate it based on a sample of nodes. Finally, I will apply this methodology to Facebook to obtain a global country-to-country friendship network (more examples available at geosocialmap.com).

This work is joint with Athina Markopoulou, Minas Gjoka, and Carter Butts at the University of California, Irvine and with Patrick Thiran at EPFL, Lausanne. Parts of this work appear in IEEE INFOCOM 2010, ITC 2010, ACM SIGMETRICS 2011 and IEEE JSAC 2011.

Bio

Maciej Kurant (mkurant.com) received his Ph.D. degree in Communication Systems at EPFL, Switzerland, in 2009. Currently, he is a postdoc at University of California, Irvine. His main areas of research interest include sampling Online Social Networks, survivability in optical and overlay networks, and inference and study of large-scale complex networks (such as the human brain). Maciej is also the founder and developer of AcronymCreator.net - a tool that helps you generate an acronym-like name for your new algorithm, system, product, etc.

23.09.2011 - 09:50 Hannu Toivonen
23.09.2011 - 09:48 Hannu Toivonen