University of Helsinki Department of Computer Science
 

Department of Computer Science

Department information

 

58310111 Seminar: Graph Mining (3 cr), spring 2010

This page: http://www.cs.helsinki.fi/hannu.toivonen/teaching/seminarS10/

The Moodle page for this seminar: https://courses.cs.helsinki.fi/course/view.php?id=59

The final seminar meeting May 20-21

The final days of our seminar is organized as a workshop. This 2-day workshop will begin with a guest lecture given by Professor Ehud Gudes from Ben-Gurion University of the Negev Beer-Sheva, Israel. (The lecture will also start the course Graph Mining - Motivation, Algorithms and Applications.)

The full program is as follows:

Date:
 Thursday 20 May 2010 in room C222
time
title
speaker
report file
9:00-12:00
Guest Lecture
Pr. Ehud Gudes

12:00-13:00
Lunch break
13.00-14.00
Graph Compression
Fang Zhou
PDF
14:10-15:10
Machine learning for molecular classification
Hongyu Su
PDF

Date:
 Friday 21 May 2010 in room C220
time
title
speaker
report file
9.30-10.30
Modeling social groups
Mikhail Shubin
PDF
10:40-11:40
Small World Phenomenon
L.H.

11:40-12:30
Lunch break
12:30-13:30
Finding a diverse set of nodes in probabilistic graphsLaura Langohr
PDF



Place: Rooms C222 and C220, Department of Computer Science, Exactum Building. Gustaf Hällströmin katu 2b, Helsinki, Finland.

General Plan

The seminar will meet during spring term 2010 at the following times:

NB! The initial seminar meeting will be on Monday 18 January. All participants must attend.

Seminar subject matter

In data mining (knowledge discovery), we study and develop algorithms for the analysis of large bodies of data. This seminar will be based on research literature in looking at methods developed for the mining of networked data. A graph or network is an all-purpose, flexible way to present data, and very different from e.g. table representations from the viewpoint of mining methods. Some of the objects that can be analysed are social, interpersonal networks; networks of hypertext documents like the www; or biological networks (such as the interactions between proteins).

Prerequisites

Participants must have completed the course Scientific writing or have equivalent skills. You will have a significant advantage if you have completed the course Methods for data mining. Other useful courses include advanced courses in data mining, machine learning, data analysis, the Three concepts courses, and Design and analysis of algorithms.

A maximum of 12 students will be elected for the seminar on the basis of their progress so far and the suitability of the courses they have completed.

Completing the seminar

Students complete this seminar by actively participating in its work: the work methods include studying scientific sources, writing reports and giving presentations, reading the reports of other participants and evaluating them, and actively following presentations.

The grading will be based on each student's own written work (1/3), oral presentation (1/3), and commentary on the reports of others as well as activeness in general (1/3). To pass the seminar, each of these components must be passed.

Seminar routines

The seminar will copy scientific conferences. Each participant will write a report ('article') on their topic during early spring, which will be handed in to the organiser of the seminar ('conference'). The other seminar participants (now in the role of a 'programme committee') will evaluate the report and give their feedback on it, and the writer will then rework the report into its final version. Finally, the seminar ('conference') will have a two-day meeting and the participants will give their oral presentations. The final, polished versions of the reports will be compiled (as 'conference proceedings') before the 'conference' and made available to the participants.

Contrary to many other seminars, this one will only meet five times: for the initial coming-to-order meeting, for three intermediate two-hour sessions, and the final meeting that lasts two days. The (initial) schedule (deadlines and meetings in bold):

Guidelines

Keep in mind that the written report and the oral presentation have partially different goals.

The oral presentation should explain the main ideas of the content, simplifying concepts when necessary. Depending on the topic, a good presentation should include many examples to illustrate the subject matter and only some choice technical details that are important and can be discussed thoroughly enough during the presentation. The oral presentation should last around 45 minutes.

For the report, put more emphasis on exactness and scientific representation. The report will often be a summary of the source material, so you must pick and choose what to include. The things you have picked to discuss in your report must then be described in enough detail; for the things you leave out, refer to the source material. A suitable length for the report is 10-15 pages (please see the website for the course Scientific writing for instructions). The department's seminar instructions will also guide you.

Source literature

Below is a list of suitable source literature for this seminar. Each student selects a topic for his/her report and presentations, and uses suitable scientific source material (typically 1-2 journal papers as key sources). The prerequisites by the articles below vary a lot. Some of the articles are available in electronic format only from computers in the university network (and with suitable proxy settings in your browser or by a VPN connection). You may use other literature, too. In any case, you must agree on your source literature with the seminar leaders in good time before you start.

  1. J. Kleinberg: Complex Networks and Decentralized Search Algorithms. Proceedings of the International Congress of Mathematicians (ICM), 2006. PDF
  2. James Moody, Douglas R. White: Structural Cohesion and Embeddedness: A Hierarchical Concept of Social Groups. American Sociological Review 68 (1): pp. 103-127, 2003. PDF
  3. L Franke, H Bakel, L Fokkens, ED de Jong, M Egmont-Petersen, C Wijmenga: Reconstruction of a functional human gene network, with an application for prioritizing positional candidate genes. Am J Hum Genet, Vol. 78, No. 6. (June 2006), pp. 1011-1025. PDF
  4. X. Yan, M. Mehan, Y. Huang, M. S. Waterman, P. S. Yu, and X. Zhou: A Graph-Based Approach to Systematically Reconstruct Human Transcriptional Regulatory Modules. ISMB'07, the 15th Annual Int. Conf. on Intelligent Systems for Molecular Biology, Jul. 2007. PDF
  5. C. Liu, X. Yan, H. Yu, J. Han, and P. S. Yu: Mining Behavior Graphs for `Backtrace' of Noncrashing Bugs. SDM'05,Proc. of 2005 SIAM Int. Conf. on Data Mining, 2005. PDF
  6. Christoph Helma, Tobias Cramer, Stefan Kramer, Luc De Raedt: Data Mining and Machine Learning Techniques for the Identification of Mutagenicity Inducing Substructures and Structure Activity Relationships of Noncongeneric Compounds. Journal of Chemical Information and Modeling 44 (4): pp. 1402-1411, 2004. PDF
  7. Björn H. Junker: Networks in Biology. Analysis of biological networks, Wiley Series on Bioinformatics, Wiley-VCH, Weinheim, pp. 1-14, 2008 (preview) PDF
  8. Falk Schreiber: Graph Theory. Analysis of biological networks, Wiley Series on Bioinformatics, Wiley-VCH, Weinheim, pp. 15-28, 2008 (preview)
  9. Ralf Steuer, Gorka Zamora López: Global Network Properties. Analysis of biological networks, Wiley Series on Bioinformatics, Wiley-VCH, Weinheim, pp. 29-63, 2008 (preview)
  10. Dirk Koschützki: Network Centralities. Analysis of biological networks, Wiley Series on Bioinformatics, Wiley-VCH, Weinheim, pp. 65-84, 2008 (preview)
  11. Henning Schwöbbermeyer: Network Motifs. Analysis of biological networks, Wiley Series on Bioinformatics, Wiley-VCH, Weinheim, pp. 85-111, 2008 (preview)
  12. Balabhaskar Balasundaram, Sergiy Butenko: Network Clustering. Analysis of biological networks, Wiley Series on Bioinformatics, Wiley-VCH, Weinheim, pp. 113-138, 2008 (preview)
  13. Ina Koch, Monika Heiner: Petri Nets. Analysis of biological networks, Wiley Series on Bioinformatics, Wiley-VCH, Weinheim, pp. 139-179, 2008 (preview)
  14. Anatolij P. Potapov: Signal Transduction and Gene Regulation Networks. Analysis of biological networks, Wiley Series on Bioinformatics, Wiley-VCH, Weinheim, pp. 181-206, 2008 (preview)
  15. Frederik Börnke: Protein Interaction Networks. Analysis of biological networks, Wiley Series on Bioinformatics, Wiley-VCH, Weinheim, pp. 207-232, 2008 (preview)
  16. Márcio Rosa da Silva, Jibin Sun, Hongwu Ma, Feng He, An-Ping Zeng: Metabolic Networks. Analysis of biological networks, Wiley Series on Bioinformatics, Wiley-VCH, Weinheim, pp. 233-253, 2008 (preview)
  17. Birgit Gemeinholzer: Phylogenetic Networks. Analysis of biological networks, Wiley Series on Bioinformatics, Wiley-VCH, Weinheim, pp. 255-281, 2008 (preview)
  18. Ursula Gaedke: Ecological Networks. Analysis of biological networks, Wiley Series on Bioinformatics, Wiley-VCH, Weinheim, pp. 283-304, 2008 (preview)
  19. Dirk Steinhauser, Leonard Krall, Carsten Müssig, Dirk Büssis, Björn Usadel: Correlation Networks. Analysis of biological networks, Wiley Series on Bioinformatics, Wiley-VCH, Weinheim, pp. 305-333, 2008 (preview)
  20. Hiroto Saigo, Nicole Krämer, Koji Tsuda: Partial least squares regression for graph mining. Proceeding of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining, pp 578-586, 2008. PDF
  21. Graph-based transfer learning: He, J., Liu, Y., and Lawrence, R., In Proceeding of the 18th ACM Conference on information and Knowledge Management, CIKM '09. ACM, pp. 937-946, 2009. PDF
  22. Graph summarization with bounded error : by Navlakha, S., Rastogi, R., Shrivastava, N. In: SIGMOD'08: Proceedings of the 2008 ACM SIGMOD international conference on Management of data, ACM pp. 419-432, 2008. PDF
  23. Graph Summaries for Subgraph Frequency Estimation, by Angela Maduko, Kemafor Anyanwu, Amit P. Sheth, Paul Schliekelman, in The Semantic Web: Research and Applications, LNCS vol. 5021, Springer, pp 508-523, 2008. PDF

Student feedback

At the end of the seminar, the final participants were asked to fill in an anonymous on-line questionnaire about the seminar.

Learning objectives and subjective learning experience
The objectives that students themselves reported mostly concerned improving their scientific writing skills. Presentation skills were also mentioned, as well as learning about graph mining in general or about a specific topic. All students said they had met their learning objectives. Due to the small number of participants, the view on graph mining was not broad, however.

Seminar routines and work hours
The standard components of seminars, that is the actual writing of the report (in two phases), preparation of the oral presentation, and the final meeting, were expectedly considered most important elements (mean scores between 4.75 and 5 out of 5).

Receiving feedback from other students (in face-to-face feedback discussions) was important (4), evaluating reports of others not quite as much (3.5). Experiences from getting and giving feedback for slides were neutral (3).

The early phases of selecting source material, preparing report plan and a two-slide presentation (2.75) and having the five-minute presentations of the topics (2.5) were considered relatively unimportant. (However, from the point of view of seminar chairs these seem to be important for making sure all students get a good start with their work.)

The average amount of work put into the seminar was about 70 hours, but the variance was large. The average corresponds well to the theoretical amount of work for a 3 cr seminar.

Other feedback

Leaders of the seminar

Professor Hannu Toivonen, University of Helsinki

Dr. Sébastien Mahler