58310111 Seminar: Graph Mining (3 cr), spring 2010
The Moodle page for this seminar: https://courses.cs.helsinki.fi/course/view.php?id=59
The final seminar meeting May 20-21
The final days of our seminar is organized as a workshop. This 2-day workshop will begin with a guest lecture given by Professor Ehud Gudes from Ben-Gurion University of the Negev Beer-Sheva, Israel. (The lecture will also start the course Graph Mining - Motivation, Algorithms and Applications.)
The full program is as follows:
|Date:|| Thursday 20 May 2010 in room C222|
|9:00-12:00||Guest Lecture||Pr. Ehud Gudes|
|14:10-15:10||Machine learning for molecular classification
|Date:|| Friday 21 May 2010 in room C220|
|9.30-10.30||Modeling social groups||Mikhail Shubin||PDF|
|10:40-11:40||Small World Phenomenon||L.H.|
|12:30-13:30||Finding a diverse set of nodes in probabilistic graphs||Laura Langohr||PDF|
Place: Rooms C222 and C220, Department of Computer Science, Exactum Building. Gustaf Hällströmin katu 2b, Helsinki, Finland.
General PlanThe seminar will meet during spring term 2010 at the following times:
- Mon 18 Jan, 12-14, C220
- Mon 8 Feb, 12-14, C220
- Mon 22 Mar, 12-14, C220
- Mon 12 Apr, 12-14, C220
- Thu 20 May, 9-16, C222 (NB: full day)
- Fri 21 May, 9-16, C220 (NB: full day)
NB! The initial seminar meeting will be on Monday 18 January. All participants must attend.
Seminar subject matter
In data mining (knowledge discovery), we study and develop algorithms for the analysis of large bodies of data. This seminar will be based on research literature in looking at methods developed for the mining of networked data. A graph or network is an all-purpose, flexible way to present data, and very different from e.g. table representations from the viewpoint of mining methods. Some of the objects that can be analysed are social, interpersonal networks; networks of hypertext documents like the www; or biological networks (such as the interactions between proteins).
Participants must have completed the course Scientific writing or have equivalent skills. You will have a significant advantage if you have completed the course Methods for data mining. Other useful courses include advanced courses in data mining, machine learning, data analysis, the Three concepts courses, and Design and analysis of algorithms.
A maximum of 12 students will be elected for the seminar on the basis of their progress so far and the suitability of the courses they have completed.
Completing the seminar
Students complete this seminar by actively participating in its work: the work methods include studying scientific sources, writing reports and giving presentations, reading the reports of other participants and evaluating them, and actively following presentations.
The grading will be based on each student's own written work (1/3), oral presentation (1/3), and commentary on the reports of others as well as activeness in general (1/3). To pass the seminar, each of these components must be passed.
The seminar will copy scientific conferences. Each participant will write a report ('article') on their topic during early spring, which will be handed in to the organiser of the seminar ('conference'). The other seminar participants (now in the role of a 'programme committee') will evaluate the report and give their feedback on it, and the writer will then rework the report into its final version. Finally, the seminar ('conference') will have a two-day meeting and the participants will give their oral presentations. The final, polished versions of the reports will be compiled (as 'conference proceedings') before the 'conference' and made available to the participants.
Contrary to many other seminars, this one will only meet five times: for the initial coming-to-order meeting, for three intermediate two-hour sessions, and the final meeting that lasts two days. The (initial) schedule (deadlines and meetings in bold):
- Mon 18.1., 12-14: coming to order
- 19.1.-29.1.: selecting source material and reading it, preparation of the outline of the report (1-2 pages)
- Fri 29.1.: deadline for the report plan (delivered in Moodle; before delivery in Moodle. send your proposal by email to Sebastien and Hannu for approval)
- 1.-5.2.: identifying the key contents of the future seminar report, preparing 2 slides about it
- Mon 8.2., 12-14: brief presentations (5 min) of report topics; initial distribution of review tasks
- 9.2.-19.3.: writing the report (model manuscript: pdf, tex) an updating the two-slide presentation
- Fri 19.3.: deadline for handing in reports (Moodle)
- Mon 22.3., 12-14: brief presentations (5 min) of reports, finalisation of review allocations
- 23.3.-9.4.: evaluation of the reports of two other participants (review_form.txt)
- Fri 9.4.: deadline for report evaluations (Moodle)
- Mon 12.4., 12-14: feedback discussions on reports and evaluations
- 13.-27.4.: finalising the report on the basis of the evaluations
- Wed 28.4.: deadline for final report and a short description of changes made to the report (both in Moodle)
- 29.4.-5.5.: preparing the oral presentation
- Wed 5.5.: deadline for handing in slides for the oral presentation (Moodle)
- 6.-12.5.: commenting on slides of two other participants
- Wed 12.5.: deadline for comments on slides (Moodle)
- 14.-19.5.: reading the reports of other participants
- 13.5.-19.5.: finalising the slides on the basis of the feedback
- Thu 20.5., 9-16: seminar presentations I/II
- Fri 21.5., 9-16: seminar presentations II/II
Keep in mind that the written report and the oral presentation have partially different goals.
The oral presentation should explain the main ideas of the content, simplifying concepts when necessary. Depending on the topic, a good presentation should include many examples to illustrate the subject matter and only some choice technical details that are important and can be discussed thoroughly enough during the presentation. The oral presentation should last around 45 minutes.
For the report, put more emphasis on exactness and scientific representation. The report will often be a summary of the source material, so you must pick and choose what to include. The things you have picked to discuss in your report must then be described in enough detail; for the things you leave out, refer to the source material. A suitable length for the report is 10-15 pages (please see the website for the course Scientific writing for instructions). The department's seminar instructions will also guide you.
Below is a list of suitable source literature for this seminar. Each student selects a topic for his/her report and presentations, and uses suitable scientific source material (typically 1-2 journal papers as key sources). The prerequisites by the articles below vary a lot. Some of the articles are available in electronic format only from computers in the university network (and with suitable proxy settings in your browser or by a VPN connection). You may use other literature, too. In any case, you must agree on your source literature with the seminar leaders in good time before you start.
- J. Kleinberg: Complex Networks and Decentralized Search Algorithms. Proceedings of the International Congress of Mathematicians (ICM), 2006. PDF
- James Moody, Douglas R. White: Structural Cohesion and Embeddedness: A Hierarchical Concept of Social Groups. American Sociological Review 68 (1): pp. 103-127, 2003. PDF
- L Franke, H Bakel, L Fokkens, ED de Jong, M Egmont-Petersen, C Wijmenga: Reconstruction of a functional human gene network, with an application for prioritizing positional candidate genes. Am J Hum Genet, Vol. 78, No. 6. (June 2006), pp. 1011-1025. PDF
- X. Yan, M. Mehan, Y. Huang, M. S. Waterman, P. S. Yu, and X. Zhou: A Graph-Based Approach to Systematically Reconstruct Human Transcriptional Regulatory Modules. ISMB'07, the 15th Annual Int. Conf. on Intelligent Systems for Molecular Biology, Jul. 2007. PDF
- C. Liu, X. Yan, H. Yu, J. Han, and P. S. Yu: Mining Behavior Graphs for `Backtrace' of Noncrashing Bugs. SDM'05,Proc. of 2005 SIAM Int. Conf. on Data Mining, 2005. PDF
- Christoph Helma, Tobias Cramer, Stefan Kramer, Luc De Raedt: Data Mining and Machine Learning Techniques for the Identification of Mutagenicity Inducing Substructures and Structure Activity Relationships of Noncongeneric Compounds. Journal of Chemical Information and Modeling 44 (4): pp. 1402-1411, 2004. PDF
- Björn H. Junker: Networks in Biology. Analysis of biological networks, Wiley Series on Bioinformatics, Wiley-VCH, Weinheim, pp. 1-14, 2008 (preview) PDF
- Falk Schreiber: Graph Theory. Analysis of biological networks, Wiley Series on Bioinformatics, Wiley-VCH, Weinheim, pp. 15-28, 2008 (preview)
- Ralf Steuer, Gorka Zamora López: Global Network Properties. Analysis of biological networks, Wiley Series on Bioinformatics, Wiley-VCH, Weinheim, pp. 29-63, 2008 (preview)
- Dirk Koschützki: Network Centralities. Analysis of biological networks, Wiley Series on Bioinformatics, Wiley-VCH, Weinheim, pp. 65-84, 2008 (preview)
- Henning Schwöbbermeyer: Network Motifs. Analysis of biological networks, Wiley Series on Bioinformatics, Wiley-VCH, Weinheim, pp. 85-111, 2008 (preview)
- Balabhaskar Balasundaram, Sergiy Butenko: Network Clustering. Analysis of biological networks, Wiley Series on Bioinformatics, Wiley-VCH, Weinheim, pp. 113-138, 2008 (preview)
- Ina Koch, Monika Heiner: Petri Nets. Analysis of biological networks, Wiley Series on Bioinformatics, Wiley-VCH, Weinheim, pp. 139-179, 2008 (preview)
- Anatolij P. Potapov: Signal Transduction and Gene Regulation Networks. Analysis of biological networks, Wiley Series on Bioinformatics, Wiley-VCH, Weinheim, pp. 181-206, 2008 (preview)
- Frederik Börnke: Protein Interaction Networks. Analysis of biological networks, Wiley Series on Bioinformatics, Wiley-VCH, Weinheim, pp. 207-232, 2008 (preview)
- Márcio Rosa da Silva, Jibin Sun, Hongwu Ma, Feng He, An-Ping Zeng: Metabolic Networks. Analysis of biological networks, Wiley Series on Bioinformatics, Wiley-VCH, Weinheim, pp. 233-253, 2008 (preview)
- Birgit Gemeinholzer: Phylogenetic Networks. Analysis of biological networks, Wiley Series on Bioinformatics, Wiley-VCH, Weinheim, pp. 255-281, 2008 (preview)
- Ursula Gaedke: Ecological Networks. Analysis of biological networks, Wiley Series on Bioinformatics, Wiley-VCH, Weinheim, pp. 283-304, 2008 (preview)
- Dirk Steinhauser, Leonard Krall, Carsten Müssig, Dirk Büssis, Björn Usadel: Correlation Networks. Analysis of biological networks, Wiley Series on Bioinformatics, Wiley-VCH, Weinheim, pp. 305-333, 2008 (preview)
- Hiroto Saigo, Nicole Krämer, Koji Tsuda: Partial least squares regression for graph mining. Proceeding of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining, pp 578-586, 2008. PDF
- Graph-based transfer learning: He, J., Liu, Y., and Lawrence, R., In Proceeding of the 18th ACM Conference on information and Knowledge Management, CIKM '09. ACM, pp. 937-946, 2009. PDF
- Graph summarization with bounded error : by Navlakha, S., Rastogi, R., Shrivastava, N. In: SIGMOD'08: Proceedings of the 2008 ACM SIGMOD international conference on Management of data, ACM pp. 419-432, 2008. PDF
- Graph Summaries for Subgraph Frequency Estimation, by Angela Maduko, Kemafor Anyanwu, Amit P. Sheth, Paul Schliekelman, in The Semantic Web: Research and Applications, LNCS vol. 5021, Springer, pp 508-523, 2008. PDF
At the end of the seminar, the final participants were asked to fill in an anonymous on-line questionnaire about the seminar.
Learning objectives and subjective learning experience
The objectives that students themselves reported mostly concerned improving their scientific writing skills. Presentation skills were also mentioned, as well as learning about graph mining in general or about a specific topic. All students said they had met their learning objectives. Due to the small number of participants, the view on graph mining was not broad, however.
Seminar routines and work hours
The standard components of seminars, that is the actual writing of the report (in two phases), preparation of the oral presentation, and the final meeting, were expectedly considered most important elements (mean scores between 4.75 and 5 out of 5).
Receiving feedback from other students (in face-to-face feedback discussions) was important (4), evaluating reports of others not quite as much (3.5). Experiences from getting and giving feedback for slides were neutral (3).
The early phases of selecting source material, preparing report plan and a two-slide presentation (2.75) and having the five-minute presentations of the topics (2.5) were considered relatively unimportant. (However, from the point of view of seminar chairs these seem to be important for making sure all students get a good start with their work.)
The average amount of work put into the seminar was about 70 hours, but the variance was large. The average corresponds well to the theoretical amount of work for a 3 cr seminar.
- the open and positive atmosphere was praised
- conflicting instructions on the web page and in Moodle caused unnecessary confusion and some extra work
- confusing/misleading deadlines and procedures for submitting and reviewing the plan resulted in unexpected extra work
- the five minute presentations were perhaps too short, how about 10 minutes?
- the constructive feedback received for seminar reports was considered valuable
- there could have been more time to prepare the slides
- the great number of drop-outs was discouraging
- the schedule (few longer meetings and two full days of presentations) made participation easier
- the final presentations were good
Leaders of the seminar
Professor Hannu Toivonen, University of Helsinki
- Email: firstname.lastname@example.org
- WWW: http://www.cs.helsinki.fi/hannu.toivonen/
- Available at the seminar sessions and by email as agreed
Dr. Sébastien Mahler
- Email: email@example.com
- WWW: http://www.cs.helsinki.fi/u/mahler/
- Available at the seminar sessions and by email as agreed