58308109 Seminar: Predicting Structured Data (Spring 2008)
Time: periods IIIIV, Thursdays 1618, room C220Organizers: PhD Huizhen Janey Yu , prof. Juho Rousu
BACKGROUND
Complex learning targets such as sequences, taxonomies and graphs are frequent in realworld applications, for example, handwriting recognition (target is a sequence of letters), hierarchical classification (tree), and gene function prediction (poset), to only name a few.
The mainstream of machine learning research, in particular that of kernels methods, has been succesful in developing flexible and powerful methods for treating complex inputs. The complementary methods for complex outputs have so far received significantly less attention. The chief approachs towards complex targets has been to decompose the target (e.g. a handwritten word) prior to learning and learning each component (e.g. a character) indepedently. With this approach, dependencies between the components are not utilized.
During last five years, research in complex and structured output learning has emerged as one of the megatrends in machine learning. In particular, methods marrying kernel methods and graphical models have received significant attention.
SEMINAR GOALS
The purpose of the seminar is to explore the recent progress in machine
learning for complex and structured outputs
PREREQUISITES AND SEMINAR POSITION
The seminar is an elective advanced level seminar. It is also wellsuited for postgraduate studies.
Prequisite knowledge for the semimar is basic knowledge about probabilistic modelling and machine learning. Familiarity with kernel methods and graphical models will be helpful.
COMPLETING THE SEMINAR
 Each participant will prepare a slide presentation of
approximately
45 minutes (rule of thumb: 2 minutes per slide => approx. 2223
slides).
The pdf file of the draft presentation is sent no later than Monday preceding the
presentation time via email to Janey.
 In addition, succesfull completion of the course requires active participation in the seminar
 Grading will be pass/fail
TOPICS
Kernels for Structured Data
 D. Haussler: Convolution
Kernels on Discrete Structures, Technical Report UCSCCRL9910,
University of California at Santa Cruz, 1999
 T. Gärtner: A Survey
of Kernels for Structured Data, SIGKDD Explorations 5, pp. 4958,
2003
String kernels
 H. Lodhi, C. Saunders, J. ShaweTaylor, N. Cristianini, C.
Watkins: Text
Classification using String Kernels. Journal of Machine Learning Research
2 (2002), pp. 419444
 S. V. N. Vishwanathan and A. Smola. Fast
Kernels for String and Tree Matching. Advances in Neural
Information Processing Systems 15, pp. 569–576, 2003.
 C. Leslie, E. Eskin, J. Weston, W.S. Noble. Mismatch
String Kernels for SVM Protein Classification. Advances in
Neural Information Processing Systems 15, pp. 14411448,
2003.
Rational kernels
 C. Cortes and M. Mohri: Rational kernels: theory and algorithms. Journal of Machine Learning Research 5 (2004), pp. 10351062
Tree kernels
 M. Collins and N. Duffy: Convolution kernels for Natural Language. Advances in Neural Information Processing Systems 14, pp. 625632, 2002
 H. Kashima and T.Koyanagi. Kernels for Semistructured Data. Proceedings of the 19th International Conference on Machine Learning, pp. 291298, 2002
Graph kernels
 T. Gärtner: On graph kernels: Hardness results and efficient alternatives. Proc. 16th Annual Conference on Computational Learning Theory, 2003
 H. Kashima, K. Tsuda, A. Inokuchi:Marginalized kernels between labeled graphs. Proc. 20th International Conference on Machine Learning, 2003.
 K. Borgwardt, C. Ong, S. Schönauer, S. V. N. Vishwanathan, A. Smola, HP. Kriegel. Protein Function Prediction via Graph Kernels
Structured Prediction Models
 G. Bakir, T. Hofmann, Schölkopf, A. Smola, B. Taskar, S.V.N. Vishwanathan: Modeling Structure via Graphical Models. Chapter 3 in Predicting Structured Data, MIT Press, 2007
 J. Weston, G. Bakir, O. Bousquet, T. Mann, W.S. Noble and B. Schölkopf: Joint Kernel Maps. Chapter 4 in Predicting Structured Data, MIT Press, 2007
 B. Taskar, C. Guestrin, D. Koller. MaxMargin Markov Networks.
Advances in Neural Information
Processing Systems 16, 2004
 J. Rousu, C. Saunders, S. Szedmak and J. ShaweTaylor: Efficient algorithms for MaxMargin Structured Classification. Chapter 6 in Predicting Structured Data, MIT Press, 2007
Structured Prediction Applications
Sequence annotation
 Michael Collins: Discriminative training methods for hidden Markov models: theory and experiments with perceptron algorithm. Proceedings of the ACL02 conference on Empirical methods in natural language processing, 2002
 Yasemin Altun, Ioannis Tsochantaridis, Thomas Hofmann: Hidden Markov Support Vector Machines. Proc. ICML'2003.
 Elisa Ricci, Tijl de Bie, Nello Cristianini: Discriminative Sequence Labeling by ZScore Optimization. Proc. of the 18th European Conference on Machine Learning (ECML 2007), Warsaw, September 2007.
Hierarchical multilabel classification
 N. CesaBianchi, C. Gentile, A. Tironi, L. Zaniboni: Incremental
Algorithms for Hierarchical Classification Advances in Neural
Information Processing Systems 17, 2005
 J. Rousu, C. Saunders, S. Szedmak, J. ShaweTaylor: Kernelbased Learning of Hierarchical Multilabel Classification Models. Journal of Machine Learning Research 7 (2006) 1601–1626
 Z. Barutcuoglu, R.. Schapire and O.. Troyanskaya: Hierarchical multilabel prediction of gene function. Bioinformatics 22, 830836, 2006
Supervised network inference & completion
 JP. Vert, Y. Yamanishi: Supervised Graph Inference. Advances in Neural Information Processing Systems, 2005
 P. Geurts, N. Touleimat, M. Dutreix, F. d'AlchéBuc: Inferring biological networks with output kernel trees. BMC Bioinformatics 2007, 8(Suppl 2):S4
Optimization algorithms for structured prediction
 Y. Altun, T. Hofmann, I. Tsochantaridis: Support Vector Machine Learning for Interdependent and Structured Output Spaces. Chapter 5 in Predicting Structured Data, MIT Press, 2007
 H. Daume and D. Marcu: Learning as Search optimization. Chapter 9 in Predicting Structured Data, MIT Press, 2007
 A. Smola, S.V.N. Vishwanathan, Quoc Le. Bundle Methods for Machine Learning. Advances in Neural Information Processing Systems 20, 2007
Generalization error analysis for structured output
 D. McAllester. Generalization Bounds and Consistency for Structured Labeling. Chapter 11 in Predicting Structured Data, MIT Press, 2007
SCHEDULE

Date
Topic Presenter 17.1
Introduction & allocation of presentations
Juho 24.1.
A primer on kernel methods (J.P. Vert, K. Tsuda and B. Schölkopf)
Janey 31.1.
Convolution kernel and Fisher kernel (Haussler '99, Jaakkola and Haussler '99)
Esa 7.2.
Convolution kernel: a handout, Convolution kernel for natural language (Collins and Duffy) Janey 14.2
Kernels for structured data, string kernels Ilkka 21.2 Graph kernels
Protein function prediction with graph kernels
Markus
Jing28.2  6.3
Exam week & period break
N/A
13.3
Structured prediction models
Juho 20.3
Easter break
N/A
27.3
Maxmargin Markov networks
Krishnan 3.4
Hidden Markov Support Vector Machines
Aija 17.4
Bayesian hierarchical classification
Jukka
24.4
A general regression framework for learning stringtostring mapping
Abhishek
THE END