58308109 Seminar: Predicting Structured Data (Spring 2008)Time: periods III-IV, Thursdays 16-18, room C220
Organizers: PhD Huizhen Janey Yu , prof. Juho Rousu
Complex learning targets such as sequences, taxonomies and graphs are frequent in real-world applications, for example, hand-writing recognition (target is a sequence of letters), hierarchical classification (tree), and gene function prediction (poset), to only name a few.
The mainstream of machine learning research, in particular that of kernels methods, has been succesful in developing flexible and powerful methods for treating complex inputs. The complementary methods for complex outputs have so far received significantly less attention. The chief approachs towards complex targets has been to decompose the target (e.g. a hand-written word) prior to learning and learning each component (e.g. a character) indepedently. With this approach, dependencies between the components are not utilized.
During last five years, research in complex and structured output learning has emerged as one of the mega-trends in machine learning. In particular, methods marrying kernel methods and graphical models have received significant attention.
The purpose of the seminar is to explore the recent progress in machine
learning for complex and structured outputs
PREREQUISITES AND SEMINAR POSITION
The seminar is an elective advanced level seminar. It is also well-suited for post-graduate studies.
Prequisite knowledge for the semimar is basic knowledge about probabilistic modelling and machine learning. Familiarity with kernel methods and graphical models will be helpful.
COMPLETING THE SEMINAR
- Each participant will prepare a slide presentation of
45 minutes (rule of thumb: 2 minutes per slide => approx. 22-23
The pdf file of the draft presentation is sent no later than Monday preceding the
presentation time via email to Janey.
- In addition, succesfull completion of the course requires active participation in the seminar
- Grading will be pass/fail
Kernels for Structured Data
- D. Haussler: Convolution
Kernels on Discrete Structures, Technical Report UCSC-CRL-99-10,
University of California at Santa Cruz, 1999
- T. Gärtner: A Survey
of Kernels for Structured Data, SIGKDD Explorations 5, pp. 49-58,
- H. Lodhi, C. Saunders, J. Shawe-Taylor, N. Cristianini, C.
Classification using String Kernels. Journal of Machine Learning Research
2 (2002), pp. 419-444
- S. V. N. Vishwanathan and A. Smola. Fast
Kernels for String and Tree Matching. Advances in Neural
Information Processing Systems 15, pp. 569–576, 2003.
- C. Leslie, E. Eskin, J. Weston, W.S. Noble. Mismatch
String Kernels for SVM Protein Classification. Advances in
Neural Information Processing Systems 15, pp. 1441--1448,
- C. Cortes and M. Mohri: Rational kernels: theory and algorithms. Journal of Machine Learning Research 5 (2004), pp. 1035-1062
- M. Collins and N. Duffy: Convolution kernels for Natural Language. Advances in Neural Information Processing Systems 14, pp. 625-632, 2002
- H. Kashima and T.Koyanagi. Kernels for Semi-structured Data. Proceedings of the 19th International Conference on Machine Learning, pp. 291--298, 2002
- T. Gärtner: On graph kernels: Hardness results and efficient alternatives. Proc. 16th Annual Conference on Computational Learning Theory, 2003
- H. Kashima, K. Tsuda, A. Inokuchi:Marginalized kernels between labeled graphs. Proc. 20th International Conference on Machine Learning, 2003.
- K. Borgwardt, C. Ong, S. Schönauer, S. V. N. Vishwanathan, A. Smola, H-P. Kriegel. Protein Function Prediction via Graph Kernels
Structured Prediction Models
- G. Bakir, T. Hofmann, Schölkopf, A. Smola, B. Taskar, S.V.N. Vishwanathan: Modeling Structure via Graphical Models. Chapter 3 in Predicting Structured Data, MIT Press, 2007
- J. Weston, G. Bakir, O. Bousquet, T. Mann, W.S. Noble and B. Schölkopf: Joint Kernel Maps. Chapter 4 in Predicting Structured Data, MIT Press, 2007
- B. Taskar, C. Guestrin, D. Koller. Max-Margin Markov Networks.
Advances in Neural Information
Processing Systems 16, 2004
- J. Rousu, C. Saunders, S. Szedmak and J. Shawe-Taylor: Efficient algorithms for Max-Margin Structured Classification. Chapter 6 in Predicting Structured Data, MIT Press, 2007
Structured Prediction Applications
- Michael Collins: Discriminative training methods for hidden Markov models: theory and experiments with perceptron algorithm. Proceedings of the ACL-02 conference on Empirical methods in natural language processing, 2002
- Yasemin Altun, Ioannis Tsochantaridis, Thomas Hofmann: Hidden Markov Support Vector Machines. Proc. ICML'2003.
- Elisa Ricci, Tijl de Bie, Nello Cristianini: Discriminative Sequence Labeling by Z-Score Optimization. Proc. of the 18th European Conference on Machine Learning (ECML 2007), Warsaw, September 2007.
Hierarchical multilabel classification
- N. Cesa-Bianchi, C. Gentile, A. Tironi, L. Zaniboni: Incremental
Algorithms for Hierarchical Classification Advances in Neural
Information Processing Systems 17, 2005
- J. Rousu, C. Saunders, S. Szedmak, J. Shawe-Taylor: Kernel-based Learning of Hierarchical Multilabel Classification Models. Journal of Machine Learning Research 7 (2006) 1601–1626
- Z. Barutcuoglu, R.. Schapire and O.. Troyanskaya: Hierarchical multi-label prediction of gene function. Bioinformatics 22, 830-836, 2006
Supervised network inference & completion
- J-P. Vert, Y. Yamanishi: Supervised Graph Inference. Advances in Neural Information Processing Systems, 2005
- P. Geurts, N. Touleimat, M. Dutreix, F. d'Alché-Buc: Inferring biological networks with output kernel trees. BMC Bioinformatics 2007, 8(Suppl 2):S4
Optimization algorithms for structured prediction
- Y. Altun, T. Hofmann, I. Tsochantaridis: Support Vector Machine Learning for Interdependent and Structured Output Spaces. Chapter 5 in Predicting Structured Data, MIT Press, 2007
- H. Daume and D. Marcu: Learning as Search optimization. Chapter 9 in Predicting Structured Data, MIT Press, 2007
- A. Smola, S.V.N. Vishwanathan, Quoc Le. Bundle Methods for Machine Learning. Advances in Neural Information Processing Systems 20, 2007
Generalization error analysis for structured output
- D. McAllester. Generalization Bounds and Consistency for Structured Labeling. Chapter 11 in Predicting Structured Data, MIT Press, 2007
Topic Presenter 17.1
Introduction & allocation of presentations
A primer on kernel methods (J.-P. Vert, K. Tsuda and B. Schölkopf)
Convolution kernel and Fisher kernel (Haussler '99, Jaakkola and Haussler '99)
Convolution kernel: a handout, Convolution kernel for natural language (Collins and Duffy) Janey 14.2
Kernels for structured data, string kernels Ilkka 21.2 Graph kernels
Protein function prediction with graph kernels
28.2 - 6.3
Exam week & period break
Structured prediction models
Max-margin Markov networks
Hidden Markov Support Vector Machines
Bayesian hierarchical classification
A general regression framework for learning string-to-string mapping