58308109 Seminar: Predicting Structured Data (Spring 2008)

Time: periods III-IV, Thursdays 16-18, room C220
Organizers: PhD Huizhen Janey Yu , prof. Juho Rousu

BACKGROUND

Complex learning targets such as sequences, taxonomies and graphs are frequent in real-world applications, for example, hand-writing recognition (target is a sequence of letters), hierarchical classification (tree), and gene function prediction (poset), to only name a few.

The mainstream of machine learning research, in particular that of kernels methods, has been succesful in developing flexible and powerful methods for treating complex inputs. The complementary methods for complex outputs have so far received significantly less attention. The chief approachs towards complex targets has been to decompose the target (e.g. a hand-written word) prior to learning and learning each component (e.g. a character) indepedently. With this approach, dependencies between the components are not utilized.

During last five years, research in complex and structured output learning has emerged as one of the mega-trends in machine learning. In particular, methods marrying kernel methods and graphical models have received significant attention.

SEMINAR GOALS

The purpose of the seminar is to explore the recent progress in machine learning for complex and structured outputs

PREREQUISITES AND SEMINAR POSITION

The seminar is an elective advanced level seminar. It is also well-suited for post-graduate studies.

Prequisite knowledge for the semimar is basic knowledge about probabilistic modelling and machine learning. Familiarity with kernel methods and graphical models will be helpful.

COMPLETING THE SEMINAR

Each participant will prepare a slide presentation of approximately 45 minutes (rule of thumb: 2 minutes per slide => approx. 22-23 slides). The pdf file of the draft presentation is sent no later than Monday preceding the presentation time via email to Janey.
In addition, succesfull completion of the course requires active participation in the seminar
Grading will be pass/fail

TOPICS

Kernels for Structured Data

D. Haussler: Convolution Kernels on Discrete Structures, Technical Report UCSC-CRL-99-10, University of California at Santa Cruz, 1999

T. Gärtner: A Survey of Kernels for Structured Data, SIGKDD Explorations 5, pp. 49-58, 2003

String kernels

H. Lodhi, C. Saunders, J. Shawe-Taylor, N. Cristianini, C. Watkins: Text Classification using String Kernels. Journal of Machine Learning Research 2 (2002), pp. 419-444
S. V. N. Vishwanathan and A. Smola. Fast Kernels for String and Tree Matching. Advances in Neural Information Processing Systems 15, pp. 569–576, 2003.
C. Leslie, E. Eskin, J. Weston, W.S. Noble. Mismatch String Kernels for SVM Protein Classification. Advances in Neural Information Processing Systems 15, pp. 1441--1448, 2003.

Rational kernels

C. Cortes and M. Mohri: Rational kernels: theory and algorithms. Journal of Machine Learning Research 5 (2004), pp. 1035-1062

Tree kernels

M. Collins and N. Duffy: Convolution kernels for Natural Language. Advances in Neural Information Processing Systems 14, pp. 625-632, 2002
H. Kashima and T.Koyanagi. Kernels for Semi-structured Data. Proceedings of the 19th International Conference on Machine Learning, pp. 291--298, 2002

Graph kernels

T. Gärtner: On graph kernels: Hardness results and efficient alternatives. Proc. 16th Annual Conference on Computational Learning Theory, 2003
H. Kashima, K. Tsuda, A. Inokuchi:Marginalized kernels between labeled graphs. Proc. 20th International Conference on Machine Learning, 2003.
K. Borgwardt, C. Ong, S. Schönauer, S. V. N. Vishwanathan, A. Smola, H-P. Kriegel. Protein Function Prediction via Graph Kernels

Structured Prediction Models

G. Bakir, T. Hofmann, Schölkopf, A. Smola, B. Taskar, S.V.N. Vishwanathan: Modeling Structure via Graphical Models. Chapter 3 in Predicting Structured Data, MIT Press, 2007

J. Weston, G. Bakir, O. Bousquet, T. Mann, W.S. Noble and B. Schölkopf: Joint Kernel Maps. Chapter 4 in Predicting Structured Data, MIT Press, 2007
B. Taskar, C. Guestrin, D. Koller. Max-Margin Markov Networks. Advances in Neural Information Processing Systems 16, 2004
J. Rousu, C. Saunders, S. Szedmak and J. Shawe-Taylor: Efficient algorithms for Max-Margin Structured Classification. Chapter 6 in Predicting Structured Data, MIT Press, 2007

Structured Prediction Applications

Sequence annotation

Michael Collins: Discriminative training methods for hidden Markov models: theory and experiments with perceptron algorithm. Proceedings of the ACL-02 conference on Empirical methods in natural language processing, 2002

Yasemin Altun, Ioannis Tsochantaridis, Thomas Hofmann: Hidden Markov Support Vector Machines. Proc. ICML'2003.

Elisa Ricci, Tijl de Bie, Nello Cristianini: Discriminative Sequence Labeling by Z-Score Optimization. Proc. of the 18th European Conference on Machine Learning (ECML 2007), Warsaw, September 2007.

Hierarchical multilabel classification

N. Cesa-Bianchi, C. Gentile, A. Tironi, L. Zaniboni: Incremental Algorithms for Hierarchical Classification Advances in Neural Information Processing Systems 17, 2005
J. Rousu, C. Saunders, S. Szedmak, J. Shawe-Taylor: Kernel-based Learning of Hierarchical Multilabel Classification Models. Journal of Machine Learning Research 7 (2006) 1601–1626

Z. Barutcuoglu, R.. Schapire and O.. Troyanskaya: Hierarchical multi-label prediction of gene function. Bioinformatics 22, 830-836, 2006

Supervised network inference & completion

J-P. Vert, Y. Yamanishi: Supervised Graph Inference. Advances in Neural Information Processing Systems, 2005
P. Geurts, N. Touleimat, M. Dutreix, F. d'Alché-Buc: Inferring biological networks with output kernel trees. BMC Bioinformatics 2007, 8(Suppl 2):S4

Optimization algorithms for structured prediction

Y. Altun, T. Hofmann, I. Tsochantaridis: Support Vector Machine Learning for Interdependent and Structured Output Spaces. Chapter 5 in Predicting Structured Data, MIT Press, 2007

H. Daume and D. Marcu: Learning as Search optimization. Chapter 9 in Predicting Structured Data, MIT Press, 2007
A. Smola, S.V.N. Vishwanathan, Quoc Le. Bundle Methods for Machine Learning. Advances in Neural Information Processing Systems 20, 2007

Generalization error analysis for structured output

D. McAllester. Generalization Bounds and Consistency for Structured Labeling. Chapter 11 in Predicting Structured Data, MIT Press, 2007

SCHEDULE

Date	Topic	Presenter
17.1	Introduction & allocation of presentations	Juho
24.1.	A primer on kernel methods (J.-P. Vert, K. Tsuda and B. Schölkopf)	Janey
31.1.	Convolution kernel and Fisher kernel (Haussler '99, Jaakkola and Haussler '99)	Esa
7.2.	Convolution kernel: a handout, Convolution kernel for natural language (Collins and Duffy)	Janey
14.2	Kernels for structured data, string kernels	Ilkka
21.2	Graph kernels Protein function prediction with graph kernels	Markus Jing
28.2 - 6.3	Exam week & period break	N/A
13.3	Structured prediction models	Juho
20.3	Easter break	N/A
27.3	Max-margin Markov networks	Krishnan
3.4	Hidden Markov Support Vector Machines	Aija
17.4	Bayesian hierarchical classification	Jukka
24.4	A general regression framework for learning string-to-string mapping	Abhishek
	THE END

Department of Computer Science

Department information