Statistical Machine Translation

Algorithms and machine learning
Advanced studies
Year: 2013
Semester: autumn
Date: 18.11.-19.12.
Period: 2-2
Language: English
In charge: Roman Yangarber

Those wishing to attend the course are asked to contact Atro Voutilainen (first.last at helsinki dot fi).



Invited by the BAULT (Building and use of language technology) consortium, Dr. Christer Samuelsson (DFKI, Germany) will give a guest lecture (details later) and a course on SMT from Nov. 18 to Dec. 19: "Let's fake an SMT using Unix, HMMs, and GAs".

The course consists of sixteen 90-minute sessions (lectures, demonstrations, programming exercises) and is intended for students and researchers with (Unix) programming skills and an interest in Language Technology and Machine Translation. The sessions take place on the Center Campus:
- Mon 16-18 (18.11., 25.11., 2.12., 9.12., 16.12.) room A112 (Metsätalo)
- Tue 14-16 (19.11.) room A112 (Metsätalo)
- Wed 16-18 (20.11., 27.11., 4.12., 11.12., 18.12.) hall 29 (Metsätalo)
- Thu 12-14 (21.11., 28.11., 5.12., 12.12., 19.12.) P344 (Porthania)

Use of the course (3 credits) as part of studies is negotiable. Grading is based on evaluation of student assignments.

Tentative course schedule:
"Let's fake an SMT using Unix, HMMs, and GAs."
Christer Samuelsson

18/11: Course overview. Word n-gram models. Variable length n-gram models. K&S.
19/11: HMM-based PoS tagging. K&S.
20/11: K-means clustering and the EM algorithm. IBM model 2 and BLEU scores. Bishop.
21/11: Faking a simple SMT. Handouts. Assignment 1 out.
25, 27, and 28/11: Students implement an SMT, assisted by my stunt double.
2/12: Student presentations of Assignment 1.
4/12: Genetic Algos I. Wahde.
5/12: Genetic Algos II. Wahde.
9/12: Genetic Algos and SMT: word order and word insertions. Wahde/Handouts. Assignment 2 out.
11/12: Ant Colony methods. Wahde.
12/12: Particle Swarm methods. Wahde.
16 and 18/12: Students add a GA to handle word order and word insertions, assisted by me.
19/12: Student presentations of Assignment 2.

Course material (will be provided):
K&S: Krenn & Samuelsson "The Linguist's Guide to Statistics"
Bishop: Christopher Bishop "Pattern Recognition and Machine Learning"
Wahde: Mattias Wahde "Biologically Inspired Optimization Methods"