582602 Natural Language Processing, Spring 2008
(8 cp, 4 cu)
Lectures:
15.01 -- 21.02 Tuesdays, Thursdays 12-14 B119
11.03 -- 24.04 Tuesdays, Thursdays 12-14 B119
Instructor:
Roman Yangarber
Lectures on singular value decomposition, latent semantic indexing:
Ella Bingham
Language of instruction: English.
Synopsis
Vast amounts of data are available, notably on the World Wide Web, in plain natural language. These data may need to be analyzed at various levels, ranging from analysis of individual words or phrases to full "understanding" of a complete text.
Computational analysis of human language is a complex problem, spanning several disciplines, including Computer Science and Linguistics. The focus of this course will be on analyzing written language, i.e., text.
The objective of this course is to introduce the methods for text analysis on different levels. The course will provide a broad introduction to the area. No prior knowledge of linguistics or computational linguistics is assumed.
Goals
To provide students with the basic foundations in Natural Language Processing and Computational Linguistics. The course will introduce:
- the range of problems that the field deals with, and the state of the art
- the standard methods, and how they are applied and evaluated.
After the course, students should be better prepared to:
- attend higher-level seminars on advanced or special topics in NLP
- advance their understanding by taking courses in related subjects, such as machine learning, artificial intelligence, information retrieval, data mining.
- participate in industry or research projects, dealing with analysis of text.
Grading
No exam. Assignments/Exercises. Project work.
Students are graded based on their completed work: six short assignments and two mini-projects.
Project work
During the course, 5 or 6 suggested mini-projects will be introduced, in addition to 6 shorter exercises. Each student is expected to complete all exercises, plus 2 of the mini-projects. (Each mini-project may require between 3 and 4 weeks of work.)
Pre-requisites:
- Data Structures, Models of Programming and Computing.
- Basic programming skills.
- Interest in language or text.
- Basic familiarity with these topics: finite state automata (FSA), regular expressions, regular languages. (If you are unsure, you can review these topics in J&M, chapter 2.)
Course Materials
Textbook: "Speech and Language Processing" by D. Jurafsky & J.H. Martin (J&M). You can purchase it in hard copy, or download the required chapters from the book's web site.
Other course materials are found here (requires local access):
- Lecture notes
- Course Wiki
- Assignments
- Additional materials
Content
Rule-based and statistical analysis:
- morphology and morphological analysis -- analysis of words,
- part-of-speech tagging,
- language modeling,
- name classification,
- syntax, grammars, and context-free parsing,
- shallow syntax/chunking,
- semantic analysis and word sense disambiguation,
- discourse analysis.
The course will also cover higher-level applications which combine several levels of analysis: information extraction or finding facts from text.
Tentative Schedule:
Note: this schedule may be updated depending on the progress in the course. The dates specify approximately when exercises are expected to be assigned. The date when an exercise was assigned is marked clearly in bold (with the due date indicated, approx. 2 weeks later).
- Week 3:
- (15.01):
- Introduction to NLP and Computational Linguistics
- Applications: Text Understanding and Information Extraction
- Week 4:
- (22.01):
- (Meeting Canceled.) Topics postponed:
- Levels of analysis for IE, and other NLP applications
- Week 5:
- (29.01):
- Morphology. Finite-state Transducers
- Finite-state Morphology, continued.
- Week 6:
- (05.02):
- Tutorial: Annotation and Evaluation tools for IE.
- Assignment 1 (assigned): Annotation of facts in text documents.
- Tutorial on FS morphology using PC-Kimmo toolkit
- (for assignment and project on morphology)
- Week 7:
- (12.02):
- Introduce Assignment 2: finite state morphology
- Introduce Project: Two-level Morphological analysis (non-English)
- Spelling correction.
- Assignment 2 (assigned): Morphology
- Spelling correction. Language modeling. N-Grams.
- Week 8:
- (19.02):
- Introduce Assignment 3: N-grams.
- Introduce Project: N-grams and Spelling correction.
- (E.Bingham)
- Bag-of-words methods; Preliminaries.
- Introduction to Singular Value Decomposition (SVD)
- (E.Bingham)
- Applications of SVD: Latent Semantic Indexing (LSI)
- Week 9: (No course meeting: exam week)
- (26.02)
- (Assignment 1 due)
- Week 10: (No course meeting: break)
- (04.03)
- (Assignment 2 due)
- Week 11:
- (11.03):
- (E.Bingham)
- Google's PageRank algorithm, spectral clustering.
- Introduce Assignment on SVD and LSI.
- Week 12:
- (18.03):
- Syntax. Parsing [J&M, chap. 11]
- (Assignment 3 due)
- Week 13:
- (25.03): (No course meeting: Easter Holiday)
- (27.03):
- (G. Lindén)
- Parsing. [J&M, chap. 12]
- Week 14:
- (01.04):
- (Assignment 3b (LSI) due 31.03)
- (lecture canceled)
- (lecture canceled)
- Week 15:
- (08.04):
- Parsing: Shallow Parsing/Chunking. [J&M, chap. 12]
- Introduce Project: implement simple Grammar for Parsing, tools.
- Assign Assignment 4: CFG and Parsing (short).
- Assign Assignment 5: Parsing II
- Week 16:
- (15.04):
- Part of speech tagging, Hidden Markov Models (HMMs)
- (Assignment 4 (Parsing) due -- try.)
- Hidden Markov Models (HMMs), continued
- Week 17:
- (22.04):
- Lecture 10.a: HMMs/Algorithms
- Assigned Assignment 6. (Due Thursday 15.05)
- Lecture 10.b: HMM Training
- Introduce Project: Implement simple POS tagger.
- (Assignment 5 (Parsing II) due -- try.)
- Week 18:
- Week 19:
- (02.05 Friday): Make-up lecture
- Time: 10:00 (Email instructor in case of problems)
- Lecture 11.a: Word sense disambiguation: supervised methods
- Lecture 11.b: Unsupervised WSD (D. Yarowsky)
- Introduce Project: Word sense disambiguation.
- Week 20:
- (06.05 Tuesday): Make-up lecture
- Time: 12:15 (Email instructor in case of problems)
- Lecture 13: Automatic acquisition of semantic knowledge
- Further topics:
- Lecture 12: Semantics: Distributional similarity
Registration
Register through the department registration system.
Contact
Department of Computer Science
P.O. Box 68, FIN-00014 University of Helsinki, Finland
Street address: Exactum Building, Room A224, Gustaf Hällströmin katu 2B
(Page layout < O. Heinonen < M. Raento < G. Lindén)

