University of Helsinki Department of Computer Science
 


582602 Natural Language Processing, Spring 2008

(8 cp, 4 cu)

Lectures:
15.01 -- 21.02 Tuesdays, Thursdays 12-14 B119
11.03 -- 24.04 Tuesdays, Thursdays 12-14 B119

Instructor: Roman Yangarber
Lectures on singular value decomposition, latent semantic indexing: Ella Bingham

Language of instruction: English.

Synopsis

Vast amounts of data are available in plain natural language, notably on the World Wide Web. These data may need to be analyzed at various levels, ranging from analysis of individual words or phrases to full "understanding" of a complete text.

Computational analysis of human language is a complex problem, spanning several disciplines, including Computer Science and Linguistics. The focus of this course will be on analyzing written language, i.e., text.

The objective of this course is to introduce the methods for text analysis on different levels. The course will provide a broad introduction to the area. No prior knowledge of linguistics or computational linguistics is assumed.

Goals

To provide students with the basic foundations in Natural Language Processing and Computational Linguistics. The course will introduce:
  • the range of problems that the field deals with, and the state of the art
  • the standard methods, how they are applied, and evaluated.
After this course, the students should be able to:
  • attend higher-level seminars on advanced or special topics in NLP
  • advance their understanding by taking courses in related subjects, such as machine learning, artificial intelligence, information retrieval, data mining.
  • participate in industry or research projects, dealing with analysis of text.

Grading

No exam. Assignments/Exercises. Project work.
Students are graded based on their completed work: six short assignments, and two mini-projects.

Project work

During the course, 5 or 6 suggested mini-projects will be introduced, in addition to 6 shorter exercises. Each student is expected to complete all exercises, plus 2 of the mini-projects. (Each mini-project may require between 3 and 4 weeks of work.)

Pre-requisites:

  • Data Structures, Models of Programming and Computing.
  • Basic programming skills.
  • Interest in language or text.
  • Basic familiarity with these topics: finite state automata (FSA), regular expressions, regular languages. (If you are unsure, you can review these topics in J&M, chapter 2.)

Course Materials

Textbook: "Speech and Language Processing" by D. Jurafsky & J.H. Martin (J&M). You can purchase it in hard copy, or download the required chapters from the book's web site.

Other course materials are found here (requires local access):

  • Lecture notes
  • Course Wiki
  • Assignments
  • Additional materials

Content

Rule-based and statistical analysis:
  • morphology and morphological analysis -- analysis of words,
  • part-of-speech tagging,
  • language modeling,
  • name classification,
  • syntax, grammars, and context-free parsing,
  • shallow syntax/chunking,
  • semantic analysis and word sense disambiguation,
  • discourse analysis.

The course will also cover higher-level applications that combine several levels of analysis, such as information extraction, i.e., finding facts in text.

Tentative Schedule:

Note: this schedule may be updated depending on the progress of the course. The dates indicate approximately when exercises are expected to be assigned. The date on which an exercise was actually assigned is marked in bold, with the due date (approximately 2 weeks later) indicated.
Week 3:
(15.01):
  • Introduction to NLP and Computational Linguistics
(17.01):
  • Applications: Text Understanding and Information Extraction

Week 4:
(22.01):
  • (Meeting Canceled.) Topics postponed:
(24.01):
  • Levels of analysis for IE, and other NLP applications

Week 5:
(29.01):
  • Morphology. Finite-state Transducers
(31.01):
  • Finite-state Morphology, continued.

Week 6:
(05.02):
  • Tutorial: Annotation and Evaluation tools for IE.
  • Assignment 1 (assigned): Annotation of facts in text documents.
(07.02):
  • Tutorial on FS morphology using PC-Kimmo toolkit
  • (for assignment and project on morphology)

Week 7:
(12.02):
  • Introduce Assignment 2: finite state morphology
  • Introduce Project: Two-level Morphological analysis (non-English)

  • Spelling correction.
(14.02):
  • Assignment 2 (Assigned): Morphology
  • Spelling correction. Language modeling. N-Grams.

Week 8:
(19.02):
  • Introduce Assignment 3: N-grams.
  • Introduce Project: N-grams and Spelling correction.

  • (E.Bingham)
  • Bag-of-words methods; Preliminaries.
  • Introduction to Singular Value Decomposition (SVD)
(21.02):
  • (E.Bingham)
  • Applications of SVD: Latent Semantic Indexing (LSI)

Week 9: (No course meeting: exam week)
(26.02)
  • (Assignment 1 due)
(28.02)

Week 10: (No course meeting: break)
(04.03)
  • (Assignment 2 due)
(06.03)

Week 11:
(11.03):
  • (E.Bingham)
  • Google's PageRank algorithm, spectral clustering.
(13.03):
  • Introduce Assignment on SVD and LSI.

Week 12:
(18.03):
  • Syntax. Parsing [J&M, chap. 11]
  • (Assignment 3 due)
(20.03): (No course meeting: Easter Holiday)

Week 13:
(25.03): (No course meeting: Easter Holiday)
(27.03):
  • (G. Lindén)
  • Parsing. [J&M, chap. 12]

Week 14:
(01.04):
  • (Assignment 3b (LSI) due 31.03)
  • (lecture canceled)

(03.04):
  • (lecture canceled)

Week 15:
(08.04):
  • Parsing: Shallow Parsing/Chunking. [J&M, chap. 12]
  • Introduce Project: implement simple Grammar for Parsing, tools.
(10.04):
  • Assignment 4 (assigned): CFG and Parsing (short).
  • Assignment 5 (assigned): Parsing II

Week 16:
(15.04):
  • Part of speech tagging, Hidden Markov Models (HMMs)
(17.04):
  • (Assignment 4 (Parsing) due -- try.)
  • Hidden Markov Models (HMMs), continued

Week 17:
(22.04):
  • Lecture 10.a: HMMs/Algorithms
  • Assignment 6 (assigned). (Due Thursday 15.05)
(24.04):
  • Lecture 10.b: HMM Training
  • Introduce Project: Implement simple POS tagger.
  • (Assignment 5 (Parsing II) due -- try.)

Week 18:

Week 19:
(02.05 Friday): Make-up lecture
  • Time: 10:00 (Email instructor in case of problems)
  • Lecture 11.a: Word sense disambiguation: supervised methods
  • Lecture 11.b: Unsupervised WSD, (D. Yarowsky)
  • Introduce Project: Word sense disambiguation.

Week 20:
(06.05 Tuesday): Make-up lecture
  • Time: 12:15 (Email instructor in case of problems)
  • Lecture 13: Automatic acquisition of semantic knowledge

  • Further topics:
  • Lecture 12: Semantics: Distributional similarity

Registration

Register through the department registration system.

Contact

Department of Computer Science                   Street address:
P.O. Box 68                          Exactum Building, Room A224
FIN-00014 University of Helsinki      Gustaf Hällströmin katu 2B
Finland

Last update: Friday, 02-May-2008 14:04:08 EEST
(Page layout < O. Heinonen < M. Raento < G. Lindén)