University of Helsinki Department of Computer Science
 


Research Seminar: Analysis of Text (3 ECTS credits, 2 cu)

Instructor: Prof. Roman Yangarber
Dates: Fall 2007: periods I-II: 4.9-9.10 and 30.10-4.12
Time: Tuesdays 10:00-12:00
Location: Exactum Building, Room C220

Synopsis

Computational analysis of human language is a complex and multi-faceted problem, which spans several related disciplines. The focus of the seminar will be on analyzing information in written language (i.e., text).

Vast amounts of data are available (most notably, on the World Wide Web) in plain, natural-language form, i.e., as text. Depending on the application, these data may need to be analyzed at various levels: from individual units at the word (or sub-word) level, through everything in between, to full understanding (however one defines understanding) at the document level or the super-document level (e.g., large collections of documents).

The objective of this seminar is to look at the problem of text analysis from different angles and to survey methods that address various aspects of the problem. The seminar aims to provide an overview of the area, and identify potential research topics for students.

The seminar will open with an introductory lecture, covering the basic areas of research, some common practices with applications and demonstrations, and some problems currently under research.

Topics may be taken from any field related to language technology and text processing, including natural language processing/computational linguistics, machine learning, artificial intelligence, information retrieval, and data mining.

Target Audience: Students with a strong interest in natural language or text and course background in any of the areas listed above.

Prerequisites

Solid understanding of algorithms and data structures; Scientific Writing course; solid course background (or relevant work/project experience) in one of the related areas listed above; strong interest in natural language or text analysis.

Familiarity with computational linguistics (e.g., the course "Natural Language Processing") or language technology is a plus.

Seminar format

The seminar will meet once a week to listen to and discuss presentations by the students about research on text analysis technology.

The language of the seminar is English.

After the first session, each student will select a topic by choosing one or more papers from the list of suggested articles. Participants are encouraged to suggest topics and papers of their own interest (to be confirmed with the instructor).

Thereafter, each student is expected to participate in the following activities:

  • Brief presentation: A presentation of the topic (maximum 10 minutes). The presentation should be on a general level.
  • Draft: A draft version (approx. 10 pages) of an essay on the topic. The draft should be ready two weeks before the date of the presentation. The draft will be published on the course Web page, so other participants can study it and comment on it.
  • Reviewing: Comments (about 1 page) on other students' drafts. The comments should be ready one week after receiving the draft. Each student will act as a reviewer on the drafts of two other students. The comments will not be published on the course web page.
  • Essay: The final essay on the student's chosen topic. The student should take comments by other students (especially from the reviewers) into account. The body of the essay should be about 10 pages long, in PDF format. It must be completed before the seminar presentation. The essay will be published on the course web page.
  • Oral presentation: A presentation of the topic based on the written essay, about 60 minutes long. All students should attend all presentations. (Absence in special circumstances may be accommodated.)
  • Discussion: A presentation may be interrupted by questions and comments from the audience, and will be followed by further discussion. All listeners are expected to act as opponents and take part in the discussion.

The list of topics, materials, and schedule are found on the course Wiki.

For more information, please contact the instructor.


Roman Yangarber

Last update: Thursday, 25-Oct-2007 00:10:10 EEST
(Page layout < O. Heinonen < M. Raento < G. Lindén)