582410 Processing of large document collections, Material
Lecture 14.3. (Introduction)
- Slides (introduction): [PDF] [Power Point]
- Slides (text representation, text categorization): [PDF] [Power Point]
Other material used in the class:
Lecture 16.3. (Text categorization)
- Slides (problem setting, machine learning approach to text categorization, Rocchio method): [PDF] [Power Point]
Reading:
- (Recommended) Fabrizio Sebastiani: Text categorization. In Alessandro Zanasi (ed.), Text Mining and its Applications, WIT Press, Southampton, UK, 2005. Forthcoming.
- (Additional, more technical) Fabrizio Sebastiani: Machine learning in automated text categorization. ACM Computing Surveys. (local copy, PostScript) (local copy, PDF)
Other material used in the lecture:
- Collection of 10 small documents (the same as above)
Lecture 21.3. (Text categorization)
- Slides (Evaluation of text classifiers, term selection): [PDF] [Power Point]
Lecture 23.3. (Text categorization, text summarization)
- Slides (Applications of text categorization, boosting, introduction to text summarization): [PDF] [Power Point]
Reading:
- Schapire, Singer and Singhal: Boosting and Rocchio Applied to Text Filtering. Proceedings of SIGIR-98, the 21st ACM International Conference on Research and Development in Information Retrieval.
Lecture 28.3.
- Slides: [PDF] [Power Point]
Reading:
- H.P. Luhn: The Automatic Creation of Literature Abstracts. In "Advances in Automatic Text Summarization", eds. Inderjeet Mani and Mark T. Maybury. Originally in IBM Journal of Research and Development, April 1958.
- Buyukkokten, Garcia-Molina, Paepcke: Seeing the whole in parts: text summarization for web browsing on handheld devices. The 10th International WWW Conference (WWW10). Hong Kong, China, May 1-5, 2001.
- Kupiec, Pedersen, Chen: A trainable document summarizer. Proceedings of the 18th ACM-SIGIR Conference, p. 68-73, 1995. Also Chapter 5 (p. 55-60) in Advances in Automatic Text Summarization, eds. Mani, Maybury. The MIT Press, 1999. [Local copy (PDF)]
Lecture 30.3.
- Slides: [PDF] [Power Point]
Reading:
- Boguraev, Kennedy: Salience-based content characterisation of text documents. Chapter (p. 99-110) in Advances in Automatic Text Summarization, eds. Mani, Maybury. The MIT Press, 1999.
Lecture 4.4.
- Slides: [PDF] [Power Point]
Reading:
- Radev, Jing, Stys, Tam: Centroid-based summarization of multiple documents. Information Processing and Management, 40, 2004.
- McKeown, Robin, Kukich: Generating concise natural language summaries. Information Processing and Management, 31 (5), 1995. Also Chapter 16 (p. 233-263) in Advances in Automatic Text Summarization, eds. Mani, Maybury. The MIT Press, 1999.
Lecture 6.4.
- Slides (Information extraction process): [PDF] [Power Point]
Lecture 11.4.
- Slides (Information extraction: learning extraction patterns): [PDF] [Power Point]
Reading:
- Riloff: Automatically Constructing a Dictionary for Information Extraction Tasks (AutoSlog). Proceedings of the 11th National Conference on Artificial Intelligence (AAAI-93), 1993, p. 811-816.
- Riloff: Automatically Generating Extraction Patterns from Untagged Text (AutoSlog-TS). Proceedings of the 13th National Conference on Artificial Intelligence (AAAI-96), 1996, p. 1044-1049.
- Riloff, Jones: Learning Dictionaries for Information Extraction by Multi-level Bootstrapping. Proceedings of the 16th National Conference on Artificial Intelligence (AAAI-99), 1999, p. 474-479.
Lecture 20.4.
- Slides: [PDF] [Power Point]
Lecture 25.4.
- Slides: [PDF] [Power Point]
Reading:
- Moldovan, Harabagiu, Pasca, Mihalcea, Goodrum, Girju and Rus: The Structure and Performance of an Open-Domain Question-Answering System. Proceedings of the 38th Meeting of the Association for Computational Linguistics (ACL-2000), Hong Kong, October 2000, p. 563-570.
- Cooper, Rüger: A Simple Question Answering System. TREC 2000.
- Aunimo, Makkonen, Kuuskoski: Cross-Language Question Answering for Finnish. Proceedings of the Web Intelligence Symposium, held at the Finnish Artificial Intelligence Conference, September 2004.
Links:
- WordNet, a lexical database for the English language [online search]
- MOT Dictionary (accessible at least from the university's machines)
Lecture 27.4.
- No new slides.
Useful links (free software, demos etc.)