582410 Processing of large document collections, Exercise 7

The solutions should be ready for inspection by Thursday 22.11.2001 (midnight).


  1. Google is a well-known search engine for the Web. Study the article by Sergey Brin and Lawrence Page, "The Anatomy of a Large-Scale Hypertextual Web Search Engine", which describes the implementation of Google when it was still an academic project (~1998).

    Explain the high-level architecture of Google as shown in Figure 1 of the article:

  2. Compare text categorization and text summarization as learning problems (~ classification problems), for instance:




Helena.Ahonen-Myka