IRM Spring 2007, Exercise 1

Information Retrieval Methods, Exercise 1, 22 Jan 2007

Read about subareas of information retrieval in the following sources. Select at least 5 subareas and compare each of them with the retrieval process schema presented in the lecture slides. What are the information needs, queries, documents and document features in each area? What is matching based on? What will the process retrieve? You will not necessarily find all these components in each subarea.

H. Uszkoreit, Language Technology, A First Overview (Chapter 3; some of these methods are not necessarily information retrieval methods, but they can be components in retrieval systems)
(if you understand Finnish) K. Järvelin and J. Kekäläinen, Tiedonhaun menetelmät -opintoaineisto: [1.4 Tiedonhaun tutkimuksen osa-alueita]

Draw recall-precision graphs for the following retrieval results (the number of relevant documents and their positions in the result are given):

Number: 5. Positions: 2, 10, 17, 30, 45
Number: 20. Positions: 2, 5, 8, 11, 13, 16, 19, 20, 25, 26, 31, 33, 37, 45, 55, 67, 80, 92, 111, 150.

Compare the information in the graphs, e.g., how many documents the user has to retrieve to reach the 75% recall level.
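
As an illustration of how the points of a recall-precision graph can be obtained, here is a minimal Python sketch. The ranking used in it (4 relevant documents at positions 1, 3, 6 and 10) is a hypothetical example, not one of the results above, and the function names are assumptions made for this sketch. It also assumes that every relevant document appears somewhere in the result.

```python
def recall_precision_points(positions):
    """Return one (recall, precision) pair per relevant document.

    `positions` are the 1-based ranks of the relevant documents in the
    result. Assumption: all relevant documents occur in the result.
    """
    total_relevant = len(positions)
    points = []
    for found, rank in enumerate(sorted(positions), start=1):
        recall = found / total_relevant   # share of relevant documents seen so far
        precision = found / rank          # share of the top `rank` documents that are relevant
        points.append((recall, precision))
    return points


def result_size_for_recall(positions, percent):
    """Smallest result size at which recall reaches `percent` (1..100)."""
    total_relevant = len(positions)
    # Integer ceiling of percent * total_relevant / 100, avoiding float rounding.
    needed = -(-percent * total_relevant // 100)
    return sorted(positions)[needed - 1]


# Hypothetical ranking, used only for illustration.
example_positions = [1, 3, 6, 10]

for recall, precision in recall_precision_points(example_positions):
    print(f"recall={recall:.2f}  precision={precision:.2f}")

print(result_size_for_recall(example_positions, 75))  # prints 6
```

Replacing `example_positions` with one of the position lists above gives the points to plot.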

Draw DCV (document cut-off value) graphs, i.e., recall and precision as a function of the size of the result, for the following retrieval results (the number of relevant documents and their positions in the result are given):

Number: 5. Positions: 2, 10, 17, 30, 45
Number: 20. Positions: 2, 5, 8, 11, 13, 16, 19, 20, 25, 26, 31, 33, 37, 45, 55, 67, 80, 92, 111, 150.

Compare the information in the graphs, e.g., how many documents the user has to retrieve to reach the 75% recall level.
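
The values behind such a graph can be tabulated as in the following minimal Python sketch, which computes recall and precision at every result size from 1 up to a chosen cutoff. The ranking (relevant documents at positions 1, 3, 6 and 10) is again hypothetical, and the name `dcv_curves` is an assumption made for this sketch.

```python
def dcv_curves(positions, max_cutoff):
    """Return (cutoff, recall, precision) for every cutoff 1..max_cutoff.

    `positions` are the 1-based ranks of the relevant documents.
    Assumption: all relevant documents occur in the result, so the total
    number of relevant documents equals len(positions).
    """
    relevant_ranks = set(positions)
    total_relevant = len(relevant_ranks)
    found = 0
    rows = []
    for cutoff in range(1, max_cutoff + 1):
        if cutoff in relevant_ranks:
            found += 1                    # one more relevant document inside the cutoff
        recall = found / total_relevant
        precision = found / cutoff
        rows.append((cutoff, recall, precision))
    return rows


# Hypothetical ranking, used only for illustration.
for cutoff, recall, precision in dcv_curves([1, 3, 6, 10], max_cutoff=10):
    print(f"{cutoff:3d}  recall={recall:.2f}  precision={precision:.2f}")
```

Unlike the recall-precision points, which exist only at the ranks of relevant documents, this table has one row per result size, so recall stays flat and precision drops between relevant documents.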

What can you say about the relation between the information in the graphs of exercises 2 and 3?

The relevance of the documents in the result may be assessed dichotomously (relevant / non-relevant) or on a graded scale (significant / important / marginal / non-relevant). Assume that the document collection contains 20 documents, whose relevance values are

And the result set contains the following documents:

1, 3, 4, 7, 8, 9, 13, 15, 19, 20.

Compute recall and precision when all significant and important documents count as relevant (dichotomous relevance). Recall = the fraction of the relevant documents in the collection that were retrieved. Precision = the fraction of the retrieved documents that are relevant.
Compute recall and precision for the other relevance levels: only significant documents are relevant; all documents except the non-relevant ones are relevant.
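
The two definitions can be checked mechanically, as in the minimal Python sketch below. Everything in it is an assumption made for illustration: the 8-document collection and its grades are hypothetical (they are not the relevance values of this exercise), and the numeric encoding 3 = significant, 2 = important, 1 = marginal, 0 = non-relevant is only one possible convention.

```python
def recall_precision(grades, retrieved, threshold):
    """Recall and precision when grades >= `threshold` count as relevant.

    `grades` maps document id -> relevance grade (assumed encoding:
    3 = significant, 2 = important, 1 = marginal, 0 = non-relevant).
    `retrieved` holds the document ids in the result set.
    """
    relevant = {doc for doc, grade in grades.items() if grade >= threshold}
    result = set(retrieved)
    hits = relevant & result              # relevant documents that were retrieved
    recall = len(hits) / len(relevant) if relevant else 0.0
    precision = len(hits) / len(result) if result else 0.0
    return recall, precision


# Hypothetical collection and result set, used only for illustration.
example_grades = {1: 3, 2: 0, 3: 2, 4: 1, 5: 0, 6: 3, 7: 1, 8: 0}
example_result = [1, 3, 4, 5]

for label, threshold in [("significant only", 3),
                         ("significant + important", 2),
                         ("all except non-relevant", 1)]:
    recall, precision = recall_precision(example_grades, example_result, threshold)
    print(f"{label}: recall={recall:.2f}  precision={precision:.2f}")
```

Lowering the threshold enlarges the set of relevant documents, so recall and precision can move in different directions between relevance levels.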

Exercises 2-4 are from the lecture material by Järvelin and Kekäläinen.

Helena Ahonen-Myka
Greger Lindén (translation)