Information Retrieval Methods, Exercise 1, 22 Jan 2007
- Read about subareas of information retrieval in the following
sources. Select at least 5 subareas and compare each one with the
retrieval process schema presented in the lecture slides. What are
the information needs, queries, documents and document features in that
area? What is matching based on? What does the process retrieve? You
will not necessarily find all of these components in every subarea.
- H. Uszkoreit,
Language Technology, A First Overview
(Chapter 3; some of these methods are not necessarily information
retrieval methods, but they can be components in retrieval systems)
- (if you understand Finnish) K. Järvelin and J. Kekäläinen,
Tiedonhaun menetelmät -opintoaineisto (study material on information
retrieval methods):
[1.4 Tiedonhaun tutkimuksen osa-alueita] (subareas of information
retrieval research)
- Draw recall-precision graphs for the following retrieval results (the
number of relevant documents and their positions in the result are given):
- Number: 5.
Positions: 2, 10, 17, 30, 45
- Number: 20.
Positions: 2, 5, 8, 11, 13, 16,
19, 20, 25, 26, 31, 33, 37, 45, 55, 67, 80, 92, 111, 150.
Compare the information in the graphs, e.g., how many documents does
the user have to retrieve to reach a recall level of 75%?
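The points of a recall-precision graph can be computed directly from the rank positions. The following is a minimal Python sketch, not part of the lecture material. It assumes that every relevant document appears somewhere in the result, so that the given number is also the total number of relevant documents. The function name `recall_precision_points` is made up for illustration.

```python
def recall_precision_points(positions, total_relevant):
    """Return one (recall, precision) pair per relevant document found.

    positions: 1-based ranks of the relevant documents in the result.
    total_relevant: total number of relevant documents (assumed here to
    equal the "Number" given in the exercise).
    """
    points = []
    for found, rank in enumerate(sorted(positions), start=1):
        recall = found / total_relevant  # share of relevant documents seen so far
        precision = found / rank         # share of the top `rank` documents that are relevant
        points.append((recall, precision))
    return points


# First result of the exercise: 5 relevant documents.
points = recall_precision_points([2, 10, 17, 30, 45], total_relevant=5)
for recall, precision in points:
    print(f"recall={recall:.2f}  precision={precision:.3f}")
```

Plotting recall on the horizontal axis and precision on the vertical axis gives the graph; the second result can be handled the same way with its own list of positions.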
- Draw DCV (document cut-off value) graphs, i.e., recall and precision
as a function of the size of the result, for the following retrieval
results (the number of relevant documents and their positions in the
result are given):
- Number: 5.
Positions: 2, 10, 17, 30, 45
- Number: 20.
Positions: 2, 5, 8, 11, 13, 16,
19, 20, 25, 26, 31, 33, 37, 45, 55, 67, 80, 92, 111, 150.
Compare the information in the graphs, e.g., how many documents does
the user have to retrieve to reach a recall level of 75%?
What can you say about the relation between the information in the
recall-precision graphs (exercise 2) and the DCV graphs (exercise 3)?
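For the DCV graphs, recall and precision are evaluated at every result size rather than only at the ranks of relevant documents. The following Python sketch shows one way to tabulate both curves. It assumes that the given number is the total number of relevant documents; the names `dcv_curves` and `cutoff_for_recall` are hypothetical helpers, not from the lecture material.

```python
def dcv_curves(positions, total_relevant, max_cutoff):
    """Return (recall, precision) lists for result sizes 1..max_cutoff."""
    relevant_ranks = set(positions)
    recall, precision = [], []
    found = 0
    for cutoff in range(1, max_cutoff + 1):
        if cutoff in relevant_ranks:
            found += 1  # one more relevant document within the cut-off
        recall.append(found / total_relevant)
        precision.append(found / cutoff)
    return recall, precision


def cutoff_for_recall(recall, level):
    """Smallest result size whose recall is at least `level`, or None."""
    for cutoff, value in enumerate(recall, start=1):
        if value >= level:
            return cutoff
    return None


# First result of the exercise: 5 relevant documents, cut off at rank 45.
recall, precision = dcv_curves([2, 10, 17, 30, 45], total_relevant=5, max_cutoff=45)
print("result size needed for 75% recall:", cutoff_for_recall(recall, 0.75))
```

Both lists are indexed by result size minus one, so they can be plotted against `range(1, max_cutoff + 1)` as two curves on the same axes.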
- The relevance of the documents in the result may be assessed
dichotomously (relevant / non-relevant) or on a graded scale
(significant / important / marginal / non-relevant). Assume that the
document collection contains 20 documents with the following relevance
values:
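| Doc | Rel | Doc | Rel | Doc | Rel | Doc | Rel |
|-----|-----|-----|-----|-----|-----|-----|-----|
| 1   | n   | 6   | n   | 11  | n   | 16  | n   |
| 2   | i   | 7   | n   | 12  | n   | 17  | n   |
| 3   | n   | 8   | s   | 13  | n   | 18  | m   |
| 4   | m   | 9   | i   | 14  | i   | 19  | m   |
| 5   | s   | 10  | m   | 15  | i   | 20  | i   |
(s = significant, i = important, m = marginal, n = non-relevant)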
The result set contains the following documents:
1, 3, 4, 7, 8, 9, 13, 15, 19, 20.
- Compute recall and precision when all significant and important
documents count as relevant (dichotomous relevance). Recall = the
fraction of the relevant documents in the database that were retrieved.
Precision = the fraction of the retrieved documents that are relevant.
- Compute recall and precision for the other relevance levels: only
significant documents are relevant; all documents except non-relevant
ones are relevant.
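The two definitions above can be checked mechanically for any relevance threshold. The Python sketch below is only an illustration: the grades are transcribed from the table in the exercise, the letter codes are assumed to stand for significant, important, marginal and non-relevant, and the function name `recall_precision` is hypothetical.

```python
# Relevance grades transcribed from the exercise table
# (assumed codes: s = significant, i = important, m = marginal, n = non-relevant).
grades = {
    1: "n", 2: "i", 3: "n", 4: "m", 5: "s",
    6: "n", 7: "n", 8: "s", 9: "i", 10: "m",
    11: "n", 12: "n", 13: "n", 14: "i", 15: "i",
    16: "n", 17: "n", 18: "m", 19: "m", 20: "i",
}
retrieved = {1, 3, 4, 7, 8, 9, 13, 15, 19, 20}


def recall_precision(grades, retrieved, relevant_grades):
    """Recall and precision when the grades in `relevant_grades` count as relevant.

    Assumes at least one relevant and one retrieved document.
    """
    relevant = {doc for doc, grade in grades.items() if grade in relevant_grades}
    hits = relevant & retrieved  # relevant documents that were retrieved
    recall = len(hits) / len(relevant)
    precision = len(hits) / len(retrieved)
    return recall, precision


# The three thresholds asked for in the exercise.
for label, levels in [("s + i", {"s", "i"}), ("s only", {"s"}), ("s + i + m", {"s", "i", "m"})]:
    recall, precision = recall_precision(grades, retrieved, levels)
    print(f"{label}: recall={recall:.3f}  precision={precision:.3f}")
```

Treating each threshold as a set of grades keeps the dichotomous case and the graded cases in one function; only the set passed as `relevant_grades` changes.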
Exercises 2-4 are from the lecture material by Järvelin and Kekäläinen.
Helena Ahonen-Myka
Greger Lindén (translation)