D1 = (summer:0.8, material:0.3, production:0.9, cloth:1.0)
D2 = (child:0.5, material:0.3, winter:0.8, production:0.9, cloth:0.9)
D3 = (use:0.3, child:1.0, material:0.8, winter:0.6, leather:0.2)
D4 = (summer:0.2, child:0.2, cotton:0.6, production:0.8, cloth:0.2)
D5 = (use:0.6, child:1.0, cloth:1.0, leather:0.1)
D6 = (child:0.9, cloth:0.5)
D7 = (child:0.8, cloth:0.9, leather:0.1)
D8 = (import:0.4, cloth:0.4, leather:0.7)
D9 = (import:1.0, cloth:0.8)
D10 = (summer:0.5, child:0.8, cotton:0.4, import:0.8, cloth:0.7)
D11 = (child:0.7, import:0.9, cloth:0.2)
D12 = (cotton:1.0, production:0.8)
To see how different coefficients measure the similarity between the
documents and a query, calculate example values of the following
similarity coefficients: inner product, cosine coefficient, Dice's
coefficient, and Jaccard's coefficient, for the query
Q = (1.0, 1.0, 0.7, 0, 0.3, 0, 0, 0, 0, 0),
i.e. (child:1.0, cloth:1.0, cotton:0.7, material:0.3).
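A minimal sketch of how such values could be computed, assuming the
common weighted-vector forms of the coefficients (Dice = 2*x.y /
(|x|^2 + |y|^2), Jaccard = x.y / (|x|^2 + |y|^2 - x.y)); check these
against the definitions used in the course before relying on them. The
function names are illustrative, vectors are stored as term:weight
dictionaries, and the query is taken in its named form:

```python
from math import sqrt


def inner(x, y):
    """Inner product of two sparse term-weight vectors (dicts)."""
    return sum(w * y[t] for t, w in x.items() if t in y)


def sq_norm(x):
    """Sum of squared weights of a vector."""
    return sum(w * w for w in x.values())


def cosine(x, y):
    return inner(x, y) / sqrt(sq_norm(x) * sq_norm(y))


def dice(x, y):
    # Assumed weighted form: 2*x.y / (|x|^2 + |y|^2)
    return 2 * inner(x, y) / (sq_norm(x) + sq_norm(y))


def jaccard(x, y):
    # Assumed weighted form: x.y / (|x|^2 + |y|^2 - x.y)
    ip = inner(x, y)
    return ip / (sq_norm(x) + sq_norm(y) - ip)


Q = {"child": 1.0, "cloth": 1.0, "cotton": 0.7, "material": 0.3}
D6 = {"child": 0.9, "cloth": 0.5}
D12 = {"cotton": 1.0, "production": 0.8}

for name, d in (("D6", D6), ("D12", D12)):
    print(name,
          round(inner(Q, d), 3),
          round(cosine(Q, d), 3),
          round(dice(Q, d), 3),
          round(jaccard(Q, d), 3))
```

Only two of the documents are shown; the others can be added in the
same dictionary form.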
2. Explain the variation of the similarity coefficients mentioned in
task 1 by calculating their values in some artificial situations:
- a document has e.g. t, t/2, t/5, 4t/5 (generally t/k, 2t/k, ...,
(k-1)t/k) 1's (the other terms are 0),
- a query has e.g. p 1's; p is either slightly or very much smaller than t
(t is the total number of terms).
(As all the coefficients are based on the inner product, the variations
reflect the normalization factor.)
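One way to explore these artificial situations is to tabulate the
coefficients for binary vectors. The sketch below assumes the binary
forms cosine = c/sqrt(m*p), Dice = 2c/(m+p), and Jaccard = c/(m+p-c),
where m is the number of 1's in the document, p the number of 1's in
the query, and c the number of shared terms. The values t = 100 and
p = 5, and the choice that every query term occurs in the document,
are assumptions made only for illustration:

```python
from math import sqrt


def binary_coefficients(m, p, c):
    """Coefficients for binary vectors.

    m: number of 1's in the document
    p: number of 1's in the query
    c: number of terms with a 1 in both
    """
    return {
        "inner": c,
        "cosine": c / sqrt(m * p),
        "dice": 2 * c / (m + p),
        "jaccard": c / (m + p - c),
    }


t = 100  # total number of terms (hypothetical value)
p = 5    # query size, much smaller than t (hypothetical value)

for m in (t, 4 * t // 5, t // 2, t // 5):
    c = min(m, p)  # assumption: every query term occurs in the document
    row = binary_coefficients(m, p, c)
    print(m, {name: round(value, 3) for name, value in row.items()})
```

With c held fixed, the inner product does not change while the other
three coefficients vary with m, which isolates the effect of the
normalization factor.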
3. Consider the relevance feedback principle (Salton, p. 319-320).
a) What can we say about the length of the modified queries (the number
of query terms)?
b) Is it possible to use this technique if the result of the initial
query is empty? (Is it possible to prevent the result from being empty?)
c) Could you in some general way characterize the situations where the
relevance feedback technique is usable or not usable?
d) Do the WWW search engines use any query modification techniques
(based on relevance feedback or on some other principle), or do they at
least support some technique with several successive phases?
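As background for the questions above, here is a minimal sketch of a
Rocchio-style query modification, assuming the common form in which the
average of the relevant document vectors is added to the query and the
average of the non-relevant ones is subtracted. This is not necessarily
the exact formulation on the cited pages; the function name, the
alpha/beta/gamma parameters, and the relevance judgments in the example
are hypothetical:

```python
def rocchio(query, relevant, nonrelevant, alpha=1.0, beta=1.0, gamma=1.0):
    """Return a modified query as a term:weight dict.

    Assumed form: alpha*Q + beta*mean(relevant) - gamma*mean(nonrelevant),
    keeping only the terms whose resulting weight is positive.
    """
    new_q = {t: alpha * w for t, w in query.items()}
    for docs, coef in ((relevant, beta), (nonrelevant, -gamma)):
        if not docs:
            continue  # no judged documents of this kind: nothing to add
        for d in docs:
            for t, w in d.items():
                new_q[t] = new_q.get(t, 0.0) + coef * w / len(docs)
    return {t: w for t, w in new_q.items() if w > 0}


Q = {"child": 1.0, "cloth": 1.0, "cotton": 0.7, "material": 0.3}
D6 = {"child": 0.9, "cloth": 0.5}
D7 = {"child": 0.8, "cloth": 0.9, "leather": 0.1}
D12 = {"cotton": 1.0, "production": 0.8}

# Hypothetical judgments: D6 and D7 relevant, D12 non-relevant.
new_q = rocchio(Q, relevant=[D6, D7], nonrelevant=[D12])
print(sorted(new_q.items()))
```

In this sketch the modified query can both gain terms (from relevant
documents) and lose terms (when the subtraction drives a weight to zero
or below), and it returns the original query unchanged when no
documents have been judged.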
4. (**) Explain the most important results in article [1] in a summary of 1-2 pages.
References:
1. Magennis, M. & van Rijsbergen, C.J., The potential and actual
effectiveness of interactive query expansion. Proc. ACM SIGIR97 Conf.,
1997.
(http://dev.acm.org/pubs/contents/proceedings/ir/258525/p324-magennis/p324-magennis.pdf;
the article is available at least from the workstations at the
department; a copy is also in the course folder (room A412))
Hannu.Erkio@cs.Helsinki.FI