581257-8 Information retrieval methods - Exercises 8/2001 (4.4.)


Tasks marked with (**) will be counted as double tasks.
1. The Dublin Core (DC) metadata standard has an element type Relation. Give examples of its use in describing data. How might these descriptions relate to the needs of users searching for digital objects? Can you find other useful types of relationships between objects besides those mentioned in the definition (among the qualifiers)? Is the Relation element suitable for describing documents that have a hypertext structure? The element is described in http://dublincore.org/documents/dcmes-qualifiers/. The list of all DC elements is in http://dublincore.org/documents/dces/.

2. The DC standard also contains the elements Coverage and Date. Consider these elements in the same way as Relation in task 1. For time descriptions, you can use documents such as the following as concrete examples: a schedule (trains, flights, etc.), a standard, a historical article, a proposal awaiting comments, or an advertisement for a product.

3. a) Let us consider scientific articles (or articles from some other genre with a fairly regular structure) stored at some WWW site in the traditional way (the whole document in one WWW page). There might be hyperlinks in the document, but its essential content is not distributed over several pages. Suppose that we are trying to retrieve articles with queries. How could the structural features of the articles be used in retrieval? What kinds of features describing the structure might improve retrieval?

b) Do the same evaluation for articles with a hypertext structure, i.e. articles whose contents are distributed over several WWW pages.

4. (**) The article [1] describes a copy detection system for documents. Explain the main principles and features of that system.

References:

1. Monostori, K., Zaslavsky, A. & Schmidt, H., MatchDetectReveal: finding overlapping and similar digital documents. Information Resources Management Conference (IRMA2000), 2000.
http://www.csse.monash.edu.au/~kmonosto/MDR/Papers/irma2000.pdf



Hannu.Erkio@cs.Helsinki.FI