University of Helsinki, Department of Computer Science

581257 Information Retrieval Methods (6 ECTS, 3 cu) Spring 2006

On Using Lucene

Antoine Doucet

You will need to use Lucene for the project work of the course. All the information and the source files are available on the Lucene project pages. Below, I summarize everything you need to know about Lucene for the project work, in a hopefully self-contained manner.


Installation

You can download the binaries from the Lucene homepage (for Linux, the appropriate compressed file is lucene-1.4.3.tar.gz): http://www.apache.org/dyn/closer.cgi/jakarta/lucene/binaries/

Place the file in an appropriate directory of your Linux account and unpack it, e.g., with the following command:

tar -xvzf lucene-1.4.3.tar.gz

Next, add the relevant .jar files to your CLASSPATH. If you placed the file lucene-1.4.3.tar.gz in the directory /home/doucet/irm/ and unpacked it there, type the following command (one line):

export CLASSPATH=$CLASSPATH:/home/doucet/irm/lucene-1.4.3/lucene-1.4.3.jar:/home/doucet/irm/lucene-1.4.3/lucene-demos-1.4.3.jar

You can verify the modification of your CLASSPATH with the following command:

echo $CLASSPATH
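
Since a typo in one of the paths is easy to miss in the output of echo, here is a minimal sketch of a shell function that checks every entry of the class path. The function name check_classpath is my own and is not part of Lucene; it only assumes a POSIX-compatible shell.

```shell
# check_classpath: for every entry of a colon-separated class path,
# print whether the file or directory exists.
# $1: class path to check (optional; defaults to the value of $CLASSPATH)
# The body runs in a subshell, so IFS and the helper variables do not leak.
check_classpath() (
    IFS=:
    missing=0
    for entry in ${1-${CLASSPATH-}}; do
        if [ -z "$entry" ]; then
            continue            # skip empty entries, e.g. from a leading ':'
        fi
        if [ -e "$entry" ]; then
            printf 'ok       %s\n' "$entry"
        else
            printf 'MISSING  %s\n' "$entry"
            missing=1
        fi
    done
    exit "$missing"             # non-zero if at least one entry is missing
)
```

Called without an argument, check_classpath inspects the current value of CLASSPATH; both Lucene .jar files should be reported as ok.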

Lucene is ready to use!


Index construction

  1. As an illustration, we can index a subset of the documentation, say, the documents in /home/doucet/irm/lucene-1.4.3/docs/api/org/apache/lucene/search/. For the project work, each student will have to find at least 10 documents on the group's chosen topic and store them in a directory to be indexed.

  2. Assuming the current directory is /home/doucet/irm/lucene-1.4.3/, the index is built with the following command (one line):

    java org.apache.lucene.demo.IndexFiles docs/api/org/apache/lucene/search/

    This creates a subdirectory named index in the installation directory (/home/doucet/irm/lucene-1.4.3/).

    There is further information about all this in the Lucene documentation: http://lucene.apache.org/java/docs/demo.html.
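
For the project work, it may help to check that a directory really contains at least 10 documents before indexing it. The following is only a sketch: the function names count_docs and index_if_enough are hypothetical helpers of mine, and the files are counted recursively on the assumption that the demo indexer also descends into subdirectories.

```shell
# count_docs: print the number of regular files below a directory.
# $1: directory to scan
count_docs() {
    find "$1" -type f | wc -l | tr -d ' '
}

# index_if_enough: run the demo indexer only if the directory holds
# enough documents.
# $1: directory to index
# $2: minimum number of documents (optional; defaults to 10)
index_if_enough() {
    docs_dir=$1
    docs_min=${2-10}
    docs_n=$(count_docs "$docs_dir")
    if [ "$docs_n" -lt "$docs_min" ]; then
        echo "only $docs_n documents in $docs_dir, need at least $docs_min" >&2
        return 1
    fi
    # Same command as above; requires the Lucene jars on the CLASSPATH.
    java org.apache.lucene.demo.IndexFiles "$docs_dir"
}
```

For instance, index_if_enough docs/api/org/apache/lucene/search/ runs the indexing command shown above, but only after the count has passed.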


Queries

The query syntax is described here: http://lucene.apache.org/java/docs/queryparsersyntax.html.

To start querying the index, just type:

java org.apache.lucene.demo.SearchFiles

You will then be prompted to type in a query.

For example:

doucet$ java org.apache.lucene.demo.SearchFiles
Query: weight
Searching for: weight
28 total matching documents
0. docs/api/org/apache/lucene/search/class-use/Weight.html
1. docs/api/org/apache/lucene/search/class-use/Searcher.html
2. docs/api/org/apache/lucene/search/Weight.html
3. docs/api/org/apache/lucene/search/FilteredQuery.html
4. docs/api/org/apache/lucene/search/Query.html
5. docs/api/org/apache/lucene/search/spans/SpanQuery.html
6. docs/api/org/apache/lucene/search/PhrasePrefixQuery.html
7. docs/api/org/apache/lucene/search/TermQuery.html
8. docs/api/org/apache/lucene/search/package-frame.html
9. docs/api/org/apache/lucene/search/BooleanQuery.html
more (y/n) ? y
10. docs/api/org/apache/lucene/search/PhraseQuery.html
11. docs/api/org/apache/lucene/search/WildcardQuery.html
12. docs/api/org/apache/lucene/search/class-use/Scorer.html
13. docs/api/org/apache/lucene/search/spans/SpanFirstQuery.html
14. docs/api/org/apache/lucene/search/spans/SpanNotQuery.html
15. docs/api/org/apache/lucene/search/spans/SpanOrQuery.html
16. docs/api/org/apache/lucene/search/spans/SpanTermQuery.html
17. docs/api/org/apache/lucene/search/MultiTermQuery.html
18. docs/api/org/apache/lucene/search/PrefixQuery.html
19. docs/api/org/apache/lucene/search/RangeQuery.html
more (y/n) ? n
Query: term AND weight
Searching for: +term +weight
14 total matching documents
0. docs/api/org/apache/lucene/search/PhrasePrefixQuery.html
1. docs/api/org/apache/lucene/search/class-use/Searcher.html
2. docs/api/org/apache/lucene/search/TermQuery.html
3. docs/api/org/apache/lucene/search/FuzzyQuery.html
4. docs/api/org/apache/lucene/search/PhraseQuery.html
5. docs/api/org/apache/lucene/search/Similarity.html
6. docs/api/org/apache/lucene/search/RangeQuery.html
7. docs/api/org/apache/lucene/search/WildcardQuery.html
8. docs/api/org/apache/lucene/search/MultiTermQuery.html
9. docs/api/org/apache/lucene/search/spans/SpanTermQuery.html
more (y/n) ?
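
The second query of the session combines two terms with the AND operator, which must be written in upper case. As a small sketch, a conjunctive query over any number of terms can be assembled as follows; and_query is a hypothetical helper name, not a Lucene command.

```shell
# and_query: join all arguments with the AND operator of the query syntax.
# Example: and_query term weight   prints   term AND weight
and_query() {
    query=""
    for word in "$@"; do
        if [ -z "$query" ]; then
            query=$word                  # first term: no operator yet
        else
            query="$query AND $word"     # later terms: prefix with AND
        fi
    done
    printf '%s\n' "$query"
}
```

The printed line can be typed or pasted at the Query: prompt of SearchFiles.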
    

Finally, more detailed technical documentation (the API reference) can be found here: http://jakarta.apache.org/lucene/docs/api/index.html.



Antoine Doucet
Last modified: Thu Jan 19 14:48:49 2006