581257 Information Retrieval Methods (6 ECTS, 3 cu) Spring 2006
On Using Lucene
You will need to use Lucene for the project work of the course. All the information and the source files are available on the Lucene project pages. Below, I summarize everything you need to know about Lucene for the project work, in what is hopefully a self-contained manner.
Installation
You may download the binaries from the Lucene homepage (for Linux, the appropriate compressed file is lucene-1.4.3.tar.gz): http://www.apache.org/dyn/closer.cgi/jakarta/lucene/binaries/
Place the file in an appropriate directory of your Linux account and extract it, e.g., with the following command:
tar -xvzf lucene-1.4.3.tar.gz
Next, update your CLASSPATH with the location of the relevant .jar files. If you placed the file lucene-1.4.3.tar.gz in the directory /home/doucet/irm/, type the following command (on one line, with no spaces inside the paths):
export CLASSPATH=$CLASSPATH:/home/doucet/irm/lucene-1.4.3/lucene-1.4.3.jar:/home/doucet/irm/lucene-1.4.3/lucene-demos-1.4.3.jar
You can verify the modification of your CLASSPATH with the following command:
echo $CLASSPATH
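Since a single typo in the CLASSPATH (a stray space, a wrong directory) only shows up later as a "class not found" error from java, it can help to check that every entry actually exists on disk. The following is a minimal sketch of such a check; the function name check_classpath is a hypothetical helper of my own, not something shipped with Lucene.

```shell
# Hypothetical helper (not part of Lucene): list each CLASSPATH entry
# and flag those that do not exist on disk.
# Usage: check_classpath [colon-separated-list]   (defaults to $CLASSPATH)
check_classpath() (
    # The body runs in a subshell, so IFS and the variables below do not
    # leak into the calling shell, and "exit" only ends the function.
    IFS=':'
    status=0
    for entry in ${1:-$CLASSPATH}; do
        # Skip empty entries, e.g. the one left by a leading ":".
        [ -z "$entry" ] && continue
        if [ -e "$entry" ]; then
            echo "ok       $entry"
        else
            echo "MISSING  $entry"
            status=1
        fi
    done
    exit "$status"
)
```

Calling check_classpath with no argument inspects the current CLASSPATH; the function returns a non-zero status if at least one entry is missing.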
Lucene is ready to use!
Index construction
- As an illustration, we can index a subset of the documentation, say, the documents in /home/doucet/irm/lucene-1.4.3/docs/api/org/apache/lucene/search/. For the project work, each student will have to find at least 10 documents on the group's chosen topic and store them in a directory to be indexed.
- Assuming the current directory is /home/doucet/irm/lucene-1.4.3/, this is done with the following command:
java org.apache.lucene.demo.IndexFiles docs/api/org/apache/lucene/search/
This creates a subdirectory named index in the current directory (here, the installation directory /home/doucet/irm/lucene-1.4.3/).
There is further information about all this in the Lucene documentation: http://lucene.apache.org/java/docs/demo.html.
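Before indexing your own project directory, you may want to confirm that it really contains the required minimum of 10 documents. Here is a minimal sketch, assuming every regular file under the directory (subdirectories included) counts as one document; the function name has_enough_docs is a hypothetical helper, not part of Lucene or its demo.

```shell
# Hypothetical helper (not part of Lucene): succeed only if DIR contains
# at least MIN regular files, counted recursively.
# Usage: has_enough_docs DIR [MIN]
has_enough_docs() (
    # The body runs in a subshell, so these variables stay local and
    # "exit" only ends the function.
    dir=$1
    min=${2:-10}   # 10 is the per-student minimum stated for the project
    [ -d "$dir" ] || { echo "not a directory: $dir" >&2; exit 2; }
    # tr strips the padding that some wc implementations add.
    count=$(find "$dir" -type f | wc -l | tr -d ' ')
    echo "$count file(s) in $dir"
    [ "$count" -ge "$min" ]
)
```

It can then guard the indexing command, e.g. has_enough_docs docs/api/org/apache/lucene/search/ && java org.apache.lucene.demo.IndexFiles docs/api/org/apache/lucene/search/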
Queries
The query syntax is described here: http://lucene.apache.org/java/docs/queryparsersyntax.html.
To start querying the index, just type:
java org.apache.lucene.demo.SearchFiles
and you'll be prompted to type in a query.
For example:
doucet$ java org.apache.lucene.demo.SearchFiles
Query: weight
Searching for: weight
28 total matching documents
0. docs/api/org/apache/lucene/search/class-use/Weight.html
1. docs/api/org/apache/lucene/search/class-use/Searcher.html
2. docs/api/org/apache/lucene/search/Weight.html
3. docs/api/org/apache/lucene/search/FilteredQuery.html
4. docs/api/org/apache/lucene/search/Query.html
5. docs/api/org/apache/lucene/search/spans/SpanQuery.html
6. docs/api/org/apache/lucene/search/PhrasePrefixQuery.html
7. docs/api/org/apache/lucene/search/TermQuery.html
8. docs/api/org/apache/lucene/search/package-frame.html
9. docs/api/org/apache/lucene/search/BooleanQuery.html
more (y/n) ? y
10. docs/api/org/apache/lucene/search/PhraseQuery.html
11. docs/api/org/apache/lucene/search/WildcardQuery.html
12. docs/api/org/apache/lucene/search/class-use/Scorer.html
13. docs/api/org/apache/lucene/search/spans/SpanFirstQuery.html
14. docs/api/org/apache/lucene/search/spans/SpanNotQuery.html
15. docs/api/org/apache/lucene/search/spans/SpanOrQuery.html
16. docs/api/org/apache/lucene/search/spans/SpanTermQuery.html
17. docs/api/org/apache/lucene/search/MultiTermQuery.html
18. docs/api/org/apache/lucene/search/PrefixQuery.html
19. docs/api/org/apache/lucene/search/RangeQuery.html
more (y/n) ? n
Query: term AND weight
Searching for: +term +weight
14 total matching documents
0. docs/api/org/apache/lucene/search/PhrasePrefixQuery.html
1. docs/api/org/apache/lucene/search/class-use/Searcher.html
2. docs/api/org/apache/lucene/search/TermQuery.html
3. docs/api/org/apache/lucene/search/FuzzyQuery.html
4. docs/api/org/apache/lucene/search/PhraseQuery.html
5. docs/api/org/apache/lucene/search/Similarity.html
6. docs/api/org/apache/lucene/search/RangeQuery.html
7. docs/api/org/apache/lucene/search/WildcardQuery.html
8. docs/api/org/apache/lucene/search/MultiTermQuery.html
9. docs/api/org/apache/lucene/search/spans/SpanTermQuery.html
more (y/n) ?
Finally, more detailed technical documentation (the API reference) can be found here: http://jakarta.apache.org/lucene/docs/api/index.html.
Antoine Doucet Last modified: Thu Jan 19 14:48:49 2006