org.apache.lucene.index
Class IndexReader

java.lang.Object
  extended by org.apache.lucene.index.IndexReader
Direct Known Subclasses:
FilterIndexReader, MultiReader, ParallelReader

public abstract class IndexReader
extends java.lang.Object

IndexReader is an abstract class, providing an interface for accessing an index. Search of an index is done entirely through this abstract interface, so that any subclass which implements it is searchable.

Concrete subclasses of IndexReader are usually constructed with a call to one of the static open() methods, e.g. open(String).

For efficiency, in this API documents are often referred to via document numbers, non-negative integers which each name a unique document in the index. These document numbers are ephemeral--they may change as documents are added to and deleted from an index. Clients should thus not rely on a given document having the same number between sessions.
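A small model shows why document numbers are ephemeral. This is a minimal sketch that assumes, as a simplification, that deleted documents are dropped when segments are merged and the surviving documents are renumbered densely in their original order. `DocRenumbering.compactDocMap` is a hypothetical helper, not part of the Lucene API.

```java
// Hypothetical illustration, not Lucene API: models how surviving document
// numbers shift once deleted documents are purged (e.g. during a merge).
final class DocRenumbering {
    private DocRenumbering() {}

    /**
     * Returns an array mapping each old document number to its new number,
     * or -1 if the document was deleted and therefore disappears.
     */
    static int[] compactDocMap(boolean[] deleted) {
        int[] map = new int[deleted.length];
        int next = 0;
        for (int oldDoc = 0; oldDoc < deleted.length; oldDoc++) {
            // Surviving documents keep their relative order but are packed densely.
            map[oldDoc] = deleted[oldDoc] ? -1 : next++;
        }
        return map;
    }
}
```

Take five documents where positions 1 and 4 are deleted. In that case compactDocMap returns {0, -1, 1, 2, -1}. Document 2 becomes document 1, so a number saved before the merge now names a different document.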

An IndexReader can be opened on a directory for which an IndexWriter is already open, but it cannot then be used to delete documents from the index.

Version:
$Id: IndexReader.java 543620 2007-06-01 21:18:56Z dnaber $
Author:
Doug Cutting

Nested Class Summary
static class IndexReader.FieldOption
           
 
Constructor Summary
protected IndexReader(Directory directory)
          Constructor used if IndexReader is not owner of its directory.
 
Method Summary
 void close()
          Closes files associated with this index.
protected  void commit()
          Commit changes resulting from delete, undeleteAll, or setNorm operations. If an exception is hit, then either no changes or all changes will have been committed to the index (transactional semantics).
 void deleteDocument(int docNum)
          Deletes the document numbered docNum.
 int deleteDocuments(Term term)
          Deletes all documents that have a given term indexed.
 Directory directory()
          Returns the directory this index resides in.
abstract  int docFreq(Term t)
          Returns the number of documents containing the term t.
protected abstract  void doClose()
          Implements close.
protected abstract  void doCommit()
          Implements commit.
 Document document(int n)
          Returns the stored fields of the nth Document in this index.
abstract  Document document(int n, FieldSelector fieldSelector)
          Get the Document at the nth position.
protected abstract  void doDelete(int docNum)
          Implements deletion of the document numbered docNum.
protected abstract  void doSetNorm(int doc, java.lang.String field, byte value)
          Implements setNorm in subclass.
protected abstract  void doUndeleteAll()
          Implements actual undeleteAll() in subclass.
protected  void ensureOpen()
           
protected  void finalize()
          Release the write lock, if needed.
static long getCurrentVersion(Directory directory)
          Reads version number from segments files.
static long getCurrentVersion(java.io.File directory)
          Reads version number from segments files.
static long getCurrentVersion(java.lang.String directory)
          Reads version number from segments files.
abstract  java.util.Collection getFieldNames(IndexReader.FieldOption fldOption)
          Get a list of unique field names that exist in this index and have the specified field option information.
abstract  TermFreqVector getTermFreqVector(int docNumber, java.lang.String field)
          Return a term frequency vector for the specified document and field.
abstract  TermFreqVector[] getTermFreqVectors(int docNumber)
          Return an array of term frequency vectors for the specified document.
 long getVersion()
          Version number when this IndexReader was opened.
abstract  boolean hasDeletions()
          Returns true if any documents have been deleted
 boolean hasNorms(java.lang.String field)
          Returns true if there are norms stored for this field.
static boolean indexExists(Directory directory)
          Returns true if an index exists at the specified directory.
static boolean indexExists(java.io.File directory)
          Returns true if an index exists at the specified directory.
static boolean indexExists(java.lang.String directory)
          Returns true if an index exists at the specified directory.
 boolean isCurrent()
          Check whether this IndexReader is still using the current (i.e., most recently committed) version of the index.
abstract  boolean isDeleted(int n)
          Returns true if document n has been deleted
static boolean isLocked(Directory directory)
          Returns true iff the index in the named directory is currently locked.
static boolean isLocked(java.lang.String directory)
          Returns true iff the index in the named directory is currently locked.
 boolean isOptimized()
          Checks if the index is optimized (if it has a single segment and no deletions).
static long lastModified(Directory directory2)
          Returns the time the index in the named directory was last modified.
static long lastModified(java.io.File fileDirectory)
          Returns the time the index in the named directory was last modified.
static long lastModified(java.lang.String directory)
          Returns the time the index in the named directory was last modified.
static void main(java.lang.String[] args)
          Prints the filename and size of each file within a given compound file.
abstract  int maxDoc()
          Returns one greater than the largest possible document number.
abstract  byte[] norms(java.lang.String field)
          Returns the byte-encoded normalization factor for the named field of every document.
abstract  void norms(java.lang.String field, byte[] bytes, int offset)
          Reads the byte-encoded normalization factor for the named field of every document.
abstract  int numDocs()
          Returns the number of documents in this index.
static IndexReader open(Directory directory)
          Returns an IndexReader reading the index in the given Directory.
static IndexReader open(Directory directory, IndexDeletionPolicy deletionPolicy)
          Expert: returns an IndexReader reading the index in the given Directory, with a custom IndexDeletionPolicy.
static IndexReader open(java.io.File path)
          Returns an IndexReader reading the index in an FSDirectory in the named path.
static IndexReader open(java.lang.String path)
          Returns an IndexReader reading the index in an FSDirectory in the named path.
 void setNorm(int doc, java.lang.String field, byte value)
          Expert: Resets the normalization factor for the named field of the named document.
 void setNorm(int doc, java.lang.String field, float value)
          Expert: Resets the normalization factor for the named field of the named document.
abstract  TermDocs termDocs()
          Returns an unpositioned TermDocs enumerator.
 TermDocs termDocs(Term term)
          Returns an enumeration of all the documents which contain term.
abstract  TermPositions termPositions()
          Returns an unpositioned TermPositions enumerator.
 TermPositions termPositions(Term term)
          Returns an enumeration of all the documents which contain term.
abstract  TermEnum terms()
          Returns an enumeration of all the terms in the index.
abstract  TermEnum terms(Term t)
          Returns an enumeration of all terms starting at a given term.
 void undeleteAll()
          Undeletes all documents currently marked as deleted in this index.
static void unlock(Directory directory)
          Forcibly unlocks the index in the named directory.
 
Methods inherited from class java.lang.Object
clone, equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

IndexReader

protected IndexReader(Directory directory)
Constructor used if IndexReader is not owner of its directory. This is used for IndexReaders that are used within other IndexReaders that take care of locking directories.

Parameters:
directory - Directory where IndexReader files reside.
Method Detail

ensureOpen

protected final void ensureOpen()
                         throws AlreadyClosedException
Throws:
AlreadyClosedException - if this IndexReader is closed

open

public static IndexReader open(java.lang.String path)
                        throws CorruptIndexException,
                               java.io.IOException
Returns an IndexReader reading the index in an FSDirectory in the named path.

Parameters:
path - the path to the index directory
Throws:
CorruptIndexException - if the index is corrupt
java.io.IOException - if there is a low-level IO error

open

public static IndexReader open(java.io.File path)
                        throws CorruptIndexException,
                               java.io.IOException
Returns an IndexReader reading the index in an FSDirectory in the named path.

Parameters:
path - the path to the index directory
Throws:
CorruptIndexException - if the index is corrupt
java.io.IOException - if there is a low-level IO error

open

public static IndexReader open(Directory directory)
                        throws CorruptIndexException,
                               java.io.IOException
Returns an IndexReader reading the index in the given Directory.

Parameters:
directory - the index directory
Throws:
CorruptIndexException - if the index is corrupt
java.io.IOException - if there is a low-level IO error

open

public static IndexReader open(Directory directory,
                               IndexDeletionPolicy deletionPolicy)
                        throws CorruptIndexException,
                               java.io.IOException
Expert: returns an IndexReader reading the index in the given Directory, with a custom IndexDeletionPolicy.

Parameters:
directory - the index directory
deletionPolicy - a custom deletion policy (only used if you use this reader to perform deletes or to set norms); see IndexWriter for details.
Throws:
CorruptIndexException - if the index is corrupt
java.io.IOException - if there is a low-level IO error

directory

public Directory directory()
Returns the directory this index resides in.


lastModified

public static long lastModified(java.lang.String directory)
                         throws CorruptIndexException,
                                java.io.IOException
Returns the time the index in the named directory was last modified. Do not use this to check whether the reader is still up-to-date; use isCurrent() instead.

Throws:
CorruptIndexException - if the index is corrupt
java.io.IOException - if there is a low-level IO error

lastModified

public static long lastModified(java.io.File fileDirectory)
                         throws CorruptIndexException,
                                java.io.IOException
Returns the time the index in the named directory was last modified. Do not use this to check whether the reader is still up-to-date; use isCurrent() instead.

Throws:
CorruptIndexException - if the index is corrupt
java.io.IOException - if there is a low-level IO error

lastModified

public static long lastModified(Directory directory2)
                         throws CorruptIndexException,
                                java.io.IOException
Returns the time the index in the named directory was last modified. Do not use this to check whether the reader is still up-to-date; use isCurrent() instead.

Throws:
CorruptIndexException - if the index is corrupt
java.io.IOException - if there is a low-level IO error

getCurrentVersion

public static long getCurrentVersion(java.lang.String directory)
                              throws CorruptIndexException,
                                     java.io.IOException
Reads version number from segments files. The version number is initialized with a timestamp and then increased by one for each change of the index.

Parameters:
directory - where the index resides.
Returns:
version number.
Throws:
CorruptIndexException - if the index is corrupt
java.io.IOException - if there is a low-level IO error

getCurrentVersion

public static long getCurrentVersion(java.io.File directory)
                              throws CorruptIndexException,
                                     java.io.IOException
Reads version number from segments files. The version number is initialized with a timestamp and then increased by one for each change of the index.

Parameters:
directory - where the index resides.
Returns:
version number.
Throws:
CorruptIndexException - if the index is corrupt
java.io.IOException - if there is a low-level IO error

getCurrentVersion

public static long getCurrentVersion(Directory directory)
                              throws CorruptIndexException,
                                     java.io.IOException
Reads version number from segments files. The version number is initialized with a timestamp and then increased by one for each change of the index.

Parameters:
directory - where the index resides.
Returns:
version number.
Throws:
CorruptIndexException - if the index is corrupt
java.io.IOException - if there is a low-level IO error
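The version semantics behind getCurrentVersion and getVersion() can be modeled in a few lines. This is a hypothetical in-memory sketch, and `VersionedIndexModel` is not a Lucene class. It assumes a version starts from a timestamp and increases by one per committed change. A reader that remembers the version it was opened at is current only while the two values are equal.

```java
// Hypothetical model of the index version counter, not Lucene's implementation.
final class VersionedIndexModel {
    private long version;

    VersionedIndexModel(long initialTimestamp) {
        // The version number is initialized with a timestamp...
        this.version = initialTimestamp;
    }

    /** ...and increased by one for each committed change to the index. */
    void commitChange() {
        version++;
    }

    /** Analogous to IndexReader.getCurrentVersion(directory). */
    long currentVersion() {
        return version;
    }

    /** Analogous to isCurrent(): compares against the version seen at open time. */
    boolean isCurrent(long versionAtOpen) {
        return versionAtOpen == version;
    }
}
```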

getVersion

public long getVersion()
Version number when this IndexReader was opened.


isCurrent

public boolean isCurrent()
                  throws CorruptIndexException,
                         java.io.IOException
Check whether this IndexReader is still using the current (i.e., most recently committed) version of the index. If a writer has committed any changes to the index since this reader was opened, this will return false, in which case you must open a new IndexReader in order to see the changes. See the description of IndexWriter's autoCommit flag, which controls when the IndexWriter actually commits changes to the index.

Throws:
CorruptIndexException - if the index is corrupt
java.io.IOException - if there is a low-level IO error
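A common pattern built on isCurrent() is to replace a stale reader before searching. This sketch does not depend on Lucene. `StaleCheckingReader` and `ReaderRefresher` are hypothetical names: the interface mirrors only the isCurrent() and close() method names, and the `Supplier` stands in for a call such as IndexReader.open(directory).

```java
import java.io.IOException;
import java.util.function.Supplier;

// Hypothetical minimal view of a reader; mirrors two IndexReader method names.
interface StaleCheckingReader {
    boolean isCurrent() throws IOException;
    void close() throws IOException;
}

final class ReaderRefresher {
    private ReaderRefresher() {}

    /**
     * Returns the given reader if it still sees the latest commit; otherwise
     * opens a new one, closes the stale one, and returns the new reader.
     */
    static StaleCheckingReader refreshIfStale(StaleCheckingReader reader,
                                              Supplier<StaleCheckingReader> opener)
            throws IOException {
        if (reader.isCurrent()) {
            return reader;                         // nothing committed since it was opened
        }
        StaleCheckingReader fresh = opener.get();  // open first: if this fails, the old reader stays usable
        reader.close();                            // release files held by the stale reader
        return fresh;
    }
}
```

The new reader is opened before the old one is closed. That way, a failure while opening leaves the caller with a working, if stale, reader.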

isOptimized

public boolean isOptimized()
Checks if the index is optimized (if it has a single segment and no deletions).

Returns:
true if the index is optimized; false otherwise

getTermFreqVectors

public abstract TermFreqVector[] getTermFreqVectors(int docNumber)
                                             throws java.io.IOException
Return an array of term frequency vectors for the specified document. The array contains a vector for each vectorized field in the document. Each vector contains terms and frequencies for all terms in a given vectorized field. If no such fields exist, the method returns null. The term vectors that are returned may either be of type TermFreqVector or of type TermPositionsVector if positions or offsets have been stored.

Parameters:
docNumber - document for which term frequency vectors are returned
Returns:
array of term frequency vectors. May be null if no term vectors have been stored for the specified document.
Throws:
java.io.IOException - if index cannot be accessed
See Also:
Field.TermVector

getTermFreqVector

public abstract TermFreqVector getTermFreqVector(int docNumber,
                                                 java.lang.String field)
                                          throws java.io.IOException
Return a term frequency vector for the specified document and field. The returned vector contains terms and frequencies for the terms in the specified field of this document, if the field had the storeTermVector flag set. If term vectors had been stored with positions or offsets, a TermPositionsVector is returned.

Parameters:
docNumber - document for which the term frequency vector is returned
field - field for which the term frequency vector is returned.
Returns:
term frequency vector. May be null if the field does not exist in the specified document or its term vector was not stored.
Throws:
java.io.IOException - if index cannot be accessed
See Also:
Field.TermVector
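Here is the shape of a term frequency vector in concrete form. This is a hypothetical stdlib-only sketch that builds one from an already-tokenized field value. It holds a sorted array of distinct terms and a parallel array of their frequencies, modeled on what TermFreqVector exposes through getTerms() and getTermFrequencies(). `SimpleTermFreqVector` is not Lucene code, and analysis is assumed to have happened already.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory stand-in for a term frequency vector (not Lucene code).
final class SimpleTermFreqVector {
    final String field;
    final String[] terms;      // distinct terms, in sorted order
    final int[] frequencies;   // frequencies[i] is the count of terms[i]

    SimpleTermFreqVector(String field, List<String> tokens) {
        // TreeMap keeps keys sorted, giving the sorted term order.
        TreeMap<String, Integer> counts = new TreeMap<>();
        for (String token : tokens) {
            counts.merge(token, 1, Integer::sum);
        }
        this.field = field;
        this.terms = new String[counts.size()];
        this.frequencies = new int[counts.size()];
        int i = 0;
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            terms[i] = e.getKey();
            frequencies[i] = e.getValue();
            i++;
        }
    }

    /** Frequency of the given term in this field, or 0 if absent (binary search on sorted terms). */
    int frequencyOf(String term) {
        int idx = Arrays.binarySearch(terms, term);
        return idx >= 0 ? frequencies[idx] : 0;
    }
}
```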

indexExists

public static boolean indexExists(java.lang.String directory)
Returns true if an index exists at the specified directory. If the directory does not exist or if there is no index in it, false is returned.

Parameters:
directory - the directory to check for an index
Returns:
true if an index exists; false otherwise

indexExists

public static boolean indexExists(java.io.File directory)
Returns true if an index exists at the specified directory. If the directory does not exist or if there is no index in it, false is returned.

Parameters:
directory - the directory to check for an index
Returns:
true if an index exists; false otherwise

indexExists

public static boolean indexExists(Directory directory)
                           throws java.io.IOException
Returns true if an index exists at the specified directory. If the directory does not exist or if there is no index in it, false is returned.

Parameters:
directory - the directory to check for an index
Returns:
true if an index exists; false otherwise
Throws:
java.io.IOException - if there is a problem with accessing the index

numDocs

public abstract int numDocs()
Returns the number of documents in this index.


maxDoc

public abstract int maxDoc()
Returns one greater than the largest possible document number. This may be used to, e.g., determine how big to allocate an array which will have an element for every document number in an index.
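The usual idiom for visiting every live document is to size per-document arrays with maxDoc() and skip deleted slots with isDeleted(int). This sketch runs that loop against `LiveDocsView`, a hypothetical interface that mirrors only those two IndexReader method names, so it compiles without Lucene. With a real reader, the loop body would typically call document(i).

```java
// Hypothetical subset of IndexReader's per-document methods (not Lucene code).
interface LiveDocsView {
    int maxDoc();
    boolean isDeleted(int n);
}

final class LiveDocScan {
    private LiveDocScan() {}

    /**
     * Returns an array indexed by document number in which live documents are
     * marked true. Its length is maxDoc(), so every document number fits.
     */
    static boolean[] liveMask(LiveDocsView reader) {
        boolean[] live = new boolean[reader.maxDoc()];
        for (int doc = 0; doc < reader.maxDoc(); doc++) {
            live[doc] = !reader.isDeleted(doc);
        }
        return live;
    }

    /** Counts live documents; for a consistent reader this equals numDocs(). */
    static int countLive(LiveDocsView reader) {
        int count = 0;
        for (boolean isLive : liveMask(reader)) {
            if (isLive) count++;
        }
        return count;
    }
}
```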


document

public Document document(int n)
                  throws CorruptIndexException,
                         java.io.IOException
Returns the stored fields of the nth Document in this index.

Throws:
CorruptIndexException - if the index is corrupt
java.io.IOException - if there is a low-level IO error

document

public abstract Document document(int n,
                                  FieldSelector fieldSelector)
                           throws CorruptIndexException,
                                  java.io.IOException
Get the Document at the nth position. The FieldSelector may be used to determine what Fields to load and how they should be loaded. NOTE: If this Reader (more specifically, the underlying FieldsReader) is closed before the lazy Field is loaded, an exception may be thrown. If you want the value of a lazy Field to be available after closing, you must explicitly load it or fetch the Document again with a new loader.

Parameters:
n - Get the document at the nth position
fieldSelector - The FieldSelector to use to determine what Fields should be loaded on the Document. May be null, in which case all Fields will be loaded.
Returns:
The stored fields of the Document at the nth position
Throws:
CorruptIndexException - if the index is corrupt
java.io.IOException - if there is a low-level IO error
See Also:
Fieldable, FieldSelector, SetBasedFieldSelector, LoadFirstFieldSelector
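A FieldSelector decides, per field name, whether a stored field is loaded eagerly, lazily, or not at all. What follows is a stdlib-only model of the set-based approach behind SetBasedFieldSelector: fields in one set load eagerly, fields in a second set load lazily, and everything else is skipped. `LoadDecision` and `SetBasedSelectorModel` are hypothetical names. The real API returns FieldSelectorResult constants from FieldSelector.accept(String).

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical mirror of the eager / lazy / skip choices (not Lucene's FieldSelectorResult).
enum LoadDecision { LOAD, LAZY_LOAD, NO_LOAD }

// Hypothetical model of a set-based selector (not Lucene code).
final class SetBasedSelectorModel {
    private final Set<String> eager;
    private final Set<String> lazy;

    SetBasedSelectorModel(Set<String> eager, Set<String> lazy) {
        this.eager = Collections.unmodifiableSet(new HashSet<>(eager));
        this.lazy = Collections.unmodifiableSet(new HashSet<>(lazy));
    }

    /** Decides how the named stored field should be handled. */
    LoadDecision accept(String fieldName) {
        if (eager.contains(fieldName)) {
            return LoadDecision.LOAD;       // value read immediately
        }
        if (lazy.contains(fieldName)) {
            return LoadDecision.LAZY_LOAD;  // value read on first access; the reader must still be open
        }
        return LoadDecision.NO_LOAD;        // field omitted from the returned Document
    }
}
```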

isDeleted

public abstract boolean isDeleted(int n)
Returns true if document n has been deleted.


hasDeletions

public abstract boolean hasDeletions()
Returns true if any documents have been deleted.


hasNorms

public boolean hasNorms(java.lang.String field)
                 throws java.io.IOException
Returns true if there are norms stored for this field.

Throws:
java.io.IOException

norms

public abstract byte[] norms(java.lang.String field)
                      throws java.io.IOException
Returns the byte-encoded normalization factor for the named field of every document. This is used by the search code to score documents.

Throws:
java.io.IOException
See Also:
AbstractField.setBoost(float)

norms

public abstract void norms(java.lang.String field,
                           byte[] bytes,
                           int offset)
                    throws java.io.IOException
Reads the byte-encoded normalization factor for the named field of every document. This is used by the search code to score documents.

Throws:
java.io.IOException
See Also:
AbstractField.setBoost(float)

setNorm

public final void setNorm(int doc,
                          java.lang.String field,
                          byte value)
                   throws StaleReaderException,
                          CorruptIndexException,
                          LockObtainFailedException,
                          java.io.IOException
Expert: Resets the normalization factor for the named field of the named document. The norm represents the product of the field's boost and its length normalization. Thus, to preserve the length normalization values when resetting this, one should base the new value upon the old.

Throws:
StaleReaderException - if the index has changed since this reader was opened
CorruptIndexException - if the index is corrupt
LockObtainFailedException - if another writer has this index open (write.lock could not be obtained)
java.io.IOException - if there is a low-level IO error
See Also:
norms(String), Similarity.decodeNorm(byte)
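A norm is the product of the field's boost and its length normalization. To change the boost while keeping the length factor, scale the old value rather than overwriting it. The arithmetic below works in float space and assumes DefaultSimilarity's length norm of 1/sqrt(numTerms). It ignores the precision lost when a norm is encoded into a single byte, so the stored byte may decode to a slightly different value. `NormMath` is a hypothetical helper.

```java
// Hypothetical helper illustrating the arithmetic only (not Lucene code).
final class NormMath {
    private NormMath() {}

    /** Length normalization as in DefaultSimilarity: 1 / sqrt(numTerms). */
    static float lengthNorm(int numTerms) {
        return (float) (1.0 / Math.sqrt(numTerms));
    }

    /** norm = boost * lengthNorm, before byte encoding. */
    static float norm(float boost, int numTerms) {
        return boost * lengthNorm(numTerms);
    }

    /**
     * Rescales an existing norm from oldBoost to newBoost, preserving whatever
     * length normalization is already folded into it.
     */
    static float rescale(float oldNorm, float oldBoost, float newBoost) {
        return oldNorm * (newBoost / oldBoost);
    }
}
```

Consider a 4-term field with boost 2.0. Its norm is 2.0 × 0.5 = 1.0. Rescaling to boost 3.0 gives 1.5, the same value a fresh boost-3.0 document of that length would get. The result could then be passed to setNorm(int, String, float).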

doSetNorm

protected abstract void doSetNorm(int doc,
                                  java.lang.String field,
                                  byte value)
                           throws CorruptIndexException,
                                  java.io.IOException
Implements setNorm in subclass.

Throws:
CorruptIndexException
java.io.IOException

setNorm

public void setNorm(int doc,
                    java.lang.String field,
                    float value)
             throws StaleReaderException,
                    CorruptIndexException,
                    LockObtainFailedException,
                    java.io.IOException
Expert: Resets the normalization factor for the named field of the named document.

Throws:
StaleReaderException - if the index has changed since this reader was opened
CorruptIndexException - if the index is corrupt
LockObtainFailedException - if another writer has this index open (write.lock could not be obtained)
java.io.IOException - if there is a low-level IO error
See Also:
norms(String), Similarity.decodeNorm(byte)

terms

public abstract TermEnum terms()
                        throws java.io.IOException
Returns an enumeration of all the terms in the index. The enumeration is ordered by Term.compareTo(). Each term is greater than all that precede it in the enumeration. Note that after calling terms(), TermEnum.next() must be called on the resulting enumeration before calling other methods such as TermEnum.term().

Throws:
java.io.IOException - if there is a low-level IO error
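For an enumeration returned by terms(), call next() before the first term() and stop when next() returns false. This sketch reproduces that contract over an in-memory sorted collection so it runs without Lucene. `SortedTermCursor` is a hypothetical class whose next()/term() method names follow TermEnum. A real TermEnum should also be closed with close() when done.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Hypothetical cursor with TermEnum-style next()/term() semantics (not Lucene code).
final class SortedTermCursor {
    private final List<String> sorted;
    private int pos = -1;   // unpositioned until next() is called

    SortedTermCursor(Iterable<String> terms) {
        // A TreeSet removes duplicates and yields terms in ascending order.
        TreeSet<String> set = new TreeSet<>();
        for (String t : terms) set.add(t);
        this.sorted = new ArrayList<>(set);
    }

    /** Advances to the next term; returns false once the enumeration is exhausted. */
    boolean next() {
        if (pos < sorted.size()) pos++;
        return pos < sorted.size();
    }

    /** Current term; only valid after next() has returned true. */
    String term() {
        if (pos < 0 || pos >= sorted.size()) {
            throw new IllegalStateException("call next() first");
        }
        return sorted.get(pos);
    }

    /** The canonical loop: next() first, then term(). */
    static List<String> drain(SortedTermCursor cursor) {
        List<String> out = new ArrayList<>();
        while (cursor.next()) {
            out.add(cursor.term());
        }
        return out;
    }
}
```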

terms

public abstract TermEnum terms(Term t)
                        throws java.io.IOException
Returns an enumeration of all terms starting at a given term. If the given term does not exist, the enumeration is positioned at the first term greater than the supplied term. The enumeration is ordered by Term.compareTo(). Each term is greater than all that precede it in the enumeration.

Throws:
java.io.IOException - if there is a low-level IO error
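terms(Term t) returns an enumeration that is already positioned at t, or at the first term after it. A prefix scan therefore reads the current term before advancing. With a real TermEnum this is usually written as a do { ... } while (termEnum.next()) loop. This stdlib-only sketch models the seek with TreeSet.tailSet and orders terms by field first, then text, as Term.compareTo does. `FieldTerm` and `PrefixScan` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;

// Hypothetical stand-in for Term: ordered by field, then by text.
final class FieldTerm implements Comparable<FieldTerm> {
    final String field;
    final String text;

    FieldTerm(String field, String text) {
        this.field = field;
        this.text = text;
    }

    @Override
    public int compareTo(FieldTerm o) {
        int byField = field.compareTo(o.field);
        return byField != 0 ? byField : text.compareTo(o.text);
    }

    @Override
    public String toString() {
        return field + ":" + text;
    }
}

// Hypothetical helper (not Lucene code).
final class PrefixScan {
    private PrefixScan() {}

    /**
     * Returns the texts of all terms in the given field that start with prefix.
     * tailSet(start, true) plays the role of terms(start): it begins at the
     * first term greater than or equal to start.
     */
    static List<String> withPrefix(NavigableSet<FieldTerm> index, String field, String prefix) {
        List<String> out = new ArrayList<>();
        for (FieldTerm t : index.tailSet(new FieldTerm(field, prefix), true)) {
            // Terms are sorted, so the first non-matching term ends the scan.
            if (!t.field.equals(field) || !t.text.startsWith(prefix)) {
                break;
            }
            out.add(t.text);
        }
        return out;
    }
}
```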

docFreq

public abstract int docFreq(Term t)
                     throws java.io.IOException
Returns the number of documents containing the term t.

Throws:
java.io.IOException - if there is a low-level IO error

termDocs

public TermDocs termDocs(Term term)
                  throws java.io.IOException
Returns an enumeration of all the documents which contain term. For each document, the document number and the frequency of the term in that document are provided, for use in search scoring. Thus, this method implements the mapping:

    Term    =>    <docNum, freq>*

The enumeration is ordered by document number. Each document number is greater than all that precede it in the enumeration.

Throws:
java.io.IOException - if there is a low-level IO error
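The mapping from a term to its (docNum, freq) pairs can be illustrated by building postings for a single field from pre-tokenized documents. This is a hypothetical stdlib-only model of what a TermDocs enumeration walks over, not how Lucene stores postings on disk. Documents are identified by their position in the input list, so each postings list comes out in increasing document-number order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory postings: term -> ordered list of (docNum, freq). Not Lucene code.
final class PostingsModel {
    static final class Posting {
        final int doc;
        final int freq;
        Posting(int doc, int freq) { this.doc = doc; this.freq = freq; }
    }

    private PostingsModel() {}

    /** docs.get(i) holds the tokens of document number i. */
    static Map<String, List<Posting>> build(List<List<String>> docs) {
        Map<String, List<Posting>> postings = new TreeMap<>();
        for (int doc = 0; doc < docs.size(); doc++) {
            // Count occurrences of each term within this document.
            Map<String, Integer> freqs = new TreeMap<>();
            for (String token : docs.get(doc)) {
                freqs.merge(token, 1, Integer::sum);
            }
            // Documents are visited in increasing order, so each list stays sorted by doc.
            for (Map.Entry<String, Integer> e : freqs.entrySet()) {
                postings.computeIfAbsent(e.getKey(), k -> new ArrayList<>())
                        .add(new Posting(doc, e.getValue()));
            }
        }
        return postings;
    }
}
```

In this model, the length of a term's postings list corresponds to its docFreq.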

termDocs

public abstract TermDocs termDocs()
                           throws java.io.IOException
Returns an unpositioned TermDocs enumerator.

Throws:
java.io.IOException - if there is a low-level IO error

termPositions

public TermPositions termPositions(Term term)
                            throws java.io.IOException
Returns an enumeration of all the documents which contain term. For each document, in addition to the document number and frequency of the term in that document, a list of all of the ordinal positions of the term in the document is available. Thus, this method implements the mapping:

    Term    =>    <docNum, freq, <pos1, pos2, ... posfreq-1> >*

This positional information facilitates phrase and proximity searching.

The enumeration is ordered by document number. Each document number is greater than all that precede it in the enumeration.

Throws:
java.io.IOException - if there is a low-level IO error
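Term positions make phrase matching possible. Two terms form the phrase "t1 t2" in a document when t1 occurs at some position p and t2 occurs at position p+1. This stdlib-only sketch checks exactly that for one tokenized document. `PhraseCheck` is a hypothetical helper, and real phrase queries (including slop handling) are implemented differently.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper (not Lucene code): term positions for one document.
final class PhraseCheck {
    private PhraseCheck() {}

    /** Maps each term to the 0-based ordinal positions at which it occurs. */
    static Map<String, List<Integer>> positions(List<String> tokens) {
        Map<String, List<Integer>> pos = new HashMap<>();
        for (int i = 0; i < tokens.size(); i++) {
            pos.computeIfAbsent(tokens.get(i), k -> new ArrayList<>()).add(i);
        }
        return pos;
    }

    /** True if `second` immediately follows `first` somewhere in the document. */
    static boolean containsPhrase(List<String> tokens, String first, String second) {
        Map<String, List<Integer>> pos = positions(tokens);
        List<Integer> firstPos = pos.get(first);
        List<Integer> secondPos = pos.get(second);
        if (firstPos == null || secondPos == null) {
            return false;
        }
        Set<Integer> secondSet = new HashSet<>(secondPos);
        for (int p : firstPos) {
            if (secondSet.contains(p + 1)) {
                return true;
            }
        }
        return false;
    }
}
```

The size of each positions list equals the term's frequency in the document, which matches freq in the mapping above.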

termPositions

public abstract TermPositions termPositions()
                                     throws java.io.IOException
Returns an unpositioned TermPositions enumerator.

Throws:
java.io.IOException - if there is a low-level IO error

deleteDocument

public final void deleteDocument(int docNum)
                          throws StaleReaderException,
                                 CorruptIndexException,
                                 LockObtainFailedException,
                                 java.io.IOException
Deletes the document numbered docNum. Once a document is deleted it will not appear in TermDocs or TermPositions enumerations. Attempting to read its stored fields with the document(int) method will result in an error. The presence of this document may still be reflected in the docFreq(org.apache.lucene.index.Term) statistic, though this will be corrected eventually as the index is further modified.

Throws:
StaleReaderException - if the index has changed since this reader was opened
CorruptIndexException - if the index is corrupt
LockObtainFailedException - if another writer has this index open (write.lock could not be obtained)
java.io.IOException - if there is a low-level IO error

doDelete

protected abstract void doDelete(int docNum)
                          throws CorruptIndexException,
                                 java.io.IOException
Implements deletion of the document numbered docNum. Applications should call deleteDocument(int) or deleteDocuments(Term).

Throws:
CorruptIndexException
java.io.IOException

deleteDocuments

public final int deleteDocuments(Term term)
                          throws StaleReaderException,
                                 CorruptIndexException,
                                 LockObtainFailedException,
                                 java.io.IOException
Deletes all documents that have a given term indexed. This is useful if one uses a document field to hold a unique ID string for the document. Then to delete such a document, one merely constructs a term with the appropriate field and the unique ID string as its text and passes it to this method. See deleteDocument(int) for information about when this deletion will become effective.

Returns:
the number of documents deleted
Throws:
StaleReaderException - if the index has changed since this reader was opened
CorruptIndexException - if the index is corrupt
LockObtainFailedException - if another writer has this index open (write.lock could not be obtained)
java.io.IOException - if there is a low-level IO error
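Deleting by a unique-ID term means marking every document whose postings contain that term. This in-memory sketch models that. It also shows why docFreq may still count a deleted document: deletion only sets a flag, and the postings are not rewritten until the index is further modified. `DeletionModel` and its methods are hypothetical. With a real reader, the equivalent call is reader.deleteDocuments(new Term("id", value)).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model (not Lucene code) of flag-based deletion by term.
final class DeletionModel {
    private final Map<String, List<Integer>> postings = new HashMap<>(); // term -> doc numbers
    private final List<Boolean> deleted = new ArrayList<>();

    /** Adds a document with the given unique-ID term; returns its document number. */
    int addDocument(String idTerm) {
        int doc = deleted.size();
        deleted.add(false);
        postings.computeIfAbsent(idTerm, k -> new ArrayList<>()).add(doc);
        return doc;
    }

    /** Marks every live document containing the term as deleted; returns how many were deleted. */
    int deleteDocuments(String term) {
        int count = 0;
        for (int doc : postings.getOrDefault(term, Collections.emptyList())) {
            if (!deleted.get(doc)) {
                deleted.set(doc, true);
                count++;
            }
        }
        return count;
    }

    boolean isDeleted(int doc) {
        return deleted.get(doc);
    }

    /** Like docFreq: counts postings, so it still includes documents that are only flagged. */
    int docFreq(String term) {
        return postings.getOrDefault(term, Collections.emptyList()).size();
    }
}
```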

undeleteAll

public final void undeleteAll()
                       throws StaleReaderException,
                              CorruptIndexException,
                              LockObtainFailedException,
                              java.io.IOException
Undeletes all documents currently marked as deleted in this index.

Throws:
StaleReaderException - if the index has changed since this reader was opened
LockObtainFailedException - if another writer has this index open (write.lock could not be obtained)
CorruptIndexException - if the index is corrupt
java.io.IOException - if there is a low-level IO error

doUndeleteAll

protected abstract void doUndeleteAll()
                               throws CorruptIndexException,
                                      java.io.IOException
Implements actual undeleteAll() in subclass.

Throws:
CorruptIndexException
java.io.IOException

commit

protected final void commit()
                     throws java.io.IOException
Commit changes resulting from delete, undeleteAll, or setNorm operations. If an exception is hit, then either no changes or all changes will have been committed to the index (transactional semantics).

Throws:
java.io.IOException - if there is a low-level IO error
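The all-or-nothing behavior of commit() can be pictured as applying pending changes to a private copy and publishing the copy only if every change succeeds. This is a hypothetical stdlib-only model of that idea, and `CommitModel` is not Lucene code. Lucene achieves the same guarantee through its on-disk commit files rather than an in-memory swap.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.UnaryOperator;

// Hypothetical model of transactional commit semantics (not Lucene code).
final class CommitModel {
    private List<String> committed = Collections.emptyList();
    private final List<UnaryOperator<List<String>>> pending = new ArrayList<>();

    /** Queues a change; nothing is visible until commit() succeeds. */
    void stage(UnaryOperator<List<String>> change) {
        pending.add(change);
    }

    /**
     * Applies all pending changes to a copy. If any change throws, the
     * committed state is untouched (no changes); otherwise all are published.
     */
    void commit() {
        List<String> working = new ArrayList<>(committed);
        for (UnaryOperator<List<String>> change : pending) {
            working = change.apply(working);   // may throw; committed is not modified
        }
        committed = Collections.unmodifiableList(working);
        pending.clear();
    }

    List<String> committedState() {
        return committed;
    }
}
```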

doCommit

protected abstract void doCommit()
                          throws java.io.IOException
Implements commit.

Throws:
java.io.IOException

close

public final void close()
                 throws java.io.IOException
Closes files associated with this index. Also saves any new deletions to disk. No other methods should be called after this has been called.

Throws:
java.io.IOException - if there is a low-level IO error

doClose

protected abstract void doClose()
                         throws java.io.IOException
Implements close.

Throws:
java.io.IOException

finalize

protected void finalize()
                 throws java.lang.Throwable
Release the write lock, if needed.

Overrides:
finalize in class java.lang.Object
Throws:
java.lang.Throwable

getFieldNames

public abstract java.util.Collection getFieldNames(IndexReader.FieldOption fldOption)
Get a list of unique field names that exist in this index and have the specified field option information.

Parameters:
fldOption - specifies which field option should be available for the returned fields
Returns:
Collection of Strings indicating the names of the fields.
See Also:
IndexReader.FieldOption

isLocked

public static boolean isLocked(Directory directory)
                        throws java.io.IOException
Returns true iff the index in the named directory is currently locked.

Parameters:
directory - the directory to check for a lock
Throws:
java.io.IOException - if there is a low-level IO error

isLocked

public static boolean isLocked(java.lang.String directory)
                        throws java.io.IOException
Returns true iff the index in the named directory is currently locked.

Parameters:
directory - the directory to check for a lock
Throws:
java.io.IOException - if there is a low-level IO error

unlock

public static void unlock(Directory directory)
                   throws java.io.IOException
Forcibly unlocks the index in the named directory.

Caution: this should only be used by failure recovery code, when it is known that no other process or thread is in fact currently accessing this index.

Throws:
java.io.IOException

main

public static void main(java.lang.String[] args)
Prints the filename and size of each file within a given compound file. Add the -extract flag to extract files to the current working directory. In order to make the extracted version of the index work, you have to copy the segments file from the compound index into the directory where the extracted files are stored.

Parameters:
args - Usage: org.apache.lucene.index.IndexReader [-extract] <cfsfile>


Copyright © 2000-2008 Apache Software Foundation. All Rights Reserved.