reading module

This module contains classes that allow reading from an index.

Classes

class whoosh.reading.IndexReader

Do not instantiate this object directly. Instead use Index.reader().

all_stored_fields()
Yields the stored fields for all documents.
all_terms()
Yields (fieldname, text) tuples for every term in the index.
close()
Closes the open files associated with this reader.
doc_count()
Returns the total number of UNDELETED documents in this reader.
doc_count_all()
Returns the total number of documents, DELETED OR UNDELETED, in this reader.
doc_field_length(docnum, fieldid)
Returns the number of terms in the given field in the given document. This is used by some scoring algorithms.
doc_field_lengths(docnum)
Returns an array corresponding to the lengths of the scorable fields in the given document. It’s up to the caller to correlate the positions of the numbers in the array with the scorable fields in the schema.
doc_frequency(fieldid, text)
Returns how many documents the given term appears in.
expand_prefix(fieldid, prefix)
Yields terms in the given field that start with the given prefix.
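The prefix expansion described above can be pictured with a small standalone sketch. It is not Whoosh's implementation: it assumes the field's terms are available as a sorted Python list, and the `terms` list below is hypothetical data.

```python
from bisect import bisect_left


def expand_prefix(sorted_terms, prefix):
    """Yield every term in sorted_terms that starts with prefix."""
    # Binary search for the first term that is >= prefix.
    i = bisect_left(sorted_terms, prefix)
    # Matching terms are contiguous in sorted order, so stop at the first miss.
    while i < len(sorted_terms) and sorted_terms[i].startswith(prefix):
        yield sorted_terms[i]
        i += 1


terms = ["apple", "apply", "bear", "cab", "cabin"]  # hypothetical field lexicon
matches = list(expand_prefix(terms, "cab"))
```

Because the terms are sorted, the generator can stop at the first non-matching term instead of scanning the whole list.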
field_length(fieldid)
Returns the total number of terms in the given field. This is used by some scoring algorithms.
format(fieldid)
Returns the Format object corresponding to the given field name.
frequency(fieldid, text)
Returns the total number of instances of the given term in the collection.
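doc_frequency() and frequency() are easy to confuse, so here is a minimal sketch of the difference on a toy corpus. The `docs` list and both helper functions are illustrative stand-ins that operate on plain Python lists, not on a real reader.

```python
from collections import Counter

# Hypothetical corpus: each inner list holds the terms of one document's field.
docs = [
    ["apple", "bear", "apple"],
    ["bear", "cab"],
    ["apple"],
]


def doc_frequency(docs, text):
    # Number of documents that contain the term at least once.
    return sum(1 for terms in docs if text in terms)


def frequency(docs, text):
    # Total number of occurrences of the term across all documents.
    return sum(Counter(terms)[text] for terms in docs)


apple_df = doc_frequency(docs, "apple")
apple_freq = frequency(docs, "apple")
```

Here "apple" appears in two documents but three times overall, so the two counts differ.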
has_deletions()
Returns True if the underlying index/segment has deleted documents.
has_vector(docnum, fieldid)
Returns True if the given document has a term vector for the given field.
is_deleted(docnum)
Returns True if the given document number is marked deleted.
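The deletion-related methods (doc_count(), doc_count_all(), has_deletions() and is_deleted()) fit together roughly as in the sketch below. `ToyReader` is a hypothetical in-memory stand-in, and it assumes document numbers run from 0 to doc_count_all() - 1, which this page does not state.

```python
class ToyReader:
    """Hypothetical in-memory stand-in; not Whoosh's implementation."""

    def __init__(self, total_docs, deleted=()):
        self._total = total_docs      # every document ever added
        self._deleted = set(deleted)  # document numbers marked deleted

    def doc_count_all(self):
        # Deleted and undeleted documents alike.
        return self._total

    def doc_count(self):
        # Only documents that have not been marked deleted.
        return self._total - len(self._deleted)

    def has_deletions(self):
        return bool(self._deleted)

    def is_deleted(self, docnum):
        return docnum in self._deleted


reader = ToyReader(total_docs=5, deleted={1, 3})
# Assumed numbering: 0 .. doc_count_all() - 1.
live_docnums = [n for n in range(reader.doc_count_all())
                if not reader.is_deleted(n)]
```

Under that assumption, a caller that walks document numbers needs doc_count_all() for the range and is_deleted() to skip deleted entries.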
iter_field(fieldid, prefix='')
Yields (text, doc_freq, index_freq) tuples for all terms in the given field.
iter_from(fieldnum, text)
Yields (field_num, text, doc_freq, index_freq) tuples for all terms in the reader, starting at the given term.
iter_prefix(fieldid, prefix)
Yields (field_num, text, doc_freq, index_freq) tuples for all terms in the given field with a certain prefix.
lexicon(fieldid)
Yields all terms in the given field.
most_distinctive_terms(fieldid, number=5, prefix=None)
Returns the top ‘number’ terms with the highest tf*idf scores as a list of (score, text) tuples.
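As a rough illustration of ranking terms by tf*idf, the sketch below scores each term as its total frequency times log(N / document frequency). That is one common formulation chosen for the example; the exact weighting Whoosh applies is not given on this page, and the function and `docs` data are hypothetical.

```python
import math
from collections import Counter

# Hypothetical corpus: each inner list holds the terms of one document's field.
docs = [
    ["apple", "bear", "apple"],
    ["bear", "cab"],
    ["apple", "apple", "apple"],
    ["dog"],
]


def most_distinctive_terms(docs, number=5):
    n_docs = len(docs)
    total = Counter()  # occurrences of each term across the collection
    df = Counter()     # number of documents containing each term
    for terms in docs:
        total.update(terms)
        df.update(set(terms))
    # Assumed scoring: total frequency * log(N / document frequency).
    scored = [(total[t] * math.log(n_docs / df[t]), t) for t in total]
    scored.sort(reverse=True)
    return scored[:number]


top = most_distinctive_terms(docs, number=2)
```

The result has the same (score, text) shape as described above, with the highest-scoring term first.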
most_frequent_terms(fieldid, number=5, prefix='')
Returns the top ‘number’ most frequent terms in the given field as a list of (frequency, text) tuples.
postings(fieldid, text, exclude_docs=None)

Returns a PostingReader for the postings of the given term.

>>> pr = searcher.postings("content", "render")
>>> pr.skip_to(10)
>>> pr.id
12
Parameters:
  • fieldid – the field name or field number of the term.
  • text – the text of the term.
  • exclude_docs – an optional BitVector of documents to exclude from the results, or None to not exclude any documents.

Return type:

whoosh.postings.PostingReader
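The skip_to() call in the example above can be modelled with a toy class. This is only a sketch: `ToyPostingReader` is hypothetical, it uses a plain set in place of a BitVector, and it assumes skip_to(n) advances to the first posting whose document number is at least n, which is inferred from the example rather than stated.

```python
from bisect import bisect_left


class ToyPostingReader:
    """Hypothetical stand-in for a posting reader over document numbers."""

    def __init__(self, docnums, exclude_docs=None):
        excluded = exclude_docs or set()
        # Keep postings sorted and drop excluded documents up front.
        self._ids = [d for d in sorted(docnums) if d not in excluded]
        self._pos = 0

    @property
    def id(self):
        # Current document number, or None once the postings are exhausted.
        return self._ids[self._pos] if self._pos < len(self._ids) else None

    def skip_to(self, target):
        # Assumed semantics: move forward to the first id >= target.
        self._pos = bisect_left(self._ids, target, self._pos)


pr = ToyPostingReader([2, 5, 12, 20])
pr.skip_to(10)
current = pr.id

filtered = ToyPostingReader([2, 5, 12, 20], exclude_docs={12})
filtered.skip_to(10)
current_filtered = filtered.id
```

With document 12 excluded, the same skip lands on the next remaining posting instead.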

scorable(fieldid)
Returns True if the given field stores field lengths.
stored_fields(docnum)
Returns the stored fields for the given document number.
vector(docnum, fieldid)

Returns a PostingReader object for the given term vector.

>>> docnum = searcher.document_number(path=u'/a/b/c')
>>> v = searcher.vector(docnum, "content")
>>> v.all_as("frequency")
[(u"apple", 3), (u"bear", 2), (u"cab", 2)]
Parameters:
  • docnum – the document number of the document for which you want the term vector.
  • fieldid – the field name or field number of the field for which you want the term vector.
Return type:

whoosh.postings.PostingReader

vector_as(astype, docnum, fieldid)

Returns an iterator of (termtext, value) pairs for the terms in the given term vector. This is a convenient shortcut to calling vector() and using the PostingReader object when all you want are the terms and/or values.

>>> docnum = searcher.document_number(path=u'/a/b/c')
>>> searcher.vector_as("frequency", docnum, "content")
[(u"apple", 3), (u"bear", 2), (u"cab", 2)]
Parameters:
  • docnum – the document number of the document for which you want the term vector.
  • fieldid – the field name or field number of the field for which you want the term vector.
  • astype – a string containing the name of the format you want the term vector’s data in, for example “weights”.
class whoosh.reading.MultiReader(readers, doc_offsets, schema)
Do not instantiate this object directly. Instead use Index.reader().

Exceptions

exception whoosh.reading.TermNotFound
