query module

This module contains objects that query the search index. These query objects are composable to form complex query trees.

See also whoosh.qparser which contains code for parsing user queries into query objects.

Base classes

The following abstract base classes are subclassed to create the “real” query operations.

class whoosh.query.Query

Abstract base class for all queries.

Note that this base class implements __or__, __and__, and __sub__ to allow slightly more convenient composition of query objects:

>>> Term("content", u"a") | Term("content", u"b")
Or([Term("content", u"a"), Term("content", u"b")])

>>> Term("content", u"a") & Term("content", u"b")
And([Term("content", u"a"), Term("content", u"b")])

>>> Term("content", u"a") - Term("content", u"b")
And([Term("content", u"a"), Not(Term("content", u"b"))])
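
The composition above can be reproduced with ordinary Python operator overloading. The following is a minimal sketch using toy stand-in classes (they are not Whoosh's own Query, Term, And, Or, or Not); it only illustrates how __or__, __and__, and __sub__ can build a query tree.

```python
# Toy stand-ins for illustration only; these are NOT Whoosh's classes.
class Query:
    def __or__(self, other):
        return Or([self, other])

    def __and__(self, other):
        return And([self, other])

    def __sub__(self, other):
        # "a - b" means "matches a, but not b".
        return And([self, Not(other)])


class Term(Query):
    def __init__(self, fieldname, text):
        self.fieldname, self.text = fieldname, text

    def __repr__(self):
        return "Term(%r, %r)" % (self.fieldname, self.text)


class Not(Query):
    def __init__(self, query):
        self.query = query

    def __repr__(self):
        return "Not(%r)" % (self.query,)


class Compound(Query):
    def __init__(self, subqueries):
        self.subqueries = list(subqueries)

    def __repr__(self):
        return "%s(%r)" % (type(self).__name__, self.subqueries)


class And(Compound):
    pass


class Or(Compound):
    pass


a, b = Term("content", "a"), Term("content", "b")
print(a | b)
print(a & b)
print(a - b)
```
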
accept(visitor)
Accepts a “visitor” function, applies it to any sub-queries and then to this object itself, and returns the result.
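
As a rough illustration of the "sub-queries first, then this object" order described above, here is a sketch with a hypothetical two-class query tree. The classes and the lowercase_terms visitor are invented for this example and are not Whoosh's implementation.

```python
# Hypothetical miniature query tree; not Whoosh's implementation.
class Term:
    def __init__(self, fieldname, text):
        self.fieldname, self.text = fieldname, text

    def accept(self, visitor):
        # A leaf has no sub-queries, so the visitor sees only the leaf.
        return visitor(self)


class And:
    def __init__(self, subqueries):
        self.subqueries = subqueries

    def accept(self, visitor):
        # Visit the children first, rebuild this node from their results,
        # then hand the rebuilt node to the visitor.
        rebuilt = And([q.accept(visitor) for q in self.subqueries])
        return visitor(rebuilt)


def lowercase_terms(q):
    """Example visitor: returns a lowercased copy of every Term it sees."""
    if isinstance(q, Term):
        return Term(q.fieldname, q.text.lower())
    return q


tree = And([Term("content", "Render"), And([Term("path", "/A/B")])])
result = tree.accept(lowercase_terms)
```

Because the visitor returns new objects, the original tree is left unchanged.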
all_terms(termset=None, phrases=True)

Returns a set of all terms in this query tree.

This method simply operates on the query itself, without reference to an index (unlike existing_terms()), so it will not add terms that require an index to compute, such as Prefix and Wildcard.

>>> q = And([Term("content", u"render"), Term("path", u"/a/b")])
>>> q.all_terms()
set([("content", u"render"), ("path", u"/a/b")])
Parameter:phrases – Whether to add words found in Phrase queries.
Return type:set
doc_scores(searcher, exclude_docs=None)

Returns an iterator of (docnum, score) pairs matching this query. This is a convenience method for when you don’t need a QueryScorer (i.e. you don’t need to use skip_to).

>>> searcher = my_index.searcher()
>>> list(my_query.doc_scores(searcher))
[(10, 0.73), (34, 2.54), (78, 0.05), (103, 12.84)]
docs(searcher, exclude_docs=None)

Returns an iterator of docnums matching this query.

>>> searcher = my_index.searcher()
>>> list(my_query.docs(searcher))
[10, 34, 78, 103]
estimate_size(ixreader)
Returns an estimate of how many documents this query could potentially match (for example, the estimated size of a simple term query is the document frequency of the term). It is permissible to overestimate, but not to underestimate.
existing_terms(ixreader, termset=None, reverse=False, phrases=True)

Returns a set of all terms in this query tree that exist in the index represented by the given ixreader.

This method references the IndexReader to expand Prefix and Wildcard queries, and only adds terms that actually exist in the index (unless reverse=True).

>>> ixreader = my_index.reader()
>>> q = And([Or([Term("content", u"render"),
...             Term("content", u"rendering")]),
...             Prefix("path", u"/a/")])
>>> q.existing_terms(ixreader)
set([("content", u"render"), ("path", u"/a/b"), ("path", u"/a/c")])
Parameters:
  • ixreader – A whoosh.reading.IndexReader object.
  • reverse – If True, this method adds missing terms rather than existing terms to the set.
  • phrases – Whether to add words found in Phrase queries.
Return type:set
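
The behaviour of existing_terms() can be sketched without a real index. In the example below the index is modelled as a plain set of (fieldname, text) pairs, and the function takes literal terms and prefixes as separate arguments; both are simplifying assumptions, not how an IndexReader or a query tree is actually represented.

```python
# Assumption: the "index" is a plain set of (fieldname, text) pairs.
lexicon = {
    ("content", "render"),
    ("path", "/a/b"),
    ("path", "/a/c"),
    ("path", "/z"),
}


def existing_terms(terms, prefixes, lexicon, reverse=False):
    """terms: literal (field, text) pairs; prefixes: (field, prefix) pairs."""
    found = set()
    for term in terms:
        # With reverse=True, collect the terms that are missing instead.
        if (term in lexicon) != reverse:
            found.add(term)
    if not reverse:
        # A prefix can only be expanded into terms the index contains.
        for field, prefix in prefixes:
            found.update(
                (f, t) for f, t in lexicon if f == field and t.startswith(prefix)
            )
    return found


literal = [("content", "render"), ("content", "rendering")]
present = existing_terms(literal, [("path", "/a/")], lexicon)
missing = existing_terms(literal, [("path", "/a/")], lexicon, reverse=True)
```

Here present holds the three terms found in the index, and missing holds only ("content", "rendering").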

normalize()

Returns a recursively “normalized” form of this query. The normalized form removes redundancy and empty queries. This is called automatically on query trees created by the query parser, but you may want to call it yourself if you’re writing your own parser or building your own queries.

>>> q = And([And([Term("f", u"a"),
...               Term("f", u"b")]),
...               Term("f", u"c"), Or([])])
>>> q.normalize()
And([Term("f", u"a"), Term("f", u"b"), Term("f", u"c")])

Note that this returns a new, normalized query. It does not modify the original query “in place”.
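
To make the idea of normalization concrete, the sketch below flattens nested queries of the same kind, drops empty sub-queries, and removes exact duplicates. It represents queries as plain tuples, which is an assumption made for brevity; the rules Whoosh actually applies may differ in detail.

```python
# Queries are modelled as tuples: ("term", field, text) or (kind, [children]),
# where kind is "and" or "or". This representation is hypothetical.
def normalize(q):
    """Return a new, simplified tree, or None if nothing is left."""
    kind = q[0]
    if kind == "term":
        return q
    flat = []
    for sub in q[1]:
        sub = normalize(sub)
        if sub is None:
            continue  # drop empty sub-queries
        # Merge a child of the same kind into its parent.
        candidates = sub[1] if sub[0] == kind else [sub]
        for c in candidates:
            if c not in flat:  # drop exact duplicates
                flat.append(c)
    if not flat:
        return None
    if len(flat) == 1:
        return flat[0]  # a one-child And/Or is just the child
    return (kind, flat)


q = ("and", [
    ("and", [("term", "f", "a"), ("term", "f", "b")]),
    ("term", "f", "c"),
    ("or", []),
])
normalized = normalize(q)
```

As with normalize() itself, the function builds a new tree and leaves q untouched.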

replace(oldtext, newtext)

Returns a copy of this query with oldtext replaced by newtext (if oldtext was anywhere in this query).

Note that this returns a new query with the given text replaced. It does not modify the original query “in place”.

scorer(searcher, exclude_docs=None)

Returns a QueryScorer object you can use to retrieve documents and scores matching this query.

Return type:whoosh.postings.QueryScorer
simplify(ixreader)
Returns a recursively simplified form of this query, where “second-order” queries (such as Prefix and Variations) are re-written into lower-level queries (such as Term and Or).
class whoosh.query.CompoundQuery(subqueries, boost=1.0)
Abstract base class for queries that combine or manipulate the results of multiple sub-queries.
class whoosh.query.MultiTerm
Abstract base class for queries that operate on multiple terms in the same field.

Query classes

class whoosh.query.Term(fieldname, text, boost=1.0)

Matches documents containing the given term (fieldname+text pair).

>>> Term("content", u"render")
class whoosh.query.Variations(fieldname, text, boost=1.0)
Query that automatically searches for morphological variations of the given word in the same field.
class whoosh.query.FuzzyTerm(fieldname, text, boost=1.0, minsimilarity=0.5, prefixlength=1)

Matches documents containing words similar to the given term.

Parameters:
  • fieldname – The name of the field to search.
  • text – The text to search for.
  • boost – A boost factor to apply to scores of documents matching this query.
  • minsimilarity – The minimum similarity ratio to match. 1.0 is the maximum (an exact match to ‘text’).
  • prefixlength – The matched terms must share this many initial characters with ‘text’. For example, if text is “light” and prefixlength is 2, then only terms starting with “li” are checked for similarity.
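
The interaction between minsimilarity and prefixlength can be sketched as follows. The example uses difflib.SequenceMatcher from the standard library as the similarity ratio; that choice is an assumption for illustration, and the metric FuzzyTerm uses internally may produce different numbers.

```python
from difflib import SequenceMatcher


def fuzzy_candidates(text, lexicon, minsimilarity=0.5, prefixlength=1):
    """Return the terms in lexicon that are 'similar enough' to text."""
    prefix = text[:prefixlength]
    matches = []
    for term in lexicon:
        # Terms that do not share the required prefix are never compared.
        if not term.startswith(prefix):
            continue
        ratio = SequenceMatcher(None, text, term).ratio()
        if ratio >= minsimilarity:
            matches.append(term)
    return matches


terms = ["light", "lights", "limit", "night", "sight"]
strict = fuzzy_candidates("light", terms, minsimilarity=0.8, prefixlength=2)
loose = fuzzy_candidates("light", terms, minsimilarity=0.75, prefixlength=0)
```

With prefixlength=2 only terms starting with "li" are tested, so "night" and "sight" are skipped regardless of how similar they are; with prefixlength=0 every term is compared.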
class whoosh.query.Phrase(fieldname, words, slop=1, boost=1.0)

Matches documents containing a given phrase.

Parameters:
  • fieldname – the field to search.
  • words – a list of words (unicode strings) in the phrase.
  • slop – the number of words allowed between each “word” in the phrase; the default of 1 means the phrase must match exactly.
  • boost – a boost factor to apply to the raw score of documents matched by this query.
class whoosh.query.And(subqueries, boost=1.0)

Matches documents that match ALL of the subqueries.

>>> And([Term("content", u"render"),
...      Term("content", u"shade"),
...      Not(Term("content", u"texture"))])
>>> # You can also do this
>>> Term("content", u"render") & Term("content", u"shade")
class whoosh.query.Or(subqueries, boost=1.0, minmatch=0)

Matches documents that match ANY of the subqueries.

>>> Or([Term("content", u"render"),
...     And([Term("content", u"shade"), Term("content", u"texture")]),
...     Not(Term("content", u"network"))])
>>> # You can also do this
>>> Term("content", u"render") | Term("content", u"shade")
class whoosh.query.DisjunctionMax(subqueries, boost=1.0, tiebreak=0.0)
Matches all documents that match any of the subqueries, but scores each document using the maximum score from the subqueries.
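
The text above does not define how tiebreak enters the score. The sketch below assumes the convention used by Lucene's query of the same name, where the best sub-score is taken and the remaining sub-scores are added after being scaled by tiebreak. Treat the formula as an assumption rather than a statement about Whoosh's scoring.

```python
def dismax_score(subscores, tiebreak=0.0):
    """Combine the sub-query scores for one document.

    Assumed formula: max(scores) + tiebreak * (sum of the other scores).
    """
    best = max(subscores)
    return best + tiebreak * (sum(subscores) - best)


pure_max = dismax_score([2.0, 1.0, 0.5])              # tiebreak=0.0
blended = dismax_score([2.0, 1.0, 0.5], tiebreak=0.5)
```

With the default tiebreak of 0.0 the result is the plain maximum; a tiebreak of 1.0 would make the result equal to the sum of all sub-scores.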
class whoosh.query.Not(query, boost=1.0)

Excludes any documents that match the subquery.

>>> # Match documents that contain 'render' but not 'texture'
>>> And([Term("content", u"render"),
...      Not(Term("content", u"texture"))])
>>> # You can also do this
>>> Term("content", u"render") - Term("content", u"texture")
Parameters:
  • query – A Query object. The results of this query are excluded from the parent query.
  • boost – Boost is meaningless for excluded documents but this keyword argument is accepted for the sake of a consistent interface.
class whoosh.query.Prefix(fieldname, text, boost=1.0)

Matches documents that contain any terms that start with the given text.

>>> # Match documents containing words starting with 'comp'
>>> Prefix("content", u"comp")
class whoosh.query.Wildcard(fieldname, text, boost=1.0)

Matches documents that contain any terms that match a wildcard expression.

>>> Wildcard("content", u"in*f?x")
Parameters:
  • fieldname – The field to search in.
  • text – A glob to search for. May contain ? and/or * wildcard characters. Note that matching a wildcard expression that starts with a wildcard is very inefficient, since the query must test every term in the field.
  • boost – A boost factor that should be applied to the raw score of results matched by this query.
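
The cost of a leading wildcard can be seen in a small sketch. It assumes the field's terms are available as a sorted list and that a literal prefix can be used to narrow the scan; the helper names are hypothetical and this is not Whoosh's implementation.

```python
import re
from bisect import bisect_left


def glob_to_regex(glob):
    """Translate a glob with * and ? into an anchored regular expression."""
    parts = []
    for ch in glob:
        if ch == "*":
            parts.append(".*")
        elif ch == "?":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts) + r"\Z", re.DOTALL)


def literal_prefix(glob):
    """Return the part of the glob before the first wildcard character."""
    for i, ch in enumerate(glob):
        if ch in "*?":
            return glob[:i]
    return glob


def wildcard_terms(glob, sorted_terms):
    """Return (matching_terms, number_of_terms_tested)."""
    pattern = glob_to_regex(glob)
    prefix = literal_prefix(glob)
    # Jump to the first term that could share the literal prefix.
    start = bisect_left(sorted_terms, prefix)
    matches, tested = [], 0
    for term in sorted_terms[start:]:
        if not term.startswith(prefix):
            break  # sorted order: no later term can match
        tested += 1
        if pattern.match(term):
            matches.append(term)
    return matches, tested


terms = sorted(["index", "infix", "inflex", "input", "prefix", "suffix"])
anchored = wildcard_terms("in*f?x", terms)
leading = wildcard_terms("*fix", terms)
```

The pattern "in*f?x" only has to test the four terms beginning with "in", while "*fix" has an empty literal prefix and must test all six.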
class whoosh.query.TermRange(fieldname, start, end, startexcl=False, endexcl=False, boost=1.0)

Matches documents containing any terms in a given range.

>>> # Match documents where the indexed "id" field is greater than or equal
>>> # to 'apple' and less than or equal to 'pear'.
>>> TermRange("id", u"apple", u"pear")
Parameters:
  • fieldname – The name of the field to search.
  • start – Match terms equal to or greater than this.
  • end – Match terms equal to or less than this.
  • startexcl – If True, the range start is exclusive. If False, the range start is inclusive.
  • endexcl – If True, the range end is exclusive. If False, the range end is inclusive.
  • boost – Boost factor that should be applied to the raw score of results matched by this query.
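
The effect of startexcl and endexcl can be sketched with the standard bisect module, assuming the field's terms are held in a sorted list. The function name and the list-based index are assumptions for illustration only.

```python
from bisect import bisect_left, bisect_right


def terms_in_range(sorted_terms, start, end, startexcl=False, endexcl=False):
    """Return the slice of sorted_terms that falls within the range."""
    # bisect_right skips past an equal term (exclusive start);
    # bisect_left stops at it (inclusive start). The end is the mirror image.
    if startexcl:
        lo = bisect_right(sorted_terms, start)
    else:
        lo = bisect_left(sorted_terms, start)
    if endexcl:
        hi = bisect_left(sorted_terms, end)
    else:
        hi = bisect_right(sorted_terms, end)
    return sorted_terms[lo:hi]


ids = ["apple", "banana", "cherry", "pear", "plum"]
inclusive = terms_in_range(ids, "apple", "pear")
exclusive = terms_in_range(ids, "apple", "pear", startexcl=True, endexcl=True)
```

With the defaults both endpoints are included; setting both flags to True drops "apple" and "pear" from the result.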

Binary operations

These binary operators are not generally created by the query parser in whoosh.qparser. Unless you specifically need these operations, you should use the normal query classes instead.

class whoosh.query.Require(scoredquery, requiredquery, boost=1.0)

Binary query that returns results from the first query that also appear in the second query, but only uses the scores from the first query. This lets you filter results without affecting scores.

Parameters:
  • scoredquery – The query that is scored. Only documents that also appear in the second query (‘requiredquery’) are scored.
  • requiredquery – Only documents that match both ‘scoredquery’ and ‘requiredquery’ are returned, but this query does not contribute to the scoring.
class whoosh.query.AndMaybe(requiredquery, optionalquery, boost=1.0)

Binary query that takes results from the first query. If and only if the same document also appears in the results from the second query, the score from the second query will be added to the score from the first query.

Parameters:
  • requiredquery – Documents matching this query are returned.
  • optionalquery – If a document matches this query as well as ‘requiredquery’, the score from this query is added to the document score from ‘requiredquery’.
class whoosh.query.AndNot(positive, negative, boost=1.0)

Binary boolean query of the form ‘a ANDNOT b’, where documents that match b are removed from the matches for a.

Parameters:
  • positive – query to INCLUDE.
  • negative – query whose matches should be EXCLUDED.
  • boost – boost factor that should be applied to the raw score of results matched by this query.
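
The difference between the three binary operations is easiest to see on concrete scores. The sketch below models each query's results as a dict mapping docnum to score and ignores boost; the function names and the dict representation are assumptions for illustration, not the way Whoosh evaluates these queries.

```python
def require(scored, required):
    """Keep docs present in both; scores come only from `scored`."""
    return {doc: s for doc, s in scored.items() if doc in required}


def and_maybe(required, optional):
    """Keep every doc in `required`; add the optional score when present."""
    return {doc: s + optional.get(doc, 0.0) for doc, s in required.items()}


def and_not(positive, negative):
    """Keep docs in `positive` that do not appear in `negative`."""
    return {doc: s for doc, s in positive.items() if doc not in negative}


a = {1: 1.0, 2: 2.0, 3: 0.5}   # results of the first query
b = {2: 4.0, 3: 1.5, 9: 7.0}   # results of the second query

filtered = require(a, b)
boosted = and_maybe(a, b)
excluded = and_not(a, b)
```

Document 9 matches only the second query, so it never appears in any of the three results.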

Exceptions

exception whoosh.query.QueryError
Error encountered while running a query.
