This module contains Query objects that deal with “spans”.
Span queries allow for positional constraints on matching documents. For example, the whoosh.spans.SpanNear query matches documents where one term occurs near another. Because you can nest span queries, and wrap them around almost any non-span query, you can create very complex constraints.
For example, to find documents containing “whoosh” at most 5 positions before “library” in the “text” field:
from whoosh import query, spans
t1 = query.Term("text", "whoosh")
t2 = query.Term("text", "library")
q = spans.SpanNear(t1, t2, slop=5)
Abstract base class for span-based queries. Each span query type wraps a “regular” query that implements the basic document-matching functionality (for example, SpanNear wraps an And query, because SpanNear requires that the two sub-queries occur in the same documents). The wrapped query is stored in the q attribute.
Subclasses usually only need to implement the initializer to set the wrapped query, and matcher() to return a span-aware matcher object.
Matches spans that end within the first N positions. This lets you, for example, match only terms near the beginning of the document.
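The "end within the first N positions" rule can be pictured with a small pure-Python sketch. It does not use Whoosh: the helper name `spans_within_first`, the `(start, end)` tuple representation, and the inclusive end position are all assumptions made for illustration.

```python
def spans_within_first(spans, limit):
    """Keep only the spans whose (inclusive) end position is at most `limit`.

    Hypothetical helper: a simplified model, not Whoosh's matcher code.
    """
    return [(start, end) for (start, end) in spans if end <= limit]


# Spans are (start, end) word positions within a field, counted from zero.
spans = [(0, 0), (3, 4), (9, 9)]
early = spans_within_first(spans, limit=4)
print(early)  # [(0, 0), (3, 4)]
```

In this model, a document whose filtered list is empty would not match.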
Matches queries that occur near each other. By default, only matches queries that occur right next to each other (slop=1) and in order (ordered=True).
For example, to find documents where “whoosh” occurs next to “library” in the “text” field:
from whoosh import query, spans
t1 = query.Term("text", "whoosh")
t2 = query.Term("text", "library")
q = spans.SpanNear(t1, t2)
To find documents where “whoosh” occurs at most 5 positions before “library”:
q = spans.SpanNear(t1, t2, slop=5)
To find documents where “whoosh” occurs at most 5 positions before or after “library”:
q = spans.SpanNear(t1, t2, slop=5, ordered=False)
You can use the phrase() class method to create a tree of SpanNear queries to match a list of terms:
q = spans.SpanNear.phrase("text", [u"whoosh", u"search", u"library"], slop=2)
Parameters:
mindist: the minimum distance allowed between the queries.
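To make the interaction of slop, ordered, and mindist concrete, here is a simplified pure-Python model for single-word terms. It is only a sketch: the function `near` is hypothetical, it works on plain lists of word positions rather than Whoosh matchers, and it assumes that the distance between two terms is the difference of their positions.

```python
def near(pos_a, pos_b, slop=1, ordered=True, mindist=1):
    """Return (a, b) position pairs whose distance lies in [mindist, slop].

    Hypothetical helper modelling the rule for single-position terms only.
    """
    pairs = []
    for a in pos_a:
        for b in pos_b:
            if ordered and b <= a:
                continue  # with ordered=True, b must come after a
            dist = abs(b - a)
            if mindist <= dist <= slop:
                pairs.append((a, b))
    return pairs


# "whoosh is a fast search library": "whoosh" at 0, "library" at 5.
print(near([0], [5]))                         # [] (not adjacent)
print(near([0], [5], slop=5))                 # [(0, 5)]
# "library" first, "whoosh" second: only matches when order is ignored.
print(near([5], [0], slop=5))                 # []
print(near([5], [0], slop=5, ordered=False))  # [(5, 0)]
```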
Matches spans from the first query only if they don’t overlap with spans from the second query. If there are no non-overlapping spans, the document does not match.
For example, to match documents that contain “bear” at most 2 places after “apple” in the “text” field but don’t have “cute” between them:
from whoosh import query, spans
t1 = query.Term("text", "apple")
t2 = query.Term("text", "bear")
near = spans.SpanNear(t1, t2, slop=2)
q = spans.SpanNot(near, query.Term("text", "cute"))
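The overlap test behind this behaviour can be sketched in plain Python. This is an illustrative model rather than Whoosh's implementation: the helpers `overlaps` and `span_not` are hypothetical, and spans are assumed to be `(start, end)` tuples with inclusive ends.

```python
def overlaps(a, b):
    """True if inclusive spans a and b share at least one position."""
    return a[0] <= b[1] and b[0] <= a[1]


def span_not(spans_a, spans_b):
    """Keep spans from spans_a that overlap no span in spans_b (hypothetical helper)."""
    return [a for a in spans_a if not any(overlaps(a, b) for b in spans_b)]


# Two "apple ... bear" spans; "cute" occurs at position 1, inside the first.
near_spans = [(0, 2), (10, 12)]
cute_spans = [(1, 1)]
remaining = span_not(near_spans, cute_spans)
print(remaining)  # [(10, 12)]
```

If `remaining` were empty, the document would not match in this model.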
Matches documents that match any of a list of sub-queries. Unlike query.Or, this class merges together matching spans from the different sub-queries when they overlap.
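The merging of overlapping spans can be illustrated with a short pure-Python sketch. It is a simplified model, not Whoosh's code: the function `merge_overlapping` is hypothetical, spans are assumed to be `(start, end)` tuples with inclusive ends, and only spans that actually overlap are merged.

```python
def merge_overlapping(spans):
    """Merge overlapping (start, end) spans into single spans (hypothetical helper)."""
    merged = []
    for start, end in sorted(spans):
        if merged and start <= merged[-1][1]:
            # Overlaps the previous span: extend it instead of adding a new one.
            prev_start, prev_end = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end))
        else:
            merged.append((start, end))
    return merged


# Spans from two sub-queries in the same document.
spans_a = [(0, 3), (10, 11)]
spans_b = [(2, 5), (20, 20)]
print(merge_overlapping(spans_a + spans_b))  # [(0, 5), (10, 11), (20, 20)]
```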
Matches documents where the spans of the first query contain any spans of the second query.
For example, to match documents where “apple” occurs at most 10 places before “bear” in the “text” field and “cute” is between them:
from whoosh import query, spans
t1 = query.Term("text", "apple")
t2 = query.Term("text", "bear")
near = spans.SpanNear(t1, t2, slop=10)
q = spans.SpanContains(near, query.Term("text", "cute"))
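As a rough picture of the containment test, the following pure-Python sketch keeps the spans of the first query that fully enclose at least one span of the second. The helpers `contains` and `span_contains` are hypothetical, spans are assumed to be `(start, end)` tuples with inclusive ends, and returning the outer spans is an assumption of this model.

```python
def contains(outer, inner):
    """True if the inclusive span `inner` lies entirely within `outer`."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]


def span_contains(spans_a, spans_b):
    """Keep spans from spans_a that enclose some span in spans_b (hypothetical helper)."""
    return [a for a in spans_a if any(contains(a, b) for b in spans_b)]


# Two "apple ... bear" spans; "cute" occurs at position 3, inside the first.
near_spans = [(0, 6), (20, 24)]
cute_spans = [(3, 3)]
print(span_contains(near_spans, cute_spans))  # [(0, 6)]
```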
Matches documents where the spans of the first query occur before any spans of the second query.
For example, to match documents where “apple” occurs anywhere before “bear”:
from whoosh import query, spans
t1 = query.Term("text", "apple")
t2 = query.Term("text", "bear")
q = spans.SpanBefore(t1, t2)
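One possible reading of this rule is sketched below in pure Python. It assumes that a span of the first query qualifies only when it ends before the earliest span of the second query begins; that interpretation, the helper name `span_before`, and the inclusive `(start, end)` tuples are assumptions of this example, not confirmed behaviour.

```python
def span_before(spans_a, spans_b):
    """Keep spans from spans_a that end before the earliest span in spans_b starts.

    Hypothetical helper; the "earliest span" reading is an assumption.
    """
    if not spans_b:
        return []
    first_b_start = min(start for start, _ in spans_b)
    return [a for a in spans_a if a[1] < first_b_start]


# "apple" at positions 0 and 7, "bear" at position 4.
apple_spans = [(0, 0), (7, 7)]
bear_spans = [(4, 4)]
print(span_before(apple_spans, bear_spans))  # [(0, 0)]
```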
Matches documents that satisfy both subqueries, but only uses the spans from the first subquery.
This is useful when you want to place conditions on matches but not have those conditions affect the spans returned.
For example, to get spans for the term alfa in documents that also must contain the term bravo:
from whoosh import query, spans
q = spans.SpanCondition(query.Term("text", u"alfa"), query.Term("text", u"bravo"))
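The idea of "match on both, report spans from the first" can be modelled without Whoosh as follows. The `docs` dictionary, the function `span_condition`, and the use of plain position lists in place of spans are all hypothetical and exist only for this sketch.

```python
def span_condition(docs, term_a, term_b):
    """Map each document containing both terms to the positions of term_a only."""
    results = {}
    for doc_id, postings in docs.items():
        if term_a in postings and term_b in postings:
            # term_b acts purely as a filter; its positions are not returned.
            results[doc_id] = list(postings[term_a])
    return results


# Toy index: document id -> {term: [word positions]}.
docs = {
    1: {"alfa": [0, 4], "bravo": [2]},
    2: {"alfa": [1]},
    3: {"bravo": [0]},
}
print(span_condition(docs, "alfa", "bravo"))  # {1: [0, 4]}
```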