Parsing user queries

Overview

The job of a query parser is to convert a query string submitted by a user into query objects (objects from the whoosh.query module).

For example, the user query:

rendering shading

might be parsed into query objects like this:

And([Term("content", u"rendering"), Term("content", u"shading")])

Whoosh includes a few pre-made parsers for user queries in the whoosh.qparser module. The default parser is based on pyparsing <http://pyparsing.wikispaces.com/> and implements a query language similar to the one shipped with Lucene. The parser is quite powerful, and the way it builds query trees is fairly customizable.

Using the default parser

To create a whoosh.qparser.QueryParser object, pass it the name of the default field to search and the schema of the index you’ll be searching.

from whoosh.qparser import QueryParser

parser = QueryParser("content", schema=myindex.schema)

Tip

You can instantiate a QueryParser object without specifying a schema; however, the parser will then not process the text of the user query (see querying and indexing below). This is really only useful for debugging, when you want to see how QueryParser will build a query but don’t want to make up a schema just for testing.

Once you have a QueryParser object, you can call parse() on it to parse a query string into a query object:

>>> parser.parse(u"alpha OR beta")
Or([Term("content", u"alpha"), Term("content", u"beta")])

See the query language reference for the features and syntax of the default parser’s query language.

Letting the user search multiple fields

The QueryParser object takes terms without explicit fields and assigns them to the default field you specified when you created the object. For example, if you created the object with:

parser = QueryParser("content", schema=myschema)

And the user entered the query:

three blind mice

The parser would treat it as:

content:three content:blind content:mice

However, you might want to let the user search multiple fields by default. For example, you might want “unfielded” terms to search both the title and content fields.

In that case, you can use a whoosh.qparser.MultifieldParser. This is just like the normal QueryParser, but instead of a default field name string, it takes a sequence of field names:

from whoosh.qparser import MultifieldParser

mparser = MultifieldParser(["title", "content"], schema=myschema)

When this MultifieldParser instance parses three blind mice, it treats it as:

(title:three OR content:three) (title:blind OR content:blind) (title:mice OR content:mice)
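
The expansion above can be sketched in plain Python. This is only an illustration of the rewriting rule, not how MultifieldParser is implemented: expand_across_fields is a hypothetical helper that understands nothing beyond whitespace-separated terms and the field:term form, and it ignores operators, phrases and grouping.

```python
def expand_across_fields(user_query, fieldnames):
    """Rewrite unfielded terms so that each one searches every default field.

    Hypothetical helper for illustration only; it is not part of Whoosh.
    """
    groups = []
    for token in user_query.split():
        if ":" in token:
            # The user named a field explicitly, so leave the term alone.
            groups.append(token)
        else:
            # An unfielded term becomes an OR across all default fields.
            alternatives = " OR ".join(f"{name}:{token}" for name in fieldnames)
            groups.append(f"({alternatives})")
    return " ".join(groups)


print(expand_across_fields("three blind mice", ["title", "content"]))
```

With a single default field the same rule reduces to the plain QueryParser behaviour described earlier, apart from the extra parentheses.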

The relationship between indexing and querying

TBD.

Customizing the parser

QueryParser arguments

QueryParser supports two extra keyword arguments:

conjunction

The query class to use to join sub-queries when the user doesn’t explicitly specify a boolean operator, such as AND or OR.

This must be a whoosh.query.Query subclass (not an instantiated object) that accepts a list of subqueries in its __init__ method. The default is whoosh.query.And.

This is useful if you want to change the default operator to OR, or if you’ve written a custom operator you want the parser to use instead of the ones shipped with Whoosh.

termclass

The query class to use to wrap single terms.

This must be a whoosh.query.Query subclass (not an instantiated object) that accepts a fieldname string and term text unicode string in its __init__ method. The default is whoosh.query.Term.

This is useful if you want to change the default term class to whoosh.query.Variations, or if you’ve written a custom term class you want the parser to use instead of the ones shipped with Whoosh.

>>> from whoosh import query
>>> orparser = QueryParser("content", schema=myschema, conjunction=query.Or)
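
Both arguments take a class rather than an instance because the parser instantiates them itself while building the query tree. The following minimal sketch shows that pattern under stated assumptions: Term, And and Or here are small stand-ins defined so the example runs without Whoosh (they are not the whoosh.query classes), and join_terms is a hypothetical function, not part of the QueryParser API.

```python
from dataclasses import dataclass


@dataclass
class Term:
    """Stand-in for whoosh.query.Term: a field name plus term text."""
    fieldname: str
    text: str


@dataclass
class And:
    """Stand-in for whoosh.query.And: wraps a list of sub-queries."""
    subqueries: list


@dataclass
class Or:
    """Stand-in for whoosh.query.Or: wraps a list of sub-queries."""
    subqueries: list


def join_terms(words, fieldname, conjunction=And, termclass=Term):
    """Hypothetical helper: wrap each word, then join the results.

    conjunction and termclass are classes, not instances; they are
    only instantiated here, once the sub-queries are known.
    """
    return conjunction([termclass(fieldname, word) for word in words])


print(join_terms(["alpha", "beta"], "content"))
print(join_terms(["alpha", "beta"], "content", conjunction=Or))
```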

Subclassing QueryParser

The QueryParser class is designed to allow a certain amount of customization by subclassing. The methods invoked on the abstract syntax tree produced by pyparsing in turn call methods starting with make_, such as make_term, make_prefix, etc. These methods are passed the parsed information (such as the fieldname and term text for make_term) and return a Query object. You can subclass and replace these methods to do additional processing or return different Query types. See the source code of the PyparsingBasedParser and QueryParser classes in the qparser module.
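
The override pattern can be shown with a toy example. Everything in this sketch is an assumption made for illustration: ToyParser is not Whoosh's QueryParser, its parse method only splits on whitespace, and plain tuples stand in for Query objects. Only the idea of a make_term hook that receives a fieldname and term text is taken from the description above; check the qparser source for the real signatures before subclassing.

```python
class ToyParser:
    """Toy stand-in for a parser that builds queries through make_ hooks."""

    def __init__(self, default_field):
        self.default_field = default_field

    def make_term(self, fieldname, text):
        # A tuple stands in for a Query object in this sketch.
        return ("Term", fieldname, text)

    def parse(self, user_query):
        # Every whitespace-separated word goes through the make_term hook.
        return [self.make_term(self.default_field, word)
                for word in user_query.split()]


class LowercasingParser(ToyParser):
    """Subclass that does extra processing before building each term."""

    def make_term(self, fieldname, text):
        # Normalize the text, then defer to the inherited behaviour.
        return super().make_term(fieldname, text.lower())


print(LowercasingParser("content").parse("Three Blind Mice"))
```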

Writing your own parser

To implement a different query syntax, or for complete control over query parsing, you can write your own parser.

A parser is simply a class or function that takes input from the user and generates whoosh.query.Query objects from it. For example, you could write a function that parses queries specified in XML:

<and>
    <term field="content">first</term>
    <term field="content">second</term>
    <not>
        <term field="date">20070506</term>
    </not>
</and>
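
A function for this XML format might look like the following sketch, which uses the standard library's xml.etree.ElementTree. The names parse_xml_query and element_to_query are hypothetical, and Term, And and Not are small stand-ins defined here so the example runs without Whoosh; in a real application you would import the classes of the same names from whoosh.query instead. Only the three elements shown above are handled.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


# Stand-ins for the whoosh.query classes of the same names, so that
# this sketch is self-contained. They are not the real classes.
@dataclass
class Term:
    fieldname: str
    text: str


@dataclass
class And:
    subqueries: list


@dataclass
class Not:
    query: object


def element_to_query(element):
    """Recursively convert one XML element into a query object."""
    if element.tag == "term":
        return Term(element.get("field"), (element.text or "").strip())
    if element.tag == "and":
        return And([element_to_query(child) for child in element])
    if element.tag == "not":
        children = list(element)
        if len(children) != 1:
            raise ValueError("<not> must wrap exactly one sub-query")
        return Not(element_to_query(children[0]))
    raise ValueError(f"unsupported element: <{element.tag}>")


def parse_xml_query(xml_text):
    """Parse an XML query document into a tree of query objects."""
    return element_to_query(ET.fromstring(xml_text))


XML_QUERY = """<and>
    <term field="content">first</term>
    <term field="content">second</term>
    <not>
        <term field="date">20070506</term>
    </not>
</and>"""

print(parse_xml_query(XML_QUERY))
```

Raising ValueError on unknown elements keeps malformed input from silently producing a partial query.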