qparser module

Parser object

class whoosh.qparser.QueryParser(fieldname, schema, termclass=<class 'whoosh.query.Term'>, phraseclass=<class 'whoosh.query.Phrase'>, group=<class 'whoosh.qparser.syntax.AndGroup'>, plugins=None)

A hand-written query parser built on modular plug-ins. The default configuration implements a powerful fielded query language similar to Lucene’s.

Use the plugins argument when creating the object to override the default list of plug-ins, or call add_plugin() and remove_plugin_class() afterwards to change the plug-ins included in the parser.

>>> from whoosh import qparser
>>> parser = qparser.QueryParser("content", schema)
>>> parser.remove_plugin_class(qparser.WildcardPlugin)
>>> parser.parse(u"hello there")
And([Term("content", u"hello"), Term("content", u"there")])
Parameters:
  • fieldname – the default field – use this as the field for any terms without an explicit field.
  • schema – a whoosh.fields.Schema object to use when parsing. The appropriate fields in the schema will be used to tokenize terms/phrases before they are turned into query objects. You can specify None for the schema to create a parser that does not analyze the text of the query, usually for testing purposes.
  • termclass – the query class to use for individual search terms. The default is whoosh.query.Term.
  • phraseclass – the query class to use for phrases. The default is whoosh.query.Phrase.
  • group – the default grouping. AndGroup makes terms required by default. OrGroup makes terms optional by default.
  • plugins – a list of plugins to use. This overrides the default list of plugins. WhitespacePlugin is included automatically, so do not put it in this list. Classes in the list will be automatically instantiated.
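
The difference between the two default groupings can be illustrated with a small plain-Python sketch. This is not Whoosh code: `matches` is a hypothetical helper that treats a document as a bag of lowercase words, and the `"and"`/`"or"` strings merely stand in for AndGroup and OrGroup.

```python
def matches(document, terms, group="and"):
    """Return True if the document satisfies the terms under the grouping.

    Hypothetical helper: "and" requires every term (like AndGroup),
    "or" requires at least one term (like OrGroup).
    """
    words = set(document.lower().split())
    hits = [term in words for term in terms]
    return all(hits) if group == "and" else any(hits)


# "there" is missing from the document, so only the "or" grouping matches.
print(matches("hello world", ["hello", "there"], group="and"))
print(matches("hello world", ["hello", "there"], group="or"))
```
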
add_plugin(plugin)

Adds the given plugin to the list of plugins in this parser.

filters()

Returns a prioritized list of filter functions from the included plugins.
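
As a rough illustration of what a prioritized list means here, the sketch below gathers items from several plug-ins and orders them by priority. It rests on assumptions that this page does not confirm: that each plug-in reports `(item, priority)` pairs and that lower numbers come first. `prioritized`, `LowercasePlugin`, and `StripPlugin` are hypothetical names.

```python
def prioritized(plugins, method_name):
    """Collect (item, priority) pairs from every plug-in and sort them.

    Assumption: lower priority numbers are ordered first.
    """
    pairs = []
    for plugin in plugins:
        pairs.extend(getattr(plugin, method_name)())
    pairs.sort(key=lambda pair: pair[1])
    return [item for item, _priority in pairs]


class LowercasePlugin:
    """Hypothetical plug-in contributing one filter with priority 200."""
    def filters(self):
        return [(str.lower, 200)]


class StripPlugin:
    """Hypothetical plug-in contributing one filter with priority 100."""
    def filters(self):
        return [(str.strip, 100)]


# StripPlugin's filter has the lower number, so it is ordered first.
ordered = prioritized([LowercasePlugin(), StripPlugin()], "filters")
```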

parse(text, normalize=True, debug=False)

Parses the input string and returns a Query object/tree.

This method may return None if the input string does not result in any valid queries.

Parameters:
  • text – the unicode string to parse.
  • normalize – whether to call normalize() on the query object/tree before returning it. This should be left on unless you’re trying to debug the parser output.
Return type:

whoosh.query.Query

remove_plugin(plugin)

Removes the given plugin from the list of plugins in this parser.

remove_plugin_class(cls)

Removes any plugins of the given class from this parser.

replace_plugin(plugin)

Removes any plugins of the class of the given plugin and then adds it. This is a convenience method to keep from having to call remove_plugin_class followed by add_plugin each time you want to reconfigure a default plugin.

>>> qp = qparser.QueryParser("content", schema)
>>> qp.replace_plugin(qparser.NotPlugin("(^| )-"))
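
The remove-then-add behavior described above can be sketched in plain Python. This is a minimal illustration of the bookkeeping, not Whoosh's actual implementation; `PluginList`, `NotLike`, and `WildcardLike` are hypothetical stand-ins.

```python
class PluginList:
    """Hypothetical stand-in for the parser's plug-in bookkeeping."""

    def __init__(self, plugins=()):
        self.plugins = list(plugins)

    def add_plugin(self, plugin):
        self.plugins.append(plugin)

    def remove_plugin_class(self, cls):
        # Drop every plug-in that is an instance of the given class.
        self.plugins = [p for p in self.plugins if not isinstance(p, cls)]

    def replace_plugin(self, plugin):
        # Convenience: remove same-class plug-ins, then add the new one.
        self.remove_plugin_class(plugin.__class__)
        self.add_plugin(plugin)


class NotLike:
    """Hypothetical configurable plug-in."""
    def __init__(self, token):
        self.token = token


class WildcardLike:
    """Hypothetical plug-in without configuration."""


registry = PluginList([NotLike("NOT "), WildcardLike()])
registry.replace_plugin(NotLike("(^| )-"))
```

After the call, the registry holds exactly one `NotLike` instance, the newly configured one.
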
term_query(fieldname, text, termclass, boost=1.0, tokenize=True, removestops=True)

Returns the appropriate query object for a single term in the query string.

tokens()

Returns a prioritized list of tokens from the included plugins.

Pre-made configurations

The following functions return pre-configured QueryParser objects.

whoosh.qparser.MultifieldParser(fieldnames, schema, fieldboosts=None, **kwargs)

Returns a QueryParser configured to search in multiple fields.

Instead of assigning unfielded clauses to a default field, this parser transforms them into an OR clause that searches a list of fields. For example, if the list of multi-fields is “f1”, “f2” and the query string is “hello there”, the parser will treat it as “(f1:hello OR f2:hello) (f1:there OR f2:there)”. This is useful when you have two textual fields (e.g. “title” and “content”) that you want to search by default.

Parameters:
  • fieldnames – a list of field names to search.
  • fieldboosts – an optional dictionary mapping field names to boosts.
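
The expansion described above can be mimicked on plain strings. The sketch below is only an illustration of the rewriting rule, not how Whoosh builds its query tree; `expand_multifield` is a hypothetical helper that splits on whitespace and ignores boosts.

```python
def expand_multifield(query, fieldnames):
    """Rewrite each unfielded word as an OR clause across fieldnames."""
    clauses = []
    for word in query.split():
        if ":" in word:
            # Already fielded: leave it untouched.
            clauses.append(word)
        else:
            alternatives = " OR ".join(f"{name}:{word}" for name in fieldnames)
            clauses.append(f"({alternatives})")
    return " ".join(clauses)


expanded = expand_multifield("hello there", ["f1", "f2"])
print(expanded)  # (f1:hello OR f2:hello) (f1:there OR f2:there)
```
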
whoosh.qparser.SimpleParser(fieldname, schema, **kwargs)

Returns a QueryParser configured to support only +, -, and phrase syntax.

whoosh.qparser.DisMaxParser(fieldboosts, schema, tiebreak=0.0, **kwargs)

Returns a QueryParser configured to support only +, -, and phrase syntax, and which converts individual terms into DisjunctionMax queries across a set of fields.

Parameters:
  • fieldboosts – a dictionary mapping field names to boosts.
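
This page does not define how a DisjunctionMax query combines scores. The sketch below assumes the usual Lucene-style definition: the best boosted field score, plus `tiebreak` times the sum of the remaining field scores. Treat that formula as an assumption to check against the query module; `dismax_score` is a hypothetical helper, not part of Whoosh.

```python
def dismax_score(scores_by_field, fieldboosts, tiebreak=0.0):
    """Combine per-field scores the way a disjunction-max query is assumed to.

    Assumption: result = best boosted score + tiebreak * (sum of the others).
    """
    boosted = [
        score * fieldboosts.get(field, 1.0)
        for field, score in scores_by_field.items()
    ]
    if not boosted:
        return 0.0
    best = max(boosted)
    return best + tiebreak * (sum(boosted) - best)


scores = {"title": 1.0, "content": 1.5}
boosts = {"title": 2.0, "content": 1.0}
print(dismax_score(scores, boosts))                # 2.0
print(dismax_score(scores, boosts, tiebreak=0.5))  # 2.75
```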

Plug-ins

class whoosh.qparser.FieldsPlugin(remove_unknown=True)

Adds the ability to specify the field of a clause using a colon.

This plugin is included in the default parser configuration.

whoosh.qparser.CompoundsPlugin

alias of OperatorsPlugin

class whoosh.qparser.NotPlugin(token='(^|(?<= ))NOT ')

This plugin is deprecated; its functionality is now provided by the OperatorsPlugin.

class whoosh.qparser.WildcardPlugin

Adds the ability to specify wildcard queries by using asterisk and question mark characters in terms. Note that wildcard queries can be very performance and memory intensive, so you may want to consider leaving this plugin out.

This plugin is included in the default parser configuration.

class whoosh.qparser.PrefixPlugin

Adds the ability to specify prefix queries by ending a term with an asterisk. This plugin is useful if you want the user to be able to create prefix but not wildcard queries (for performance reasons). If you are including the wildcard plugin, you should not include this plugin as well.

class whoosh.qparser.PhrasePlugin

Adds the ability to specify phrase queries inside double quotes.

This plugin has no configuration.

This plugin is included in the default parser configuration.

class whoosh.qparser.RangePlugin

Adds the ability to specify term ranges.

This plugin has no configuration.

This plugin is included in the default parser configuration.

class whoosh.qparser.SingleQuotesPlugin

Adds the ability to specify single “terms” containing spaces by enclosing them in single quotes.

This plugin has no configuration.

This plugin is included in the default parser configuration.

class whoosh.qparser.GroupPlugin

Adds the ability to group clauses using parentheses.

This plugin is included in the default parser configuration.

class whoosh.qparser.BoostPlugin

Adds the ability to boost clauses of the query using the circumflex.

This plugin is included in the default parser configuration.

class whoosh.qparser.PlusMinusPlugin

Adds the ability to use + and - in a flat OR query to specify required and prohibited terms.

This is the basis for the parser configuration returned by SimpleParser().
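
A minimal plain-Python sketch of the + and - convention follows. It only sorts whitespace-separated words into three buckets and is not the plug-in's implementation; `classify_terms` is a hypothetical helper.

```python
def classify_terms(query):
    """Split a flat query into required, prohibited and optional terms."""
    required, prohibited, optional = [], [], []
    for word in query.split():
        if word.startswith("+") and len(word) > 1:
            required.append(word[1:])
        elif word.startswith("-") and len(word) > 1:
            prohibited.append(word[1:])
        else:
            # Unmarked terms stay optional, as in a flat OR query.
            optional.append(word)
    return required, prohibited, optional


required, prohibited, optional = classify_terms("+render shading -modeling")
```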

class whoosh.qparser.MultifieldPlugin(fieldnames, fieldboosts=None)

Converts any unfielded terms into OR clauses that search for the term in a specified list of fields.

Parameters:
  • fieldnames – a list of fields to search.
  • fieldboosts – an optional dictionary mapping field names to a boost to use for that field.
class whoosh.qparser.DisMaxPlugin(fieldboosts, tiebreak=0.0)

Converts any unfielded terms into DisjunctionMax clauses that search for the term in a specified list of fields.

Parameters:
  • fieldboosts – a dictionary mapping field names to a boost to use for that field in the DisjunctionMax query.
class whoosh.qparser.FieldAliasPlugin(fieldmap)

Adds the ability to use “aliases” of fields in the query string.

>>> # Allow users to use 'body' or 'text' to refer to the 'content' field
>>> parser.add_plugin(FieldAliasPlugin({"content": ["body", "text"]}))
>>> parser.parse("text:hello")
Term("content", "hello")
Parameters:
  • fieldmap – a dictionary mapping fieldnames to a list of aliases for the field.
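
Because fieldmap maps each real field name to its aliases, resolving an alias needs the reverse lookup. The sketch below shows that inversion in plain Python; `build_alias_lookup` and `resolve_field` are hypothetical helpers, not part of Whoosh.

```python
def build_alias_lookup(fieldmap):
    """Invert {fieldname: [aliases]} into {alias: fieldname}."""
    lookup = {}
    for fieldname, aliases in fieldmap.items():
        for alias in aliases:
            lookup[alias] = fieldname
    return lookup


def resolve_field(name, lookup):
    """Return the real field for an alias, or the name itself if unknown."""
    return lookup.get(name, name)


lookup = build_alias_lookup({"content": ["body", "text"]})
print(resolve_field("text", lookup))   # content
print(resolve_field("title", lookup))  # title
```
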
class whoosh.qparser.CopyFieldPlugin(map, group=<class 'whoosh.qparser.syntax.OrGroup'>, mirror=False)

Looks for basic syntax tokens (terms, prefixes, wildcards, phrases, etc.) occurring in a certain field and replaces each one with a group (by default OR) containing the original token and a copy of the token assigned to a new field.

For example, the query:

hello name:matt

could be automatically converted by CopyFieldPlugin({"name": "author"}) to:

hello (name:matt OR author:matt)

This is useful where one field was indexed with a differently-analyzed copy of another, and you want the query to search both fields.

Parameters:
  • map – a dictionary mapping names of fields to copy to the names of the destination fields.
  • group – the type of group to create in place of the original token.
  • mirror – if True, the plugin copies both ways, so if the user specifies a query in the destination field, it will also be copied to the source field.
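
The copying rule, including the two-way behavior controlled by the `mirror` flag in the signature, can be sketched on plain strings. This is an illustration only: `copy_fields` is a hypothetical helper that works on whitespace-separated clauses rather than on syntax objects.

```python
def copy_fields(query, fieldmap, mirror=False, joiner="OR"):
    """Replace each clause in a mapped field with a group over both fields."""
    mapping = dict(fieldmap)
    if mirror:
        # Also copy from each destination field back to its source.
        mapping.update({dest: src for src, dest in fieldmap.items()})
    clauses = []
    for word in query.split():
        field, sep, text = word.partition(":")
        if sep and field in mapping:
            clauses.append(f"({field}:{text} {joiner} {mapping[field]}:{text})")
        else:
            clauses.append(word)
    return " ".join(clauses)


print(copy_fields("hello name:matt", {"name": "author"}))
# hello (name:matt OR author:matt)
```
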
class whoosh.qparser.GtLtPlugin(expr='(?P<rel>(<=|>=|<|>|=<|=>))')

Allows the user to use greater than/less than symbols to create range queries:

a:>100 b:<=z c:>=-1.4 d:<mz

This is the equivalent of:

a:{100 to] b:[to z] c:[-1.4 to] d:[to mz}

The plugin recognizes >, <, >=, <=, =>, and =< after a field specifier. The field specifier is required. You cannot do the following:

>100

This plugin requires the FieldsPlugin and RangePlugin to work.

Parameters:
  • expr – a regular expression that must capture a “rel” group (which contains <, >, >=, <=, =>, or =<)
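
The rewriting shown above can be reproduced with the documented default expression and Python's `re` module. This is a string-level sketch, not the plug-in's implementation; `CLAUSE`, `to_range`, and `rewrite` are hypothetical names, and the surrounding field and value patterns are assumptions.

```python
import re

# The default expression from the signature above; it captures a "rel" group.
REL_EXPR = r"(?P<rel>(<=|>=|<|>|=<|=>))"
# Assumed shape of a clause: a field name, a colon, the operator, a value.
CLAUSE = re.compile(r"(?P<field>\w+):" + REL_EXPR + r"(?P<value>\S+)")


def to_range(match):
    """Turn one field:OPvalue match into the equivalent range syntax."""
    field, rel, value = match.group("field", "rel", "value")
    if rel == ">":
        return field + ":{" + value + " to]"   # exclusive start
    if rel in (">=", "=>"):
        return field + ":[" + value + " to]"   # inclusive start
    if rel == "<":
        return field + ":[to " + value + "}"   # exclusive end
    return field + ":[to " + value + "]"       # "<=" or "=<": inclusive end


def rewrite(query):
    return CLAUSE.sub(to_range, query)


print(rewrite("a:>100 b:<=z c:>=-1.4 d:<mz"))
# a:{100 to] b:[to z] c:[-1.4 to] d:[to mz}
```

A bare `>100` is left unchanged because the pattern requires a field specifier.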

Syntax objects

Groups

class whoosh.qparser.SyntaxObject

An object representing parsed text. These objects generally correspond to a query object type, and are intermediate objects used to represent the syntax tree parsed from a query string, and then generate a query tree from the syntax tree. There will be syntax objects that do not have a corresponding query type, such as the syntax object representing whitespace.

class whoosh.qparser.Group(tokens=None, boost=1.0)

Represents a group of syntax objects. These generally correspond to compound query objects such as query.And and query.Or.

class whoosh.qparser.AndGroup(tokens=None, boost=1.0)

Syntax group corresponding to an And query.

class whoosh.qparser.OrGroup(tokens=None, boost=1.0)

Syntax group corresponding to an Or query.

class whoosh.qparser.AndNotGroup(tokens=None, boost=1.0)

Syntax group corresponding to an AndNot query.

class whoosh.qparser.AndMaybeGroup(tokens=None, boost=1.0)

Syntax group corresponding to an AndMaybe query.

class whoosh.qparser.DisMaxGroup(tokens=None, tiebreak=0.0, boost=None)

Syntax group corresponding to a DisjunctionMax query.

class whoosh.qparser.NotGroup(tokens=None, boost=1.0)

Syntax group corresponding to a Not query.

Tokens

class whoosh.qparser.Token

A parse-able token object. Each token class has an expr attribute containing a regular expression that matches the token text. When this expression is found, the class/object’s create() method is called and returns a token object to represent the match in the token stream.

Many token classes will do the parsing using class methods and put instances of themselves in the token stream. However, parseable objects requiring configuration (such as the Operator subclasses) may use separate objects for doing the parsing and embodying the token.
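
The expr/create() pattern can be sketched as a tiny regex-driven tokenizer. The classes below are simplified, hypothetical stand-ins (they are not Whoosh's token classes, and the real create() signature may differ); the sketch only shows how a matching expression leads to a token object in the stream.

```python
import re


class SketchToken:
    """Hypothetical base class: subclasses supply a compiled `expr`."""
    expr = None

    def __init__(self, text):
        self.text = text

    @classmethod
    def create(cls, match):
        # Called when `expr` matches; returns the token for the stream.
        return cls(match.group(0))


class SpaceToken(SketchToken):
    expr = re.compile(r"\s+")


class OpenParenToken(SketchToken):
    expr = re.compile(r"\(")


class CloseParenToken(SketchToken):
    expr = re.compile(r"\)")


class WordToken(SketchToken):
    expr = re.compile(r"[^\s()]+")


def tokenize(text, token_classes):
    """Try each token class at the current position; first match wins."""
    pos, stream = 0, []
    while pos < len(text):
        for cls in token_classes:
            match = cls.expr.match(text, pos)
            if match:
                stream.append(cls.create(match))
                pos = match.end()
                break
        else:
            raise ValueError(f"no token matches at position {pos}")
    return stream


stream = tokenize(
    "(hello there)",
    [SpaceToken, OpenParenToken, CloseParenToken, WordToken],
)
```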

class whoosh.qparser.Singleton

Base class for tokens that don’t carry any information specific to each instance (e.g. “open parenthesis” token), so they can all share the same instance.

class whoosh.qparser.White
class whoosh.qparser.BasicSyntax(text, fieldname=None, boost=1.0)

Base class for “basic” (atomic) syntax – term, prefix, wildcard, phrase, range.

class whoosh.qparser.Word(text, fieldname=None, boost=1.0)

Syntax object representing a term.
