formats module

The classes in this module encode and decode posting information for a field. The field format essentially determines what information is stored about each occurrence of a term.

Base class

class whoosh.formats.Format(analyzer, field_boost=1.0, **options)

Abstract base class representing a storage format for a field or vector. Format objects are responsible for writing and reading the low-level representation of a field. They control what kind and level of information is stored about the indexed fields.

Parameters:
  • analyzer – The analysis.Analyzer object to use to index this field. See the analysis module for more information. If this value is None, the field is not indexed/searchable.
  • field_boost – A constant boost factor applied to the score of all queries matching terms in this field.
analyze(unicodestring, mode='', **kwargs)

Returns a whoosh.analysis.Token iterator from the given unicode string.

Parameters:
  • unicodestring – the string to analyze.
  • mode – a string indicating the purpose for which the unicode string is being analyzed, e.g. ‘index’ or ‘query’.
decode_as(astype, valuestring)
Interprets the encoded value string as ‘astype’, where ‘astype’ is for example “frequency” or “positions”. This object must have a corresponding decode_<astype>() method.
decoder(name)
Returns the bound method for interpreting value as ‘name’, where ‘name’ is for example “frequency” or “positions”. This object must have a corresponding Format.decode_<name>() method.
encode(value)
Returns the given value encoded as a string.
supports(name)
Returns True if this format supports interpreting its posting value as ‘name’ (e.g. “frequency” or “positions”).
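
The name-based lookup described for supports(), decoder() and decode_as() can be sketched in plain Python. This is a minimal illustration, not Whoosh's implementation: ToyFormat and its decimal-string encoding are hypothetical, and the only thing taken from the descriptions above is the convention that a name such as “frequency” maps to a decode_frequency() method.

```python
class ToyFormat:
    """Hypothetical stand-in for a Format subclass; not Whoosh code."""

    def decode_frequency(self, valuestring):
        # Assumed toy encoding: the value string is the decimal frequency.
        return int(valuestring)

    def supports(self, name):
        # A name is supported if a matching decode_<name> method exists.
        return hasattr(self, "decode_" + name)

    def decoder(self, name):
        # Return the bound decode_<name> method.
        return getattr(self, "decode_" + name)

    def decode_as(self, astype, valuestring):
        # Look up the decoder by name and apply it to the encoded value.
        return self.decoder(astype)(valuestring)


fmt = ToyFormat()
print(fmt.supports("frequency"))        # a decode_frequency method exists
print(fmt.supports("positions"))        # no decode_positions method here
print(fmt.decode_as("frequency", "3"))
```

Under this convention, adding a new decode_<name>() method to a subclass is enough for supports(name) to report it.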
word_values(value, **kwargs)

Takes the text value to be indexed and yields a series of (“tokentext”, frequency, valuestring) tuples, where frequency is the number of times “tokentext” appeared in the value, and valuestring is the encoded field-specific posting value for the token. For example, in a Frequency format, the value string would be the same as frequency; in a Positions format, the value string would encode a list of token positions at which “tokentext” occurred.

Parameters:
  • value – The unicode text to index.
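
To make the shape of the yielded tuples concrete, here is a small self-contained sketch. It does not call Whoosh: toy_word_values is a hypothetical function, tokens come from str.split() rather than an analyzer, and the value strings use an assumed human-readable encoding (a decimal count, or comma-separated positions) in place of the real encoded form.

```python
from collections import defaultdict


def toy_word_values(text, with_positions=False):
    """Yield (tokentext, frequency, valuestring) tuples for `text`.

    Illustration only: whitespace tokenization and a plain-text encoding
    are assumptions, not the library's behavior.
    """
    positions = defaultdict(list)
    for pos, token in enumerate(text.lower().split()):
        positions[token].append(pos)

    for token in sorted(positions):
        poslist = positions[token]
        freq = len(poslist)
        if with_positions:
            # Positions-like: encode the list of token positions.
            valuestring = ",".join(str(p) for p in poslist)
        else:
            # Frequency-like: the value string mirrors the frequency.
            valuestring = str(freq)
        yield token, freq, valuestring


freq_postings = list(toy_word_values("to be or not to be"))
pos_postings = list(toy_word_values("to be or not to be", with_positions=True))
print(freq_postings)
print(pos_postings)
```

In both modes the frequency is the same; only the value string carries the extra format-specific information.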

Formats

class whoosh.formats.Existence(analyzer, field_boost=1.0, **options)

Only indexes whether a given term occurred in a given document; it does not store frequencies or positions. This is useful for fields that should be searchable but not scorable, such as a file path.

Supports: frequency, weight (always reports frequency = 1).

class whoosh.formats.Frequency(analyzer, field_boost=1.0, boost_as_freq=False, **options)

Stores frequency information for each posting.

Supports: frequency, weight.

Parameters:
  • analyzer – The analysis.Analyzer object to use to index this field. See the analysis module for more information. If this value is None, the field is not indexed/searchable.
  • field_boost – A constant boost factor applied to the score of all queries matching terms in this field.
  • boost_as_freq – if True, take the integer value of each token’s boost attribute and use it as the token’s frequency.
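
The effect of boost_as_freq can be illustrated with a short sketch. ToyToken and count_frequencies are hypothetical names, and summing the integer boosts of repeated tokens is an assumption about how the option accumulates; the only behavior taken from the parameter description is that the integer value of a token's boost stands in for its frequency.

```python
from collections import Counter, namedtuple

# Hypothetical token type; a real analysis Token carries more attributes.
ToyToken = namedtuple("ToyToken", ["text", "boost"])


def count_frequencies(tokens, boost_as_freq=False):
    """Count term frequencies, optionally using int(boost) as the count."""
    freqs = Counter()
    for token in tokens:
        # Assumption: boosts of repeated tokens are added together.
        freqs[token.text] += int(token.boost) if boost_as_freq else 1
    return dict(freqs)


tokens = [ToyToken("alpha", 2.0), ToyToken("beta", 1.0), ToyToken("alpha", 3.5)]
plain = count_frequencies(tokens)
boosted = count_frequencies(tokens, boost_as_freq=True)
print(plain)
print(boosted)
```

Note that int() truncates, so a boost of 3.5 contributes 3 in this sketch.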
class whoosh.formats.DocBoosts(analyzer, field_boost=1.0, boost_as_freq=False, **options)

A format that stores frequency and per-document boost information for each posting.

Supports: frequency, weight.

Parameters:
  • analyzer – The analysis.Analyzer object to use to index this field. See the analysis module for more information. If this value is None, the field is not indexed/searchable.
  • field_boost – A constant boost factor applied to the score of all queries matching terms in this field.
  • boost_as_freq – if True, take the integer value of each token’s boost attribute and use it as the token’s frequency.
class whoosh.formats.Positions(analyzer, field_boost=1.0, **options)

A format that stores position information in each posting, to allow phrase searching and “near” queries.

Supports: frequency, weight, positions, position_boosts (always reports position boost = 1.0).

Parameters:
  • analyzer – The analysis.Analyzer object to use to index this field. See the analysis module for more information. If this value is None, the field is not indexed/searchable.
  • field_boost – A constant boost factor applied to the score of all queries matching terms in this field.
class whoosh.formats.Characters(analyzer, field_boost=1.0, **options)

Stores token position and character start and end information for each posting.

Supports: frequency, weight, positions, position_boosts (always reports position boost = 1.0), characters.

Parameters:
  • analyzer – The analysis.Analyzer object to use to index this field. See the analysis module for more information. If this value is None, the field is not indexed/searchable.
  • field_boost – A constant boost factor applied to the score of all queries matching terms in this field.
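
As a rough picture of the per-posting information a Characters-style format records, the sketch below collects a token position plus character start and end offsets for each occurrence. It is not Whoosh code: toy_character_postings is a hypothetical helper, the regular-expression tokenizer stands in for an analyzer, and the tuples are left unencoded.

```python
import re


def toy_character_postings(text):
    """Map each lowercased token to a list of (position, startchar, endchar)."""
    postings = {}
    for pos, match in enumerate(re.finditer(r"\w+", text)):
        entry = (pos, match.start(), match.end())
        postings.setdefault(match.group().lower(), []).append(entry)
    return postings


postings = toy_character_postings("Hello world, hello")
print(postings)
```

The character offsets index into the original string, so text[start:end] recovers each occurrence as it was written.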
class whoosh.formats.PositionBoosts(analyzer, field_boost=1.0, **options)

A format that stores positions and per-position boost information in each posting.

Supports: frequency, weight, positions, position_boosts.

Parameters:
  • analyzer – The analysis.Analyzer object to use to index this field. See the analysis module for more information. If this value is None, the field is not indexed/searchable.
  • field_boost – A constant boost factor applied to the score of all queries matching terms in this field.
class whoosh.formats.CharacterBoosts(analyzer, field_boost=1.0, **options)

A format that stores positions, character start and end, and per-position boost information in each posting.

Supports: frequency, weight, positions, position_boosts, characters, character_boosts.

Parameters:
  • analyzer – The analysis.Analyzer object to use to index this field. See the analysis module for more information. If this value is None, the field is not indexed/searchable.
  • field_boost – A constant boost factor applied to the score of all queries matching terms in this field.
