spelling module

See how to use the Whoosh spell checker.

This module contains functions and classes that use a Whoosh index as the backend for a spell-checking engine.

class whoosh.spelling.SpellChecker(storage, indexname='SPELL', booststart=2.0, boostend=1.0, mingram=3, maxgram=4, minscore=0.5)

Implements a spell-checking engine using a search index for the backend storage and lookup. This class is based on the Lucene contributed spell-checker code.

To use this object:

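from whoosh import index
from whoosh.filedb.filestore import FileStorage
from whoosh.spelling import SpellChecker

# Open the storage that will hold the spelling dictionary
st = FileStorage("spelldict")
sp = SpellChecker(st)

# Add words directly...
sp.add_words([u"aardvark", u"manticore", u"zebra", ...])
# ...or add every term from a field of an existing index
ix = index.open_dir("index")
sp.add_field(ix, "content")

# Ask for at most two suggested spellings
suggestions = sp.suggest(u"ardvark", number=2)

Parameters: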
  • storage – The storage object in which to create the spell-checker’s dictionary index.
  • indexname – The name to use for the spell-checker’s dictionary index. You only need to change this if you have multiple spelling indexes in the same storage.
  • booststart – How much to boost matches of the first N-gram (the beginning of the word).
  • boostend – How much to boost matches of the last N-gram (the end of the word).
  • mingram – The minimum gram length to store.
  • maxgram – The maximum gram length to store.
  • minscore – The minimum score matches must achieve to be returned.
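
To make the gram and boost parameters concrete, the following is a minimal sketch of how a word could be split into N-grams of each stored length, with the first and last gram singled out as the ones booststart and boostend would apply to. It does not use Whoosh and is not the library's implementation; the function name word_grams and the layout of its return value are hypothetical.

```python
def word_grams(word, mingram=3, maxgram=4):
    """Split ``word`` into overlapping N-grams for each size in [mingram, maxgram].

    Hypothetical helper for illustration only; not part of Whoosh.
    Returns {size: {"start": first gram, "end": last gram, "grams": all grams}}.
    Sizes longer than the word are skipped.
    """
    result = {}
    for size in range(mingram, maxgram + 1):
        if size > len(word):
            break
        grams = [word[i:i + size] for i in range(len(word) - size + 1)]
        result[size] = {
            "start": grams[0],   # beginning of the word (where booststart would apply)
            "end": grams[-1],    # end of the word (where boostend would apply)
            "grams": grams,
        }
    return result


grams = word_grams("zebra")
```

With the default sizes, "zebra" yields the 3-grams zeb, ebr, bra and the 4-grams zebr, ebra.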

add_field(ix, fieldname)

Adds the terms in a field from another index to the backend dictionary. This method calls add_scored_words() and uses each term’s frequency as the score. As a result, more common words will be suggested before rare words. If you want to calculate the scores differently, use add_scored_words() directly.

Parameters:
  • ix – The index.Index object from which to add terms.
  • fieldname – The field name (or number) of a field in the source index. All the indexed terms from this field will be added to the dictionary.
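
The frequency-as-score idea can be sketched without an index. The example below counts how often each term occurs in a made-up token list and produces (word, score) tuples of the shape add_scored_words() accepts. The token list and the helper name frequency_scores are assumptions for illustration; real term frequencies would come from the source index.

```python
from collections import Counter


def frequency_scores(tokens):
    """Return (word, frequency) tuples, most frequent first.

    Hypothetical helper; ties are broken alphabetically so the order is stable.
    """
    counts = Counter(tokens)
    return sorted(counts.items(), key=lambda pair: (-pair[1], pair[0]))


# Made-up tokens standing in for the indexed terms of a field.
tokens = ["zebra", "aardvark", "zebra", "manticore", "zebra", "aardvark"]
scored = frequency_scores(tokens)
```

Here "zebra" receives the highest score because it occurs most often, which is why common words would be suggested before rare ones.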

add_scored_words(ws)

Adds a list of (“word”, score) tuples to the backend dictionary. Associating words with a score lets you use the ‘usescores’ keyword argument of the suggest() method to order the suggestions using the scores.

Parameters:
  • ws – A sequence of (“word”, score) tuples.
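
If raw counts would let a few very common words dominate, the scores can be computed some other way before they are passed in. The sketch below dampens made-up frequencies with a logarithm. The damping formula, the sample counts, and the helper name dampened_scores are assumptions, not anything Whoosh prescribes.

```python
import math


def dampened_scores(frequencies):
    """Map {word: raw count} to a list of (word, score) tuples.

    Hypothetical scoring: 1 + ln(count), so a word seen once scores 1.0 and
    growth slows as counts rise. Counts must be positive.
    """
    return [(word, 1.0 + math.log(count)) for word, count in frequencies.items()]


# Made-up counts for illustration.
ws = dampened_scores({"zebra": 1000, "aardvark": 10, "manticore": 1})
```

A list built this way has the (word, score) shape described above and could be passed as the ws argument.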

add_words(ws, score=1)

Adds a list of words to the backend dictionary.

Parameters:
  • ws – A sequence of words (strings) to add to the dictionary.
  • score – An optional score to use for ALL the words in ‘ws’.

index(create=False)

Returns the backend index of this object, instantiating it if it does not already exist.

suggest(text, number=3, usescores=False)

Returns a list of suggested alternative spellings of ‘text’. You must add words to the dictionary (using add_field, add_words, and/or add_scored_words) before you can use this.

Parameters:
  • text – The word to check.
  • number – The maximum number of suggestions to return.
  • usescores – Use the per-word score to influence the suggestions.
Return type:

list

suggestions_and_scores(text, weighting=None)

Returns a list of possible alternative spellings of ‘text’, as (‘word’, score, weight) triples, where ‘word’ is the suggested word, ‘score’ is the score that was assigned to the word using SpellChecker.add_field() or SpellChecker.add_scored_words(), and ‘weight’ is the score the word received in the search for the original word’s ngrams.

You must add words to the dictionary (using add_field, add_words, and/or add_scored_words) before you can use this.

This is a lower-level method for expert users who need access to the raw scores, for example to implement a custom suggestion ranking algorithm. Most people will want to call suggest() instead, which simply returns the N highest-ranked words.

Parameters:
  • text – The word to check.
Return type:

list
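
As one example of a custom ranking built on the raw triples, the sketch below orders candidates by the product of weight and score and drops those below a weight threshold. The sample triples are invented, and rank_suggestions, its formula, and its min_weight parameter are hypothetical; this is not a description of how suggest() ranks results.

```python
def rank_suggestions(triples, number=3, min_weight=0.5):
    """Rank (word, score, weight) triples and return the top words.

    Hypothetical ranking: sort by weight * score, highest first, after
    discarding candidates whose weight is below ``min_weight``.
    """
    kept = [t for t in triples if t[2] >= min_weight]
    kept.sort(key=lambda t: t[2] * t[1], reverse=True)
    return [word for word, _score, _weight in kept[:number]]


# Invented triples of the shape suggestions_and_scores() is documented to return.
triples = [
    ("aardvark", 5, 2.0),   # weight * score = 10.0
    ("aardwolf", 1, 2.5),   # weight * score = 2.5
    ("ark", 50, 0.3),       # weight below min_weight, dropped
    ("dark", 3, 1.0),       # weight * score = 3.0
]
top = rank_suggestions(triples, number=2)
```

With these sample values the two words returned are "aardvark" and "dark", in that order.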
