util module

Miscellaneous utility functions and classes.

class whoosh.util.ClosableMixin

Mix-in for classes with a close() method to allow them to be used as a context manager.
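A minimal sketch of how such a mix-in can work, assuming the only requirement is that the subclass defines close(). The TempResource class is hypothetical and exists only to show usage. The standard library's contextlib.closing offers similar behavior as a wrapper instead of a base class.

```python
class ClosableMixin:
    """Sketch: context-manager support for any class that defines close()."""

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Always close, even if the with-block raised
        self.close()
        return False  # do not suppress exceptions


class TempResource(ClosableMixin):  # hypothetical example class
    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True


with TempResource() as res:
    print(res.closed)  # False inside the block
print(res.closed)      # True after the block exits
```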

whoosh.util.byte_to_float(b, mantissabits=5, zeroexp=2)

Decodes a floating point number stored in a single byte.

whoosh.util.decode_signed_varint(i)

Zig-zag decodes an integer value.
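Zig-zag coding interleaves signed integers onto the non-negative ones (0, -1, 1, -2, 2 become 0, 1, 2, 3, 4) so that small negative numbers still produce short varints. The sketch below shows the integer mapping in both directions; the function names are hypothetical and this is not Whoosh's source.

```python
def zigzag_encode(n):
    # Non-negative n -> even codes, negative n -> odd codes
    return n << 1 if n >= 0 else ((-n) << 1) - 1


def zigzag_decode(z):
    # Even codes are non-negative values; odd codes map back to negatives
    if not z & 1:
        return z >> 1
    return -((z >> 1) + 1)


print([zigzag_encode(n) for n in (0, -1, 1, -2, 2)])  # [0, 1, 2, 3, 4]
print([zigzag_decode(z) for z in range(5)])           # [0, -1, 1, -2, 2]
```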

whoosh.util.fib(n)

Returns the nth value in the Fibonacci sequence.

whoosh.util.find_object(name, blacklist=None, whitelist=None)

Imports and returns an object given a fully qualified name.

>>> find_object("whoosh.analysis.StopFilter")
<class 'whoosh.analysis.StopFilter'>
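
One way to implement this lookup is to split the name at its last dot, import the module part, and fetch the attribute. This is a sketch, not Whoosh's code; it assumes blacklist and whitelist are sequences of name prefixes, and the TypeError/ValueError choices are assumptions as well. It also does not resolve nested attributes such as module.Class.method.

```python
import importlib


def find_object(name, blacklist=None, whitelist=None):
    """Sketch: import and return the object named by a dotted path."""
    # Assumption: both lists hold prefixes of allowed/forbidden names
    if blacklist and any(name.startswith(p) for p in blacklist):
        raise TypeError(f"{name!r} matches a blacklisted prefix")
    if whitelist and not any(name.startswith(p) for p in whitelist):
        raise TypeError(f"{name!r} does not match any whitelisted prefix")

    modname, _, objname = name.rpartition(".")
    if not modname:
        raise ValueError(f"{name!r} is not a fully qualified name")
    module = importlib.import_module(modname)
    return getattr(module, objname)


print(find_object("collections.OrderedDict"))  # <class 'collections.OrderedDict'>
```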

whoosh.util.first_diff(a, b)

Returns the position of the first differing character in the strings a and b. For example, first_diff('render', 'rending') == 4. This function limits the return value to 255 so the difference can be encoded in a single byte.
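A minimal sketch of the same contract (an illustration, not Whoosh's source): scan up to the shorter string's length, and never past 255. When one string is a prefix of the other, the result is the shorter length.

```python
def first_diff(a, b):
    """Sketch: index of the first differing character, capped at 255."""
    limit = min(len(a), len(b), 255)
    for i in range(limit):
        if a[i] != b[i]:
            return i
    # No difference within the compared range
    return limit


print(first_diff("render", "rending"))  # 4
print(first_diff("a" * 300, "a" * 300))  # 255
```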

whoosh.util.float_to_byte(value, mantissabits=5, zeroexp=2)

Encodes a floating point number in a single byte.
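Encoding a float in one byte works by keeping only the exponent and a few leading mantissa bits of its 32-bit IEEE representation. The sketch below, covering both float_to_byte and byte_to_float, follows the "small float" scheme used by Apache Lucene's SmallFloat class; treat the exact bit layout as an assumption rather than Whoosh's guaranteed format. It returns an int in 0-255 instead of a one-byte string, and mantissabits counts the implicit leading 1 bit.

```python
import struct


def float_to_byte(value, mantissabits=5, zeroexp=2):
    """Sketch: lossy 8-bit encoding of a float (Lucene-style small float)."""
    # Reinterpret the float32 bit pattern as a signed 32-bit integer
    bits = struct.unpack("<i", struct.pack("<f", value))[0]
    # Keep sign, exponent and the top (mantissabits - 1) explicit mantissa bits
    smallfloat = bits >> (24 - mantissabits)
    fzero = (63 - zeroexp) << mantissabits
    if smallfloat < fzero:
        # Zero and negatives map to 0; tiny positives map to 1
        return 0 if bits <= 0 else 1
    if smallfloat >= fzero + 0x100:
        return 255  # too large: saturate
    return smallfloat - fzero


def byte_to_float(b, mantissabits=5, zeroexp=2):
    """Sketch: inverse of float_to_byte (the dropped bits are zero)."""
    if b == 0:
        return 0.0
    bits = (b & 0xFF) << (24 - mantissabits)
    bits += (63 - zeroexp) << 24
    return struct.unpack("<f", struct.pack("<i", bits))[0]


print(float_to_byte(1.0))                  # 80
print(byte_to_float(float_to_byte(1.01)))  # 1.0 (precision is lost)
```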

whoosh.util.length_to_byte(length)

Returns a logarithmic approximation of the given number, in the range 0-255. The approximation has high precision at the low end (e.g. 1 -> 0, 2 -> 1, 3 -> 2 ...) and low precision at the high end. Numbers equal to or greater than 108116 all approximate to 255.

This is useful for storing field lengths, where small documents are the common case and very large documents are rarer.

whoosh.util.lru_cache(maxsize=100)

Least-recently-used cache decorator.

This function roughly duplicates the interface of the functools.lru_cache decorator in the Python 3.2 standard library, but uses the "clock face" approximation of LRU (the CLOCK algorithm) instead of an ordered dictionary.

If maxsize is set to None, the LRU features are disabled and the cache can grow without bound.

Arguments to the cached function must be hashable.

View the cache statistics named tuple (hits, misses, maxsize, currsize) with f.cache_info(). Clear the cache and statistics with f.cache_clear(). Access the underlying function with f.__wrapped__.
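The CLOCK idea can be sketched as follows: cached keys sit in slots around a ring, each slot has a "referenced" bit that a cache hit sets, and on eviction a hand sweeps the ring, clearing set bits (a second chance) until it finds a slot whose bit is already clear. This is a hypothetical clock_cache decorator, not Whoosh's implementation. It supports positional arguments only, assumes maxsize is None or a positive integer, and starts new entries with the bit cleared, which is one of several CLOCK variants.

```python
import functools
from collections import namedtuple

CacheInfo = namedtuple("CacheInfo", ["hits", "misses", "maxsize", "currsize"])


def clock_cache(maxsize=100):
    """Sketch: memoizing decorator with CLOCK (second-chance) eviction."""
    def decorator(func):
        values = {}    # key -> cached result
        slot_of = {}   # key -> slot index on the ring
        ring = []      # keys arranged around the clock face
        refbit = []    # per-slot "used since the hand last passed" flag
        stats = {"hits": 0, "misses": 0, "hand": 0}

        @functools.wraps(func)  # also sets wrapper.__wrapped__
        def wrapper(*args):
            if args in values:
                stats["hits"] += 1
                if maxsize is not None:
                    refbit[slot_of[args]] = True  # grant a second chance
                return values[args]

            stats["misses"] += 1
            result = func(*args)
            if maxsize is None:
                values[args] = result  # unbounded: never evict
                return result

            if len(ring) < maxsize:
                # Ring not full yet: occupy a fresh slot
                slot_of[args] = len(ring)
                ring.append(args)
                refbit.append(False)
            else:
                # Sweep, clearing set bits, until an unreferenced slot is found
                while refbit[stats["hand"]]:
                    refbit[stats["hand"]] = False
                    stats["hand"] = (stats["hand"] + 1) % maxsize
                pos = stats["hand"]
                victim = ring[pos]
                del values[victim]
                del slot_of[victim]
                ring[pos] = args
                refbit[pos] = False
                slot_of[args] = pos
                stats["hand"] = (pos + 1) % maxsize
            values[args] = result
            return result

        def cache_info():
            return CacheInfo(stats["hits"], stats["misses"], maxsize, len(values))

        def cache_clear():
            values.clear()
            slot_of.clear()
            ring.clear()
            refbit.clear()
            stats.update(hits=0, misses=0, hand=0)

        wrapper.cache_info = cache_info
        wrapper.cache_clear = cache_clear
        return wrapper
    return decorator
```

With maxsize=2, calling f(1), f(2), f(1), f(3) evicts the entry for 2 rather than 1, because the hit on 1 set its referenced bit.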

whoosh.util.make_binary_tree(fn, args, **kwargs)

Takes a function or class that accepts two positional arguments, plus a list of arguments, and returns a binary tree of instances built by applying the function or class to pairs of arguments and subtrees.

>>> make_binary_tree(UnionMatcher, [matcher1, matcher2, matcher3])
UnionMatcher(matcher1, UnionMatcher(matcher2, matcher3))

Any keyword arguments given to this function are passed to the class initializer.
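A recursive sketch that splits the argument list in half and combines the two subtrees with fn. The split-in-half strategy is an assumption, chosen because it reproduces the UnionMatcher example above; the pair and tagged helpers are hypothetical stand-ins for a matcher class.

```python
def make_binary_tree(fn, args, **kwargs):
    """Sketch: combine args pairwise with fn, splitting the list in half."""
    if not args:
        raise ValueError("make_binary_tree needs at least one argument")
    if len(args) == 1:
        return args[0]
    half = len(args) // 2
    left = make_binary_tree(fn, args[:half], **kwargs)
    right = make_binary_tree(fn, args[half:], **kwargs)
    # Keyword arguments are forwarded to every fn call
    return fn(left, right, **kwargs)


def pair(a, b):  # hypothetical two-argument "class"
    return (a, b)


print(make_binary_tree(pair, ["m1", "m2", "m3"]))
# ('m1', ('m2', 'm3'))
```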

whoosh.util.natural_key(s)

Converts string s into a tuple that will sort “naturally” (i.e., name5 will come before name10 and 1 will come before A). This function is designed to be used as the key argument to sorting functions.

Parameters:
  • s – the str/unicode string to convert.
Return type:
  tuple
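A sketch of a natural sort key: split the string into runs of digits and non-digits, and turn digit runs into integers. In Python 3, ints and strs cannot be compared directly, so this sketch tags each part, (0, int) for numbers and (1, str) for text, which also makes numbers sort before letters. Whether Whoosh tags parts this way, and whether it folds case, are assumptions; this sketch compares text case-sensitively.

```python
import re

_PARTS = re.compile(r"\d+|\D+")


def natural_key(s):
    """Sketch: tuple key that sorts digit runs numerically."""
    return tuple(
        (0, int(part)) if part.isdigit() else (1, part)
        for part in _PARTS.findall(s)
    )


print(sorted(["name10", "name5", "A", "1"], key=natural_key))
# ['1', 'A', 'name5', 'name10']
```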

whoosh.util.prefix_decode_all(ls)

Decompresses a list of strings compressed by prefix_encode().

whoosh.util.prefix_encode(a, b)

Compresses string b as an integer (encoded in a single byte) giving the length of the prefix it shares with a, followed by the rest of b encoded as UTF-8.

whoosh.util.prefix_encode_all(ls)

Compresses the given list of (unicode) strings by storing each string (except the first one) as an integer (encoded in a single byte) giving the length of the prefix it shares with its predecessor, followed by the remaining suffix encoded as UTF-8.
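A round-trip sketch of this front-coding scheme, covering both prefix_encode_all and prefix_decode_all. It is an illustration, not Whoosh's source: it returns bytes objects, counts the shared prefix in characters (not UTF-8 bytes), caps it at 255 so it fits in one byte, and stores a zero prefix for the first string so every entry has the same layout.

```python
def first_diff(a, b):
    # Length of the common prefix, capped so it fits in one byte
    limit = min(len(a), len(b), 255)
    for i in range(limit):
        if a[i] != b[i]:
            return i
    return limit


def prefix_encode_all(ls):
    last = ""
    for word in ls:
        i = first_diff(last, word)
        # One byte of shared-prefix length, then the UTF-8 suffix
        yield bytes([i]) + word[i:].encode("utf-8")
        last = word


def prefix_decode_all(ls):
    last = ""
    for entry in ls:
        i = entry[0]  # indexing bytes yields an int in Python 3
        last = last[:i] + entry[1:].decode("utf-8")
        yield last


encoded = list(prefix_encode_all(["render", "rending", "rent"]))
print(encoded)  # [b'\x00render', b'\x04ing', b'\x03t']
```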

whoosh.util.protected(func)

Decorator for storage-access methods. This decorator (a) checks if the object has already been closed, and (b) synchronizes on a threading lock. The parent object must have ‘is_closed’ and ‘_sync_lock’ attributes.
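A sketch of such a decorator built on functools.wraps. The Store class is hypothetical, and raising RuntimeError is an assumption (the documentation does not say which exception is used). This sketch performs the closed check while holding the lock, so a close() that also takes the lock cannot slip in between the check and the call.

```python
import functools
import threading


def protected(func):
    """Sketch: refuse calls on closed objects and serialize the rest."""
    @functools.wraps(func)
    def wrapper(self, *args, **kwargs):
        with self._sync_lock:
            if self.is_closed:
                raise RuntimeError(f"{self!r} has been closed")
            return func(self, *args, **kwargs)
    return wrapper


class Store:  # hypothetical storage object
    def __init__(self):
        self.is_closed = False
        # Re-entrant, so protected methods may call each other
        self._sync_lock = threading.RLock()
        self.items = []

    @protected
    def add(self, item):
        self.items.append(item)

    def close(self):
        with self._sync_lock:
            self.is_closed = True
```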

whoosh.util.read_varint(readfn)

Reads a variable-length encoded integer.

Parameters:
  • readfn – a callable that reads a given number of bytes, like file.read().
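A sketch of reading a varint through any readfn, assuming the common little-endian base-128 layout: 7 payload bits per byte, with the high bit set on every byte except the last. Passing io.BytesIO(...).read shows the "like file.read()" usage; the EOFError on a truncated stream is an assumption of this sketch.

```python
import io


def read_varint(readfn):
    """Sketch: decode one base-128 varint using readfn(1) per byte."""
    result = 0
    shift = 0
    while True:
        chunk = readfn(1)
        if not chunk:
            raise EOFError("stream ended inside a varint")
        byte = chunk[0]
        result |= (byte & 0x7F) << shift  # low 7 bits carry the payload
        if not byte & 0x80:               # high bit clear: last byte
            return result
        shift += 7


stream = io.BytesIO(b"\xac\x02\x05")
print(read_varint(stream.read))  # 300
print(read_varint(stream.read))  # 5
```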

whoosh.util.signed_varint(i)

Zig-zag encodes a signed integer into a varint.

whoosh.util.synchronized(func)

Decorator for storage-access methods, which synchronizes on a threading lock. The parent object must have ‘is_closed’ and ‘_sync_lock’ attributes.
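A sketch of the locking part on its own: the wrapper acquires the instance's _sync_lock around every call. Note that this sketch only uses _sync_lock. The Counter class is hypothetical; its increment does a separate read and write, which could lose updates under concurrent calls without the lock.

```python
import functools
import threading


def synchronized(func):
    """Sketch: run the method while holding self._sync_lock."""
    @functools.wraps(func)
    def wrapper(self, *args, **kwargs):
        with self._sync_lock:
            return func(self, *args, **kwargs)
    return wrapper


class Counter:  # hypothetical example class
    def __init__(self):
        self._sync_lock = threading.Lock()
        self.value = 0

    @synchronized
    def increment(self):
        current = self.value  # read...
        self.value = current + 1  # ...then write, safe under the lock


def run_workers(counter, nthreads=8, per_thread=1000):
    def worker():
        for _ in range(per_thread):
            counter.increment()

    threads = [threading.Thread(target=worker) for _ in range(nthreads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.value


print(run_workers(Counter()))  # 8000
```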

whoosh.util.unbound_cache(func)

Caching decorator with an unbounded cache size.
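A minimal memoization sketch with no eviction, keyed on positional arguments (which must be hashable). This is an illustration, not Whoosh's source; in Python 3.9 and later, the standard library's functools.cache provides the same unbounded behavior.

```python
import functools


def unbound_cache(func):
    """Sketch: memoize func on its positional arguments, never evicting."""
    cache = {}

    @functools.wraps(func)
    def wrapper(*args):
        if args not in cache:
            cache[args] = func(*args)
        return cache[args]

    return wrapper


@unbound_cache
def slow_square(x):  # hypothetical expensive function
    return x * x


print(slow_square(12))  # 144, computed once and then served from the cache
```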

whoosh.util.varint(i)

Encodes the given integer into a string of the minimum number of bytes.
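A sketch of the matching encoder, assuming the same little-endian base-128 layout as the read_varint sketch: emit 7 bits at a time, lowest group first, setting the high bit on every byte except the last. Rejecting negative input with ValueError is an assumption of this sketch (signed values go through zig-zag coding first).

```python
def varint(i):
    """Sketch: encode a non-negative int in as few 7-bit groups as needed."""
    if i < 0:
        raise ValueError(f"{i!r} is negative")
    out = bytearray()
    while i > 0x7F:
        out.append((i & 0x7F) | 0x80)  # more bytes follow
        i >>= 7
    out.append(i)  # final byte, high bit clear
    return bytes(out)


print(varint(300))  # b'\xac\x02'
```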
