Contains functions and classes related to fields.
Represents the collection of fields in an index. Maps field names to FieldType objects which define the behavior of each field.
Low-level parts of the index use field numbers instead of field names for compactness. This class has several methods for converting between the field name, field number, and field object itself.
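The name/number conversion described above can be pictured with a small stand-in class. This is only a sketch: MiniSchema, its method names, and the rule of numbering fields in sorted name order are assumptions made for illustration, not the library's actual implementation.

```python
class MiniSchema:
    """Hypothetical stand-in for a schema; not the real Schema class."""

    def __init__(self, **fields):
        # Assumption: number the fields by sorted name so the result is deterministic.
        self._names = sorted(fields)
        self._by_name = dict(fields)
        self._numbers = {name: num for num, name in enumerate(self._names)}

    def name_to_number(self, name):
        # Field name -> compact integer used by low-level code.
        return self._numbers[name]

    def number_to_name(self, number):
        # Compact integer -> field name.
        return self._names[number]

    def field_by_number(self, number):
        # Compact integer -> the field object itself.
        return self._by_name[self._names[number]]


# Plain strings stand in for field type objects here.
schema = MiniSchema(content="TEXT", title="TEXT(stored)", tags="KEYWORD")
tags_number = schema.name_to_number("tags")
```

Storing a small integer instead of the full name in every low-level record is what makes the numbered form more compact.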
All keyword arguments to the constructor are treated as fieldname = fieldtype pairs. The fieldtype can be an instantiated FieldType object, or a FieldType sub-class (in which case the Schema will instantiate it with the default constructor before adding it).
For example:
s = Schema(content = TEXT,
           title = TEXT(stored = True),
           tags = KEYWORD(stored = True))
Adds a field to this schema.
Returns the content analyzer for the given fieldname, or None if the field has no analyzer.
Returns a shallow copy of the schema. The field instances are not deep copied, so they are shared between schema copies.
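The consequence of a shallow copy can be shown with two stub classes. FieldStub and SchemaStub are hypothetical names used only for this sketch, and the assumption that the copy gets its own name-to-field mapping is mine; the source only states that the field instances are shared.

```python
class FieldStub:
    """Hypothetical field object with one mutable setting."""

    def __init__(self, stored=False):
        self.stored = stored


class SchemaStub:
    """Hypothetical schema holding a name -> field mapping."""

    def __init__(self, **fields):
        self.fields = dict(fields)

    def copy(self):
        # Shallow copy: a new mapping that points at the same field objects.
        new = SchemaStub()
        new.fields = dict(self.fields)
        return new


original = SchemaStub(title=FieldStub(stored=True))
duplicate = original.copy()

# Adding a field to the copy does not affect the original mapping...
duplicate.fields["tags"] = FieldStub()

# ...but changing a shared field object is visible through both schemas.
duplicate.fields["title"].stored = False
shares_field = duplicate.fields["title"] is original.fields["title"]
```

If independent field objects are needed, they would have to be copied explicitly.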
Returns True if any of the fields in this schema store term vectors.
Returns a list of (“fieldname”, field_object) pairs for the fields in this schema.
Returns a list of the names of the fields in this schema.
Returns a list of the names of fields that store field lengths.
Returns a list of the names of fields that are stored.
Returns a list of the names of fields that store vectors.
Represents a field configuration.
The FieldType object supports the following attributes:
The constructor for the base field type simply lets you supply your own configured field format, vector format, and scorable and stored values. Subclasses may configure some or all of this for you.
Clears any cached information in the field and any child objects.
Returns an iterator of (termtext, frequency, weight, encoded_value) tuples.
When self_parsing() returns True, the query parser will call this method to parse basic query text.
When self_parsing() returns True, the query parser will call this method to parse range query text. If this method returns None instead of a query object, the parser will fall back to parsing the start and end terms using process_text().
Returns an iterator of token strings corresponding to the given string.
Subclasses should override this method to return True if they want the query parser to call the field’s parse_query() method instead of running the analyzer on text in this field. This is useful where the field needs full control over how queries are interpreted, such as in the numeric field type.
alias of unicode
Returns an iterator of (term_text, sortable_value) pairs for the terms in the given reader and field. The sortable values can be used for sorting. The default implementation simply returns the texts of all terms in the field.
The value of the field’s sortable_type attribute should contain the type of the second item (the sortable value) in the pairs, e.g. unicode or int.
This can be overridden by field types such as NUMERIC where some values in a field are not useful for sorting, and where the sortable values can be expressed more compactly as numbers.
Returns a textual representation of the value. Non-textual fields (such as NUMERIC and DATETIME) will override this to encode objects as text.
Configured field type that indexes the entire value of the field as one token. This is useful for data you don’t want to tokenize, such as the path of a file.
Configured field type for fields containing IDs separated by whitespace and/or punctuation.
Configured field type for fields you want to store but not index.
Configured field type for fields containing space-separated or comma-separated keyword-like data (such as tags). The default is to not store positional information (so phrase searching is not allowed in this field) and to not make the field scorable.
Configured field type for text fields (for example, the body text of an article). The default is to store positional information to allow phrase searching. This field type is always scorable.
Special field type that lets you index int, long, or floating point numbers in relatively short fixed-width terms. The field converts numbers to sortable text for you before indexing.
You specify the numeric type of the field when you create the NUMERIC object. The default is int.
>>> schema = Schema(path=STORED, position=NUMERIC(long))
>>> ix = storage.create_index(schema)
>>> w = ix.writer()
>>> w.add_document(path="/a", position=5820402204)
>>> w.commit()
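To see why numbers are converted to "sortable text", consider one possible encoding. The sketch below is an illustration only, not the encoding the field actually uses: the function names, the 32-bit width, and the offset-plus-hexadecimal scheme are all assumptions. It shifts signed integers into the non-negative range and writes them as fixed-width hex, so that plain string comparison agrees with numeric comparison.

```python
def int_to_sortable_text(n, bits=32):
    """Encode a signed integer as fixed-width hex text (hypothetical scheme)."""
    offset = 1 << (bits - 1)
    if not -offset <= n < offset:
        raise OverflowError("value does not fit in %d bits" % bits)
    width = bits // 4  # four bits per hex digit
    # Adding the offset makes every value non-negative; zero padding keeps
    # all terms the same length so they compare correctly as strings.
    return format(n + offset, "0{}x".format(width))


def sortable_text_to_int(text, bits=32):
    """Invert int_to_sortable_text."""
    return int(text, 16) - (1 << (bits - 1))


values = [42, -7, 0, 100000, -2**31]
encoded = sorted(int_to_sortable_text(v) for v in values)  # string sort
decoded = [sortable_text_to_int(t) for t in encoded]
```

Sorting the encoded strings and decoding them yields the values in numeric order, which is the property that sorting and range queries over indexed terms depend on.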
You can also use the NUMERIC field to store Decimal instances by specifying a type of int or long and the decimal_places keyword argument. This simply multiplies each number by (10 ** decimal_places) before storing it as an integer. Of course this may throw away decimal precision (by truncating, not rounding) and imposes the same maximum value limits as int/long, but these may be acceptable for certain applications.
>>> from decimal import Decimal
>>> schema = Schema(path=STORED, position=NUMERIC(int, decimal_places=4))
>>> ix = storage.create_index(schema)
>>> w = ix.writer()
>>> w.add_document(path="/a", position=Decimal("123.45"))
>>> w.commit()
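The scaling step described above can be sketched in a few lines of plain Python. The helper names are hypothetical, and this only mirrors the arithmetic stated in the text (multiply by 10 ** decimal_places, then truncate); it is not the field's own code.

```python
from decimal import Decimal


def decimal_to_scaled_int(value, decimal_places):
    """Scale a Decimal by 10 ** decimal_places and truncate to an int."""
    # int() on a Decimal drops the fractional part rather than rounding it.
    return int(value * (10 ** decimal_places))


def scaled_int_to_decimal(number, decimal_places):
    """Recover the (possibly truncated) Decimal from the stored integer."""
    return Decimal(number) / (10 ** decimal_places)


exact = decimal_to_scaled_int(Decimal("123.45"), 4)
truncated = decimal_to_scaled_int(Decimal("1.23456789"), 4)
restored = scaled_int_to_decimal(truncated, 4)
```

With decimal_places=4, the value 123.45 survives the round trip exactly, while 1.23456789 comes back as 1.2345 because the extra digits are discarded.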
Special field type that lets you index datetime objects. The field converts the datetime objects to sortable text for you before indexing.
Since this field is based on Python’s datetime module it shares all the limitations of that module, such as the inability to represent dates before year 1 in the proleptic Gregorian calendar. However, since this field stores datetimes as an integer number of microseconds, it could easily represent a much wider range of dates if the Python datetime implementation ever supports them.
>>> from datetime import datetime
>>> schema = Schema(path=STORED, date=DATETIME)
>>> ix = storage.create_index(schema)
>>> w = ix.writer()
>>> w.add_document(path="/a", date=datetime.now())
>>> w.commit()
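The idea of storing a datetime as an integer number of microseconds can be illustrated as follows. The reference point used here (datetime.min, that is, 0001-01-01 00:00:00) and the function names are assumptions made for this sketch; the text does not say which epoch the field uses.

```python
from datetime import datetime, timedelta


def datetime_to_micros(dt):
    """Whole microseconds between datetime.min and dt (assumed epoch)."""
    # Floor-dividing one timedelta by another yields an exact integer.
    return (dt - datetime.min) // timedelta(microseconds=1)


def micros_to_datetime(micros):
    """Invert datetime_to_micros."""
    return datetime.min + timedelta(microseconds=micros)


earlier = datetime(2009, 5, 1, 12, 30, 0, 250000)
later = datetime(2009, 5, 1, 12, 30, 0, 250001)
earlier_micros = datetime_to_micros(earlier)
later_micros = datetime_to_micros(later)
round_trip = micros_to_datetime(earlier_micros)
```

Because the integers preserve chronological order, they can be converted to sortable text in the same way as any other number.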
Special field type that lets you index boolean values (True and False). The field converts the boolean values to text for you before indexing.
>>> schema = Schema(path=STORED, done=BOOLEAN)
>>> ix = storage.create_index(schema)
>>> w = ix.writer()
>>> w.add_document(path="/a", done=False)
>>> w.commit()
Configured field that indexes text as N-grams. For example, with a field type NGRAM(3,4), the value “hello” will be indexed as tokens “hel”, “hell”, “ell”, “ello”, “llo”. This field chops the entire text into N-grams.
Configured field that breaks text into words, lowercases, and then chops the words into N-grams.
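The difference between the two N-gram field types can be sketched in plain Python. The functions below are hypothetical helpers, not the analyzers the fields actually use; in particular, splitting on whitespace is a simplification of "breaks text into words".

```python
def ngrams(text, minsize, maxsize):
    """Return every substring of length minsize..maxsize, by start position."""
    grams = []
    for start in range(len(text)):
        for size in range(minsize, maxsize + 1):
            if start + size <= len(text):
                grams.append(text[start:start + size])
    return grams


def word_ngrams(text, minsize, maxsize):
    """Lowercase, split into words (simplified: on whitespace), then chop each word."""
    grams = []
    for word in text.lower().split():
        grams.extend(ngrams(word, minsize, maxsize))
    return grams


whole_value = ngrams("hello", 3, 4)
per_word = word_ngrams("Hello World", 3, 4)
```

For the value "hello" with sizes 3 to 4, ngrams() produces the same five tokens listed in the NGRAM example: "hel", "hell", "ell", "ello", "llo". The per-word variant never produces a token that spans two words.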