Packages that use Tokenizer

Package | Description
---|---
org.apache.lucene.analysis | API and code to convert text into indexable/searchable tokens.
org.apache.lucene.analysis.cjk | Analyzer for Chinese, Japanese and Korean.
org.apache.lucene.analysis.cn | Analyzer for Chinese.
org.apache.lucene.analysis.ngram | Tokenizers that produce n-grams from text.
org.apache.lucene.analysis.ru | Analyzer for Russian.
org.apache.lucene.analysis.standard | A grammar-based tokenizer constructed with JavaCC.
Uses of Tokenizer in org.apache.lucene.analysis

Subclasses of Tokenizer in org.apache.lucene.analysis:

Class | Description
---|---
CharTokenizer | An abstract base class for simple, character-oriented tokenizers.
KeywordTokenizer | Emits the entire input as a single token.
LetterTokenizer | Divides text at non-letters.
LowerCaseTokenizer | Performs the function of LetterTokenizer and LowerCaseFilter together.
WhitespaceTokenizer | Divides text at whitespace.
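The character-oriented tokenizers above all follow the same pattern: a predicate decides which characters belong inside a token, and everything else is a boundary. The following is a minimal stdlib-only sketch of that idea (it is not the Lucene implementation; the class and method names here are illustrative only):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

// Sketch (NOT the Lucene classes) of CharTokenizer-style splitting:
// a predicate selects token characters; non-matching characters end the token.
// An optional flag lower-cases kept characters, as LowerCaseTokenizer does.
public class CharSplitSketch {
    static List<String> split(String text, IntPredicate isTokenChar, boolean lowerCase) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (isTokenChar.test(c)) {
                current.append(lowerCase ? Character.toLowerCase(c) : c);
            } else if (current.length() > 0) {
                tokens.add(current.toString()); // boundary char closes the token
                current.setLength(0);
            }
        }
        if (current.length() > 0) tokens.add(current.toString());
        return tokens;
    }

    public static void main(String[] args) {
        String text = "Foo-Bar baz";
        // WhitespaceTokenizer-like: divide at whitespace.
        System.out.println(split(text, c -> !Character.isWhitespace(c), false)); // [Foo-Bar, baz]
        // LetterTokenizer-like: divide at non-letters.
        System.out.println(split(text, Character::isLetter, false)); // [Foo, Bar, baz]
        // LowerCaseTokenizer-like: LetterTokenizer plus lower-casing in one pass.
        System.out.println(split(text, Character::isLetter, true)); // [foo, bar, baz]
    }
}
```

Varying only the predicate reproduces the behavioral differences between WhitespaceTokenizer, LetterTokenizer, and LowerCaseTokenizer on the same input.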
Uses of Tokenizer in org.apache.lucene.analysis.cjk

Subclasses of Tokenizer in org.apache.lucene.analysis.cjk:

Class | Description
---|---
CJKTokenizer | Modified from StopTokenizer, which does a decent job for most European languages.
Uses of Tokenizer in org.apache.lucene.analysis.cn

Subclasses of Tokenizer in org.apache.lucene.analysis.cn:

Class | Description
---|---
ChineseTokenizer | Extracts tokens from the stream using Character.getType(), treating each Chinese character as a single token. The difference between the ChineseTokenizer and the CJKTokenizer is that they have different token parsing logic.
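The one-character-per-token rule described above can be sketched with only the standard library (this is an assumption-laden illustration, not Lucene's ChineseTokenizer): CJK ideographs are classified by Character.getType() as OTHER_LETTER, so each such character can be emitted as its own token while runs of basic letters and digits stay together.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch (NOT Lucene's ChineseTokenizer) of the stated rule:
// each CJK ideograph (Character.getType() == OTHER_LETTER) is a single token;
// runs of ordinary letters/digits are kept as one token.
public class SingleCjkCharSketch {
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder run = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.getType(c) == Character.OTHER_LETTER) {
                if (run.length() > 0) { tokens.add(run.toString()); run.setLength(0); }
                tokens.add(String.valueOf(c)); // one Chinese character = one token
            } else if (Character.isLetterOrDigit(c)) {
                run.append(c); // Latin letters/digits accumulate into one token
            } else if (run.length() > 0) {
                tokens.add(run.toString());
                run.setLength(0);
            }
        }
        if (run.length() > 0) tokens.add(run.toString());
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("Lucene是搜索库")); // [Lucene, 是, 搜, 索, 库]
    }
}
```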
Uses of Tokenizer in org.apache.lucene.analysis.ngram

Subclasses of Tokenizer in org.apache.lucene.analysis.ngram:

Class | Description
---|---
EdgeNGramTokenizer | Tokenizes the input from an edge into n-grams of given size(s).
NGramTokenizer | Tokenizes the input into n-grams of the given size(s).
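The two n-gram behaviors above differ only in which substrings are kept. A minimal stdlib-only sketch (again, not the Lucene classes; method names are illustrative): NGramTokenizer-style output contains every substring whose length falls in [minGram, maxGram], while EdgeNGramTokenizer-style output keeps only the grams anchored at one edge of the input.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch (NOT the Lucene classes) of the two n-gram strategies described above.
public class NGramSketch {
    // All substrings of length minGram..maxGram, at every start position.
    static List<String> ngrams(String text, int minGram, int maxGram) {
        List<String> grams = new ArrayList<>();
        for (int n = minGram; n <= maxGram; n++) {
            for (int i = 0; i + n <= text.length(); i++) {
                grams.add(text.substring(i, i + n));
            }
        }
        return grams;
    }

    // Only the grams anchored at the front edge of the input.
    static List<String> edgeNgrams(String text, int minGram, int maxGram) {
        List<String> grams = new ArrayList<>();
        for (int n = minGram; n <= Math.min(maxGram, text.length()); n++) {
            grams.add(text.substring(0, n)); // grams grow from the front edge
        }
        return grams;
    }

    public static void main(String[] args) {
        System.out.println(ngrams("abcd", 2, 3));     // [ab, bc, cd, abc, bcd]
        System.out.println(edgeNgrams("abcd", 1, 3)); // [a, ab, abc]
    }
}
```

Edge n-grams are the variant typically used for prefix-style matching, since every gram shares the input's leading characters.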
Uses of Tokenizer in org.apache.lucene.analysis.ru

Subclasses of Tokenizer in org.apache.lucene.analysis.ru:

Class | Description
---|---
RussianLetterTokenizer | Extends LetterTokenizer by additionally looking up letters in a given "russian charset".
Uses of Tokenizer in org.apache.lucene.analysis.standard

Subclasses of Tokenizer in org.apache.lucene.analysis.standard:

Class | Description
---|---
StandardTokenizer | A grammar-based tokenizer constructed with JavaCC.