public interface ITokenizer
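Token types are constants with the TT_ prefix (e.g. TT_TERM); token flags are constants with the TF_ prefix (e.g. TF_SEPARATOR_SENTENCE).

See also: TokenTypeUtils

| Modifier and Type | Field and Description |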
|---|---|
| static short | TF_COMMON_WORD: The current token is a common word. |
| static short | TF_QUERY_WORD: The current token is part of the query. |
| static short | TF_SEPARATOR_DOCUMENT: The current token is a document separator (never returned from parsing). |
| static short | TF_SEPARATOR_FIELD: The current token separates the document's logical fields. |
| static short | TF_SEPARATOR_SENTENCE: The current token is a sentence separator. |
| static short | TF_TERMINATOR: The current token terminates the input (never returned from parsing). |
| static int | TT_ACRONYM |
| static int | TT_BARE_URL |
| static int | TT_EMAIL |
| static int | TT_EOF: Indicates the end of the token stream. |
| static int | TT_FILE |
| static int | TT_FULL_URL |
| static int | TT_HYPHTERM |
| static int | TT_NUMERIC |
| static int | TT_PUNCTUATION |
| static int | TT_TERM |
| static int | TYPE_MASK |

| Modifier and Type | Method and Description |
|---|---|
| short | nextToken(): Returns the next token from the input stream. |
| void | reset(Reader reader): Resets the tokenizer to process new data. |
| void | setTermBuffer(MutableCharArray array): Sets the current token image to the provided buffer. |

static final int TYPE_MASK
static final int TT_TERM
static final int TT_NUMERIC
static final int TT_PUNCTUATION
static final int TT_EMAIL
static final int TT_ACRONYM
static final int TT_FULL_URL
static final int TT_BARE_URL
static final int TT_FILE
static final int TT_HYPHTERM
static final int TT_EOF
static final short TF_SEPARATOR_SENTENCE
static final short TF_SEPARATOR_DOCUMENT
static final short TF_SEPARATOR_FIELD
static final short TF_TERMINATOR
static final short TF_COMMON_WORD
static final short TF_QUERY_WORD
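
The constants above suggest that a token value combines a type (TT_ constants, selected by TYPE_MASK) with optional flag bits (TF_ constants). The following is a minimal sketch of how such a value could be decoded. It assumes the type occupies the low bits covered by TYPE_MASK and that each TF_ flag is a separate bit; the numeric values and the helper names typeOf and hasFlag are hypothetical and chosen only for illustration. Real code should reference the ITokenizer constants directly.

```java
// Sketch only: the numeric values below are assumptions for illustration,
// not the values defined by ITokenizer.
final class TokenDecodeSketch {
    static final int TYPE_MASK = 0x000F;                // assumed value
    static final int TT_TERM = 0x0001;                  // assumed value
    static final int TT_PUNCTUATION = 0x0003;           // assumed value
    static final short TF_SEPARATOR_SENTENCE = 0x0100;  // assumed value
    static final short TF_COMMON_WORD = 0x0800;         // assumed value

    /** Extracts the token type by masking off the flag bits (hypothetical helper). */
    static int typeOf(short token) {
        return token & TYPE_MASK;
    }

    /** Checks whether the given flag bit is set on the token (hypothetical helper). */
    static boolean hasFlag(short token, short flag) {
        return (token & flag) != 0;
    }

    public static void main(String[] args) {
        // A punctuation token that also ends a sentence, e.g. a full stop.
        short token = (short) (TT_PUNCTUATION | TF_SEPARATOR_SENTENCE);

        System.out.println(typeOf(token) == TT_PUNCTUATION);        // prints "true"
        System.out.println(hasFlag(token, TF_SEPARATOR_SENTENCE));  // prints "true"
        System.out.println(hasFlag(token, TF_COMMON_WORD));         // prints "false"
    }
}
```

Keeping the type and the flags in one short lets nextToken() report both in a single return value; callers mask before comparing against a TT_ constant.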

void reset(Reader reader) throws IOException
Parameters: reader - the input to tokenize. The reader will not be closed by the tokenizer when the end of stream is reached.
Throws: IOException

short nextToken() throws IOException
Returns: TT_TERM and other constants, or TT_EOF when the end of the data stream has been reached.
Throws: IOException
See also: TokenTypeUtils

void setTermBuffer(MutableCharArray array)
Parameters: array - buffer in which the current token image should be stored
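
The three methods above imply a simple calling pattern: reset the tokenizer on a reader, call nextToken() until it returns TT_EOF, and fetch each token image through setTermBuffer. The sketch below illustrates that pattern under stated assumptions. SketchTokenizer and WhitespaceTokenizer are hypothetical stand-ins written for this example, a StringBuilder takes the place of MutableCharArray, and the constant values are invented. It is not the library's implementation.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

final class TokenizerLoopSketch {
    static final int TT_TERM = 1;  // assumed value
    static final int TT_EOF = -1;  // assumed value

    /** Hypothetical stand-in mirroring the three ITokenizer methods. */
    interface SketchTokenizer {
        void reset(Reader reader) throws IOException;
        short nextToken() throws IOException;
        void setTermBuffer(StringBuilder buffer); // StringBuilder replaces MutableCharArray here
    }

    /** Toy implementation: splits on whitespace and reports every token as TT_TERM. */
    static final class WhitespaceTokenizer implements SketchTokenizer {
        private Reader reader;
        private final StringBuilder current = new StringBuilder();

        public void reset(Reader reader) {
            this.reader = reader;
            current.setLength(0);
        }

        public short nextToken() throws IOException {
            current.setLength(0);
            int c;
            while ((c = reader.read()) != -1) {
                if (Character.isWhitespace(c)) {
                    if (current.length() > 0) break; // end of the current token
                } else {
                    current.append((char) c);
                }
            }
            return (short) (current.length() > 0 ? TT_TERM : TT_EOF);
        }

        public void setTermBuffer(StringBuilder buffer) {
            // Copy the current token image into the caller's buffer.
            buffer.setLength(0);
            buffer.append(current);
        }
    }

    /** Collects all token images from the given text. */
    static List<String> tokenize(SketchTokenizer tokenizer, String text) throws IOException {
        List<String> images = new ArrayList<>();
        StringBuilder image = new StringBuilder(); // reused for every token
        // The tokenizer does not close the reader, so the caller does.
        try (Reader reader = new StringReader(text)) {
            tokenizer.reset(reader);
            while (tokenizer.nextToken() != TT_EOF) {
                tokenizer.setTermBuffer(image);
                images.add(image.toString());
            }
        }
        return images;
    }

    public static void main(String[] args) throws IOException {
        // prints "[data, mining, rocks]"
        System.out.println(tokenize(new WhitespaceTokenizer(), "data mining rocks"));
    }
}
```

Reusing one buffer across iterations avoids allocating a new object per token, which is presumably why the interface accepts a caller-supplied buffer rather than returning a string.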