java.lang.Object
  org.apache.solr.analysis.BaseTokenizerFactory
      org.apache.solr.analysis.PatternTokenizerFactory
public class PatternTokenizerFactory
This tokenizer uses regex pattern matching to construct distinct tokens for the input stream. It takes two arguments: "pattern" and "group".
group=-1 (the default) is equivalent to "split". In this case, the tokens will be equivalent to the output of String.split(java.lang.String), without empty tokens.

Using group >= 0 selects the matching group as the token. For example, if you have:

  pattern = \'([^\']+)\'
  group = 0
  input = aaa 'bbb' 'ccc'

the output will be two tokens: 'bbb' and 'ccc' (including the ' marks). With the same input but using group=1, the output would be: bbb and ccc (no ' marks).
NOTE: This Tokenizer does not output tokens that are of zero length.
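The two modes described above can be illustrated with plain java.util.regex, without any Solr or Lucene classes. This is only a sketch of the documented behavior, not the factory's actual implementation: the class name PatternTokenSketch and the method tokenize are hypothetical, and the pattern is written without the backslash escapes, which match the same text in Java.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-alone sketch; not part of Solr.
class PatternTokenSketch {

    // group < 0: split the input on the pattern.
    // group >= 0: emit the given capture group of every match.
    // Zero-length tokens are dropped in both modes.
    static List<String> tokenize(String regex, int group, String input) {
        Pattern pattern = Pattern.compile(regex);
        List<String> tokens = new ArrayList<>();
        if (group < 0) {
            for (String piece : pattern.split(input)) {
                if (!piece.isEmpty()) {
                    tokens.add(piece);
                }
            }
        } else {
            Matcher matcher = pattern.matcher(input);
            while (matcher.find()) {
                String token = matcher.group(group);
                if (token != null && !token.isEmpty()) {
                    tokens.add(token);
                }
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String input = "aaa 'bbb' 'ccc'";
        System.out.println(tokenize("'([^']+)'", 0, input)); // prints ['bbb', 'ccc']
        System.out.println(tokenize("'([^']+)'", 1, input)); // prints [bbb, ccc]
        System.out.println(tokenize("\\s+", -1, input));     // prints [aaa, 'bbb', 'ccc']
    }
}
```

The last call shows split mode with a whitespace pattern, which is an assumed example and does not appear in the original description.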
See Also: PatternTokenizer

| Field Summary | |
|---|---|
| protected java.util.Map<java.lang.String,java.lang.String> | args |
| protected int | group |
| static java.lang.String | GROUP |
| protected org.apache.lucene.util.Version | luceneMatchVersion: the luceneVersion arg |
| protected java.util.regex.Pattern | pattern |
| static java.lang.String | PATTERN |
| Fields inherited from class org.apache.solr.analysis.BaseTokenizerFactory |
|---|
| log |
| Constructor Summary | |
|---|---|
| PatternTokenizerFactory() | |
| Method Summary | |
|---|---|
| protected void | assureMatchVersion(): This method can be called in the create method to inform the user that this factory requires a luceneMatchVersion. |
| org.apache.lucene.analysis.Tokenizer | create(java.io.Reader in): Split the input using the configured pattern. |
| java.util.Map<java.lang.String,java.lang.String> | getArgs() |
| protected boolean | getBoolean(java.lang.String name, boolean defaultVal) |
| protected boolean | getBoolean(java.lang.String name, boolean defaultVal, boolean useDefault) |
| protected int | getInt(java.lang.String name) |
| protected int | getInt(java.lang.String name, int defaultVal) |
| protected int | getInt(java.lang.String name, int defaultVal, boolean useDefault) |
| protected org.apache.lucene.analysis.CharArraySet | getWordSet(ResourceLoader loader, java.lang.String wordFiles, boolean ignoreCase) |
| static java.util.List<org.apache.lucene.analysis.Token> | group(java.util.regex.Matcher matcher, java.lang.String input, int group): Deprecated. |
| void | init(java.util.Map<java.lang.String,java.lang.String> args): Require a configured pattern. |
| static java.util.List<org.apache.lucene.analysis.Token> | split(java.util.regex.Matcher matcher, java.lang.String input): Deprecated. |
| Methods inherited from class java.lang.Object |
|---|
| clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
| Methods inherited from interface org.apache.solr.analysis.TokenizerFactory |
|---|
| getArgs |
| Field Detail |
|---|
public static final java.lang.String PATTERN
public static final java.lang.String GROUP
protected java.util.Map<java.lang.String,java.lang.String> args
protected java.util.regex.Pattern pattern
protected int group
protected org.apache.lucene.util.Version luceneMatchVersion
| Constructor Detail |
|---|
public PatternTokenizerFactory()
| Method Detail |
|---|
public void init(java.util.Map<java.lang.String,java.lang.String> args)
Require a configured pattern.
Specified by: init in interface TokenizerFactory
public org.apache.lucene.analysis.Tokenizer create(java.io.Reader in)
Split the input using the configured pattern.
@Deprecated
public static java.util.List<org.apache.lucene.analysis.Token> split(java.util.regex.Matcher matcher,
java.lang.String input)
@Deprecated
public static java.util.List<org.apache.lucene.analysis.Token> group(java.util.regex.Matcher matcher,
java.lang.String input,
int group)
public java.util.Map<java.lang.String,java.lang.String> getArgs()
protected final void assureMatchVersion()
This method can be called in the create method to inform the user that this factory requires a luceneMatchVersion.
protected int getInt(java.lang.String name)
protected int getInt(java.lang.String name,
int defaultVal)
protected int getInt(java.lang.String name,
int defaultVal,
boolean useDefault)
protected boolean getBoolean(java.lang.String name,
boolean defaultVal)
protected boolean getBoolean(java.lang.String name,
boolean defaultVal,
boolean useDefault)
protected org.apache.lucene.analysis.CharArraySet getWordSet(ResourceLoader loader,
java.lang.String wordFiles,
boolean ignoreCase)
throws java.io.IOException
Throws: java.io.IOException
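As a rough illustration of how init might read the two documented arguments, the sketch below parses an args map, requires a pattern, and defaults group to -1. It is a stand-alone approximation and not the Solr source. The class PatternArgsSketch is hypothetical, the constant values "pattern" and "group" are assumed from the argument names in the class description, and the exception type is a placeholder for whatever the real factory throws.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical stand-alone sketch; it does not extend any Solr class.
class PatternArgsSketch {
    // Assumed values, taken from the argument names in the class description.
    static final String PATTERN = "pattern";
    static final String GROUP = "group";

    final Pattern pattern;
    final int group;

    PatternArgsSketch(Map<String, String> args) {
        String regex = args.get(PATTERN);
        if (regex == null) {
            // The documentation only says a configured pattern is required;
            // the exception type used here is an assumption.
            throw new IllegalArgumentException("missing required argument: " + PATTERN);
        }
        this.pattern = Pattern.compile(regex);

        // group is optional; -1 (split mode) is the documented default.
        String groupArg = args.get(GROUP);
        this.group = (groupArg == null) ? -1 : Integer.parseInt(groupArg);
    }

    public static void main(String[] unused) {
        Map<String, String> args = new HashMap<String, String>();
        args.put(PATTERN, "'([^']+)'");
        args.put(GROUP, "1");
        PatternArgsSketch sketch = new PatternArgsSketch(args);
        System.out.println(sketch.pattern.pattern() + " group=" + sketch.group);
    }
}
```

Failing early when the pattern is missing keeps the error close to the configuration mistake, instead of surfacing later when the first tokenizer is created.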