@Bindable(prefix="STCClusteringAlgorithm",
inherit=CommonAttributes.class)
@Label(value="STC Clustering")
public final class STCClusteringAlgorithm
extends ProcessingComponentBase
implements IClusteringAlgorithm
| Modifier and Type | Field and Description |
|---|---|
| `List<Cluster>` | `clusters`: Clusters created by the algorithm. |
| `double` | `documentCountBoost`: Document count boost. |
| `List<Document>` | `documents`: Documents to cluster. |
| `int` | `ignoreWordIfInFewerDocs`: Minimum word-document recurrences. |
| `double` | `ignoreWordIfInHigherDocsPercent`: Maximum word-document ratio. |
| `int` | `maxBaseClusters`: Maximum base clusters count. |
| `int` | `maxClusters`: Maximum final clusters. |
| `int` | `maxDescPhraseLength`: Maximum words per label. |
| `double` | `maxPhraseOverlap`: Maximum cluster phrase overlap. |
| `int` | `maxPhrases`: Maximum phrases per label. |
| `boolean` | `mergeStemEquivalentBaseClusters`: Merge all stem-equivalent base clusters before running the merge phase. |
| `double` | `mergeThreshold`: Base cluster merge threshold. |
| `double` | `minBaseClusterScore`: Minimum base cluster score. |
| `int` | `minBaseClusterSize`: Minimum documents per base cluster. |
| `double` | `mostGeneralPhraseCoverage`: Minimum general phrase coverage. |
| `MultilingualClustering` | `multilingualClustering`: A helper for performing multilingual clustering. |
| `int` | `optimalPhraseLength`: Optimal label length. |
| `double` | `optimalPhraseLengthDev`: Phrase length tolerance. |
| `IPreprocessingPipeline` | `preprocessingPipeline`: Common preprocessing tasks handler. |
| `String` | `query`: Query that produced the documents. |
| `double` | `scoreWeight`: Balance between cluster score and size during cluster sorting. |
| `double` | `singleTermBoost`: Single term boost. |
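
The `scoreWeight` field is described only as a balance between cluster score and size during sorting. As a rough illustration of how such a balance could work, the sketch below ranks clusters by `size^(1 - w) * score^w`. The formula, the `[0, 1]` range for `w`, and the `ClusterInfo` and `ScoreWeightSketch` names are all assumptions made for this example and are not confirmed by this page.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical stand-in for a cluster: only a label, a size, and a score. */
final class ClusterInfo {
    final String label;
    final int size;
    final double score;

    ClusterInfo(String label, int size, double score) {
        this.label = label;
        this.size = size;
        this.score = score;
    }
}

final class ScoreWeightSketch {
    /** Assumed ranking key: size^(1 - w) * score^w, with w in [0, 1]. */
    static double weightedKey(ClusterInfo c, double scoreWeight) {
        return Math.pow(c.size, 1.0 - scoreWeight) * Math.pow(c.score, scoreWeight);
    }

    /** Returns a new list sorted from the highest key to the lowest. */
    static List<ClusterInfo> sortByWeightedScoreAndSize(List<ClusterInfo> clusters,
                                                        double scoreWeight) {
        if (scoreWeight < 0.0 || scoreWeight > 1.0) {
            throw new IllegalArgumentException("scoreWeight must be in [0, 1]");
        }
        List<ClusterInfo> sorted = new ArrayList<>(clusters);
        sorted.sort(Comparator
                .comparingDouble((ClusterInfo c) -> weightedKey(c, scoreWeight))
                .reversed());
        return sorted;
    }

    public static void main(String[] args) {
        List<ClusterInfo> clusters = new ArrayList<>();
        clusters.add(new ClusterInfo("big, weak", 40, 1.5));
        clusters.add(new ClusterInfo("small, strong", 5, 30.0));
        // Print the top-ranked cluster at both extremes of the weight.
        for (double w : new double[] {0.0, 1.0}) {
            System.out.println(w + " -> " + sortByWeightedScoreAndSize(clusters, w).get(0).label);
        }
    }
}
```

Under this assumed formula, a weight of 0 ranks purely by size and a weight of 1 ranks purely by score, so the two sample clusters swap places between the two runs.
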
| Constructor and Description |
|---|
| `STCClusteringAlgorithm()` |
| Modifier and Type | Method and Description |
|---|---|
| `void` | `afterProcessing()`: Memory cleanups. |
| `void` | `process()`: Performs STC clustering of `documents`. |
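Methods inherited from class ProcessingComponentBase: beforeProcessing, dispose, getContext, getSharedExecutor, init

Methods inherited from class java.lang.Object: clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Methods inherited from interface IProcessingComponent: beforeProcessing, dispose, init

@Processing @Input @Internal @Attribute(key="query", inherit=true) public String query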
@Processing @Input @Required @Internal @Attribute(key="documents", inherit=true) public List<Document> documents
@Processing @Output @Internal @Attribute(key="clusters", inherit=true) public List<Cluster> clusters
@Processing @Input @Attribute @IntRange(min=2) @Level(value=MEDIUM) @Group(value="Word filtering") public int ignoreWordIfInFewerDocs
@Processing @Input @Attribute @Level(value=MEDIUM) @Group(value="Word filtering") public double ignoreWordIfInHigherDocsPercent
@Processing @Input @Attribute @Level(value=ADVANCED) @Group(value="Base clusters") public double minBaseClusterScore
@Processing @Input @Attribute @IntRange(min=2) @Level(value=ADVANCED) @Group(value="Base clusters") public int maxBaseClusters
@Processing @Input @Attribute @IntRange(min=2, max=20) @Level(value=ADVANCED) @Group(value="Base clusters") public int minBaseClusterSize
@Processing @Input @Attribute @IntRange(min=1) @Level(value=BASIC) @Group(value="Merging and output") public int maxClusters
@Processing @Input @Attribute @Level(value=ADVANCED) @Group(value="Merging and output") public double mergeThreshold
@Processing @Input @Attribute @Level(value=ADVANCED) @Group(value="Labels") public double maxPhraseOverlap
@Processing @Input @Attribute @Level(value=ADVANCED) @Group(value="Labels") public double mostGeneralPhraseCoverage
@Processing @Input @Attribute @IntRange(min=1) @Level(value=BASIC) @Group(value="Labels") public int maxDescPhraseLength
@Processing @Input @Attribute @IntRange(min=1) @Level(value=BASIC) @Group(value="Labels") public int maxPhrases
@Processing @Input @Attribute @Level(value=MEDIUM) @Group(value="Base clusters") public double singleTermBoost
@Processing @Input @Attribute @IntRange(min=1) @Level(value=BASIC) @Group(value="Base clusters") public int optimalPhraseLength
@Processing @Input @Attribute @Level(value=MEDIUM) @Group(value="Base clusters") public double optimalPhraseLengthDev
@Processing @Input @Attribute @Level(value=MEDIUM) @Group(value="Base clusters") public double documentCountBoost
@Input @Attribute @Internal @Level(value=ADVANCED) public IPreprocessingPipeline preprocessingPipeline
@Input @Processing @Attribute @Label(value="Size-Score sorting ratio") @Level(value=MEDIUM) @Group(value="Clusters") public double scoreWeight
@Input @Processing @Attribute @Label(value="Merge all stem-equivalent phrases when discovering base clusters") @Level(value=MEDIUM) @Group(value="Clusters") public boolean mergeStemEquivalentBaseClusters
public final MultilingualClustering multilingualClustering
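
The declarations above carry two pieces of information that are easy to get wrong when setting attributes by hand: the `@Bindable` prefix and the `@IntRange` bounds. The sketch below collects integer settings into a plain map and checks them against the bounds listed on this page. It assumes that an attribute key is formed as `<prefix>.<fieldName>`, which this page does not state explicitly, and `StcAttributeSketch`, `key`, and `putInt` are hypothetical helpers that use no library classes.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class StcAttributeSketch {
    /** Prefix taken from the @Bindable annotation on the class. */
    static final String PREFIX = "STCClusteringAlgorithm";

    /** Inclusive bounds copied from the @IntRange annotations; no declared max means no upper bound. */
    static final Map<String, int[]> INT_RANGES = new LinkedHashMap<>();
    static {
        INT_RANGES.put("ignoreWordIfInFewerDocs", new int[] {2, Integer.MAX_VALUE});
        INT_RANGES.put("maxBaseClusters", new int[] {2, Integer.MAX_VALUE});
        INT_RANGES.put("minBaseClusterSize", new int[] {2, 20});
        INT_RANGES.put("maxClusters", new int[] {1, Integer.MAX_VALUE});
        INT_RANGES.put("maxDescPhraseLength", new int[] {1, Integer.MAX_VALUE});
        INT_RANGES.put("maxPhrases", new int[] {1, Integer.MAX_VALUE});
        INT_RANGES.put("optimalPhraseLength", new int[] {1, Integer.MAX_VALUE});
    }

    /** Assumed key convention: "<prefix>.<fieldName>". */
    static String key(String fieldName) {
        return PREFIX + "." + fieldName;
    }

    /** Validates the value against the declared range, then stores it under the prefixed key. */
    static void putInt(Map<String, Object> attributes, String fieldName, int value) {
        int[] range = INT_RANGES.get(fieldName);
        if (range == null) {
            throw new IllegalArgumentException("No @IntRange listed for " + fieldName);
        }
        if (value < range[0] || value > range[1]) {
            throw new IllegalArgumentException(
                    fieldName + " must be in [" + range[0] + ", " + range[1] + "], got " + value);
        }
        attributes.put(key(fieldName), value);
    }

    public static void main(String[] args) {
        Map<String, Object> attributes = new LinkedHashMap<>();
        putInt(attributes, "maxClusters", 15);
        putInt(attributes, "minBaseClusterSize", 3);
        System.out.println(attributes);
        try {
            putInt(attributes, "minBaseClusterSize", 21); // above the declared max of 20
        } catch (IllegalArgumentException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```

The `query`, `documents`, and `clusters` fields declare explicit keys (`key="query"` and so on), so they would not go through the prefixed `key` helper.
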
public void process()
throws ProcessingException

Performs STC clustering of documents.

Specified by: process in interface IProcessingComponent
Overrides: process in class ProcessingComponentBase
Throws: ProcessingException - when processing failed. If thrown, the
IProcessingComponent.afterProcessing() method will be called and the component will
be ready to accept further requests or to be disposed of. Finally, the
exception will be rethrown from the controller method that caused the
component to perform processing.

public void afterProcessing()

Specified by: afterProcessing in interface IProcessingComponent
Overrides: afterProcessing in class ProcessingComponentBase
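
The failure contract described for `process()` (cleanup through `afterProcessing()` still runs, then the exception reaches the caller) can be pictured with a `try`/`finally` block. The sketch below is not the library's controller: `Component`, `SketchProcessingException`, and `runOnce` are hypothetical stand-ins reduced to the two methods documented here.

```java
import java.util.ArrayList;
import java.util.List;

final class LifecycleSketch {
    /** Hypothetical checked exception standing in for ProcessingException. */
    static final class SketchProcessingException extends Exception {
        SketchProcessingException(String message) {
            super(message);
        }
    }

    /** Hypothetical, reduced component contract with only the two methods discussed above. */
    interface Component {
        void process() throws SketchProcessingException;

        void afterProcessing();
    }

    /** Hypothetical controller method: afterProcessing() always runs, and a failure propagates. */
    static void runOnce(Component component) throws SketchProcessingException {
        try {
            component.process();
        } finally {
            component.afterProcessing();
        }
    }

    public static void main(String[] args) {
        final List<String> events = new ArrayList<>();
        Component failing = new Component() {
            @Override
            public void process() throws SketchProcessingException {
                events.add("process");
                throw new SketchProcessingException("simulated failure");
            }

            @Override
            public void afterProcessing() {
                events.add("afterProcessing");
            }
        };
        try {
            runOnce(failing);
        } catch (SketchProcessingException e) {
            events.add("rethrown: " + e.getMessage());
        }
        System.out.println(events);
    }
}
```

Running `main` records the events in the order process, afterProcessing, rethrown, which mirrors the sequence the `Throws` description lays out.
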