Class UnivariateFeatureSelector
- java.lang.Object
  - org.apache.flink.ml.feature.univariatefeatureselector.UnivariateFeatureSelector
- All Implemented Interfaces:
Serializable, org.apache.flink.ml.api.Estimator<UnivariateFeatureSelector,UnivariateFeatureSelectorModel>, org.apache.flink.ml.api.Stage<UnivariateFeatureSelector>, org.apache.flink.ml.common.param.HasFeaturesCol<UnivariateFeatureSelector>, org.apache.flink.ml.common.param.HasLabelCol<UnivariateFeatureSelector>, org.apache.flink.ml.common.param.HasOutputCol<UnivariateFeatureSelector>, UnivariateFeatureSelectorModelParams<UnivariateFeatureSelector>, UnivariateFeatureSelectorParams<UnivariateFeatureSelector>, org.apache.flink.ml.param.WithParams<UnivariateFeatureSelector>
public class UnivariateFeatureSelector extends Object implements org.apache.flink.ml.api.Estimator<UnivariateFeatureSelector,UnivariateFeatureSelectorModel>, UnivariateFeatureSelectorParams<UnivariateFeatureSelector>
An Estimator which selects features based on univariate statistical tests against labels. Currently, Flink supports three univariate feature selectors: chi-squared, ANOVA F-test and F-value. Users choose a selector by setting `featureType` and `labelType`, and Flink picks the score function for that combination.
The following combinations of `featureType` and `labelType` are supported:
- `featureType` `categorical` and `labelType` `categorical`: Flink uses chi-squared, i.e. chi2 in sklearn.
- `featureType` `continuous` and `labelType` `categorical`: Flink uses ANOVA F-test, i.e. f_classif in sklearn.
- `featureType` `continuous` and `labelType` `continuous`: Flink uses F-value, i.e. f_regression in sklearn.
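The mapping above can be sketched in plain Java. This is only an illustration of the documented combinations: `ScoreFunctionSketch` and `scoreFunctionFor` are hypothetical names, not part of Flink ML, and rejecting the unlisted combination with an exception is an assumption about how an unsupported pair might be handled.

```java
/** Hypothetical helper (not a Flink ML class) mirroring the documented mapping. */
final class ScoreFunctionSketch {

    /** Returns the name of the test documented for the given feature and label types. */
    static String scoreFunctionFor(String featureType, String labelType) {
        if (featureType.equals("categorical") && labelType.equals("categorical")) {
            return "chi-squared"; // chi2 in sklearn
        }
        if (featureType.equals("continuous") && labelType.equals("categorical")) {
            return "ANOVA F-test"; // f_classif in sklearn
        }
        if (featureType.equals("continuous") && labelType.equals("continuous")) {
            return "F-value"; // f_regression in sklearn
        }
        // Assumption: any pair not listed in the documentation is rejected.
        throw new IllegalArgumentException(
                "Unsupported combination: featureType=" + featureType
                        + ", labelType=" + labelType);
    }

    public static void main(String[] args) {
        System.out.println(scoreFunctionFor("continuous", "categorical"));
    }
}
```

Note that `categorical` features with `continuous` labels is the one pair the list does not cover.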
The `UnivariateFeatureSelector` supports different selection modes:
- numTopFeatures: chooses a fixed number of top features according to a hypothesis test.
- percentile: similar to numTopFeatures but chooses a fraction of all features instead of a fixed number.
- fpr: chooses all features whose p-values are below a threshold, thus controlling the false positive rate of selection.
- fdr: uses the Benjamini-Hochberg procedure to choose all features whose false discovery rate is below a threshold.
- fwe: chooses all features whose p-values are below a threshold. The threshold is scaled by 1/numFeatures, thus controlling the family-wise error rate of selection.
By default, the selection mode is `numTopFeatures`.
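To make the five modes concrete, here is a minimal sketch that applies each rule to an array of per-feature p-values. It is not Flink's implementation: `SelectionModeSketch` and its methods are hypothetical, ranking "top" features by ascending p-value is an assumption, and so is the boundary handling (strict `<` for fpr and fwe, `<=` in the Benjamini-Hochberg step).

```java
import java.util.Arrays;
import java.util.Comparator;

/** Hypothetical illustration of the selection modes; not a Flink ML class. */
final class SelectionModeSketch {

    /** Counts the p-values strictly below the cutoff. */
    static int countBelow(double[] pValues, double cutoff) {
        int count = 0;
        for (double p : pValues) {
            if (p < cutoff) {
                count++;
            }
        }
        return count;
    }

    /** Returns the selected feature indices in ascending index order. */
    static int[] select(double[] pValues, String mode, double threshold) {
        int n = pValues.length;
        // Feature indices ordered from smallest to largest p-value.
        Integer[] order = new Integer[n];
        for (int i = 0; i < n; i++) {
            order[i] = i;
        }
        Arrays.sort(order, Comparator.comparingDouble(i -> pValues[i]));

        int keep; // how many leading entries of `order` are selected
        switch (mode) {
            case "numTopFeatures":
                keep = Math.min(n, (int) threshold);
                break;
            case "percentile":
                keep = (int) (n * threshold);
                break;
            case "fpr":
                keep = countBelow(pValues, threshold);
                break;
            case "fwe":
                // The threshold is scaled by 1/numFeatures.
                keep = countBelow(pValues, threshold / n);
                break;
            case "fdr":
                // Benjamini-Hochberg: keep ranks 1..k for the largest k
                // with p_(k) <= threshold * k / n.
                keep = 0;
                for (int k = 1; k <= n; k++) {
                    if (pValues[order[k - 1]] <= threshold * k / n) {
                        keep = k;
                    }
                }
                break;
            default:
                throw new IllegalArgumentException("Unknown mode: " + mode);
        }

        int[] selected = new int[keep];
        for (int i = 0; i < keep; i++) {
            selected[i] = order[i];
        }
        Arrays.sort(selected);
        return selected;
    }

    public static void main(String[] args) {
        double[] pValues = {0.001, 0.20, 0.04, 0.012, 0.60};
        for (String mode : new String[] {"fpr", "fdr", "fwe"}) {
            System.out.println(mode + ": " + Arrays.toString(select(pValues, mode, 0.05)));
        }
    }
}
```

With the same threshold, fwe is the strictest of the three p-value modes and fpr the most permissive, with fdr in between.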
- See Also:
- Serialized Form
Field Summary
-
Fields inherited from interface org.apache.flink.ml.feature.univariatefeatureselector.UnivariateFeatureSelectorParams
CATEGORICAL, CONTINUOUS, FDR, FEATURE_TYPE, FPR, FWE, LABEL_TYPE, NUM_TOP_FEATURES, PERCENTILE, SELECTION_MODE, SELECTION_THRESHOLD
Constructor Summary
Constructors
- UnivariateFeatureSelector()
-
Method Summary
Modifier and Type, Method
- UnivariateFeatureSelectorModel fit(org.apache.flink.table.api.Table... inputs)
- Map<org.apache.flink.ml.param.Param<?>,Object> getParamMap()
- static UnivariateFeatureSelector load(org.apache.flink.table.api.bridge.java.StreamTableEnvironment tEnv, String path)
- void save(String path)
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
-
Methods inherited from interface org.apache.flink.ml.feature.univariatefeatureselector.UnivariateFeatureSelectorParams
getFeatureType, getLabelType, getSelectionMode, getSelectionThreshold, setFeatureType, setLabelType, setSelectionMode, setSelectionThreshold
Method Detail
-
fit
public UnivariateFeatureSelectorModel fit(org.apache.flink.table.api.Table... inputs)
- Specified by:
fit in interface org.apache.flink.ml.api.Estimator<UnivariateFeatureSelector,UnivariateFeatureSelectorModel>
-
getParamMap
public Map<org.apache.flink.ml.param.Param<?>,Object> getParamMap()
- Specified by:
getParamMap in interface org.apache.flink.ml.param.WithParams<UnivariateFeatureSelector>
-
save
public void save(String path) throws IOException
- Specified by:
save in interface org.apache.flink.ml.api.Stage<UnivariateFeatureSelector>
- Throws:
IOException
-
load
public static UnivariateFeatureSelector load(org.apache.flink.table.api.bridge.java.StreamTableEnvironment tEnv, String path) throws IOException
- Throws:
IOException