Class VectorIndexer

  • All Implemented Interfaces:
    Serializable, org.apache.flink.ml.api.Estimator<VectorIndexer,VectorIndexerModel>, org.apache.flink.ml.api.Stage<VectorIndexer>, org.apache.flink.ml.common.param.HasHandleInvalid<VectorIndexer>, org.apache.flink.ml.common.param.HasInputCol<VectorIndexer>, org.apache.flink.ml.common.param.HasOutputCol<VectorIndexer>, VectorIndexerModelParams<VectorIndexer>, VectorIndexerParams<VectorIndexer>, org.apache.flink.ml.param.WithParams<VectorIndexer>

    public class VectorIndexer
    extends Object
    implements org.apache.flink.ml.api.Estimator<VectorIndexer,VectorIndexerModel>, VectorIndexerParams<VectorIndexer>
    An Estimator which implements the vector indexing algorithm.

    A vector indexer maps each column of the input vector to either a continuous or a categorical feature, depending on the number of distinct values in that column. If the number of distinct values in a column is greater than maxCategories, the corresponding output column is left unchanged (continuous). Otherwise, the column is transformed into categorical indices. For categorical outputs, the indices lie in [0, numDistinctValuesInThisColumn), with the additional index numDistinctValuesInThisColumn reserved for invalid entries when handleInvalid is `keep`.
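
    The per-column decision can be sketched in plain Java as below. This is a hypothetical illustration, not Flink ML's implementation: the class `CategoricalColumnSketch`, the method `categoricalColumns`, and the dense `double[][]` input are assumptions made for the example. It stops collecting a column's values as soon as the count exceeds maxCategories, since the column is then known to be continuous.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

/** Hypothetical sketch of deciding which columns become categorical. */
final class CategoricalColumnSketch {

    /**
     * For each column, returns its sorted distinct values if the column is
     * categorical (distinct count <= maxCategories), or null if it is continuous.
     */
    static List<TreeSet<Double>> categoricalColumns(double[][] rows, int maxCategories) {
        List<TreeSet<Double>> result = new ArrayList<>();
        int numCols = rows.length == 0 ? 0 : rows[0].length;
        for (int col = 0; col < numCols; col++) {
            TreeSet<Double> distinct = new TreeSet<>();
            for (double[] row : rows) {
                distinct.add(row[col]);
                if (distinct.size() > maxCategories) {
                    break; // too many distinct values: the column stays continuous
                }
            }
            // null marks a continuous column whose values pass through unchanged
            result.add(distinct.size() > maxCategories ? null : distinct);
        }
        return result;
    }
}
```

    For example, with maxCategories = 2, a column holding {1.0, 2.0, 1.0} is categorical, while a column holding {0.5, 1.5, 2.5} is continuous.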

    Within each categorical column, indices are assigned to the distinct values in ascending order, except that 0.0, if present, is always mapped to 0 (to preserve sparsity). Two examples:

    • If a column contains {-1.0, 1.0}, then -1.0 is encoded as 0 and 1.0 is encoded as 1.
    • If a column contains {-1.0, 0.0, 1.0}, then -1.0 is encoded as 1, 0.0 is encoded as 0, and 1.0 is encoded as 2.
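
    The ordering rule for a single categorical column might be sketched as follows, assuming the column's distinct values are available as boxed `Double`s. `CategoryOrderingSketch` and `assignIndices` are hypothetical names, not part of Flink ML; the sketch reproduces both examples listed above.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeSet;

/** Hypothetical sketch of the "ascending, but 0.0 first" index assignment. */
final class CategoryOrderingSketch {

    /** Maps 0.0 (if present) to index 0 and the remaining values to 1, 2, ... in ascending order. */
    static Map<Double, Integer> assignIndices(Iterable<Double> values) {
        TreeSet<Double> sorted = new TreeSet<>();
        for (Double v : values) {
            sorted.add(v + 0.0); // adding +0.0 normalizes -0.0 to 0.0
        }
        Map<Double, Integer> index = new HashMap<>();
        int next = 0;
        if (sorted.remove(0.0)) {
            index.put(0.0, next++); // reserve index 0 for 0.0 to keep sparse vectors sparse
        }
        for (Double v : sorted) {
            index.put(v, next++); // remaining values in ascending order
        }
        return index;
    }
}
```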

    With the `keep` option of HasHandleInvalid, invalid entries (values not seen when fitting the model) are put into a special bucket whose index is the number of distinct values in that column.
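
    How a fitted model might apply these indices to one vector, including the invalid-entry policies, is sketched below. The enum `InvalidPolicy` is a hypothetical stand-in for the error/skip/keep choices of HasHandleInvalid, and the class and method names are illustrative only. The sketch assumes that `skip` drops the whole row and `error` throws; only the `keep` behavior is taken directly from the description above.

```java
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of transforming one vector with a fitted index model. */
final class VectorIndexerTransformSketch {

    /** Hypothetical stand-in for the error/skip/keep options of HasHandleInvalid. */
    enum InvalidPolicy { ERROR, SKIP, KEEP }

    /**
     * models.get(i) is null for a continuous column, otherwise the value-to-index
     * map of categorical column i. Returns null when the row is dropped (SKIP).
     */
    static double[] transform(double[] input, List<Map<Double, Integer>> models, InvalidPolicy policy) {
        double[] output = new double[input.length];
        for (int i = 0; i < input.length; i++) {
            Map<Double, Integer> model = models.get(i);
            if (model == null) {
                output[i] = input[i]; // continuous column: value unchanged
                continue;
            }
            Integer index = model.get(input[i] + 0.0); // +0.0 normalizes -0.0 before lookup
            if (index != null) {
                output[i] = index;
            } else if (policy == InvalidPolicy.KEEP) {
                output[i] = model.size(); // special bucket: number of distinct values
            } else if (policy == InvalidPolicy.SKIP) {
                return null; // assumed: the whole row is filtered out
            } else {
                throw new IllegalArgumentException("Unseen value " + input[i] + " in column " + i);
            }
        }
        return output;
    }
}
```

    In Flink ML itself, the policy is chosen through the HANDLE_INVALID parameter listed below rather than through an enum like this one.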

    See Also:
    Serialized Form
    • Field Summary

      • Fields inherited from interface org.apache.flink.ml.common.param.HasHandleInvalid

        ERROR_INVALID, HANDLE_INVALID, KEEP_INVALID, SKIP_INVALID
      • Fields inherited from interface org.apache.flink.ml.common.param.HasInputCol

        INPUT_COL
      • Fields inherited from interface org.apache.flink.ml.common.param.HasOutputCol

        OUTPUT_COL