public class OrcColumnarRowInputFormat<BatchT, SplitT extends org.apache.flink.connector.file.src.FileSourceSplit>
extends AbstractOrcFileInputFormat<org.apache.flink.table.data.RowData, BatchT, SplitT>
implements org.apache.flink.table.connector.format.FileBasedStatisticsReportableInputFormat
An ORC reader that produces a stream of ColumnarRowData records.

This class can add extra fields through the ColumnBatchFactory, for example partition fields, which can be extracted from the file path. In that case, getProducedType() may differ from the ORC schema, and the types of the extra fields need to be added.
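For orientation, a minimal sketch of plugging this format into a FileSource; the path is hypothetical, and `format` stands for an instance built, for example, via createPartitionedFormat (see the sketch under that method below). Since the format is a BulkFormat<RowData, FileSourceSplit>, it slots into the file source like any other bulk format:

```java
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.connector.file.src.FileSource;
import org.apache.flink.core.fs.Path;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.data.RowData;

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

// 'format' is an OrcColumnarRowInputFormat<?, FileSourceSplit> built elsewhere;
// the path below is a placeholder for an ORC table directory.
FileSource<RowData> source =
        FileSource.forBulkFileFormat(format, new Path("/path/to/orc/table")).build();

DataStream<RowData> rows =
        env.fromSource(source, WatermarkStrategy.noWatermarks(), "orc-file-source");
```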
Nested classes/interfaces inherited from class org.apache.flink.orc.AbstractOrcFileInputFormat:
AbstractOrcFileInputFormat.OrcReaderBatch<T,BatchT>, AbstractOrcFileInputFormat.OrcVectorizedReader<T,BatchT>

Fields inherited from class org.apache.flink.orc.AbstractOrcFileInputFormat:
batchSize, conjunctPredicates, hadoopConfigWrapper, schema, selectedFields, shim

Constructor Summary

| Constructor and Description |
|---|
| OrcColumnarRowInputFormat(OrcShim<BatchT> shim, org.apache.hadoop.conf.Configuration hadoopConfig, org.apache.orc.TypeDescription schema, int[] selectedFields, List<OrcFilters.Predicate> conjunctPredicates, int batchSize, ColumnBatchFactory<BatchT,SplitT> batchFactory, org.apache.flink.api.common.typeinfo.TypeInformation<org.apache.flink.table.data.RowData> producedTypeInfo) |
Method Summary

| Modifier and Type | Method and Description |
|---|---|
| static <SplitT extends org.apache.flink.connector.file.src.FileSourceSplit> OrcColumnarRowInputFormat<org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch,SplitT> | createPartitionedFormat(OrcShim<org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch> shim, org.apache.hadoop.conf.Configuration hadoopConfig, org.apache.flink.table.types.logical.RowType tableType, List<String> partitionKeys, org.apache.flink.connector.file.table.PartitionFieldExtractor<SplitT> extractor, int[] selectedFields, List<OrcFilters.Predicate> conjunctPredicates, int batchSize, java.util.function.Function<org.apache.flink.table.types.logical.RowType,org.apache.flink.api.common.typeinfo.TypeInformation<org.apache.flink.table.data.RowData>> rowTypeInfoFactory) Creates a partitioned OrcColumnarRowInputFormat whose partition columns can be generated from the split. |
| AbstractOrcFileInputFormat.OrcReaderBatch<org.apache.flink.table.data.RowData,BatchT> | createReaderBatch(SplitT split, OrcVectorizedBatchWrapper<BatchT> orcBatch, org.apache.flink.connector.file.src.util.Pool.Recycler<AbstractOrcFileInputFormat.OrcReaderBatch<org.apache.flink.table.data.RowData,BatchT>> recycler, int batchSize) Creates the AbstractOrcFileInputFormat.OrcReaderBatch structure, which is responsible for holding the data structures that hold the batch data (column vectors, row arrays, ...) and the batch conversion from the ORC representation to the result format. |
| org.apache.flink.api.common.typeinfo.TypeInformation<org.apache.flink.table.data.RowData> | getProducedType() Gets the type produced by this format. |
| org.apache.flink.table.plan.stats.TableStats | reportStatistics(List<org.apache.flink.core.fs.Path> files, org.apache.flink.table.types.DataType producedDataType) |
Methods inherited from class org.apache.flink.orc.AbstractOrcFileInputFormat:
createReader, isSplittable, restoreReader

Constructor Detail

OrcColumnarRowInputFormat

public OrcColumnarRowInputFormat(OrcShim<BatchT> shim,
                                 org.apache.hadoop.conf.Configuration hadoopConfig,
                                 org.apache.orc.TypeDescription schema,
                                 int[] selectedFields,
                                 List<OrcFilters.Predicate> conjunctPredicates,
                                 int batchSize,
                                 ColumnBatchFactory<BatchT,SplitT> batchFactory,
                                 org.apache.flink.api.common.typeinfo.TypeInformation<org.apache.flink.table.data.RowData> producedTypeInfo)
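The conjunctPredicates argument carries filters that are pushed down into the ORC reader; each entry is ANDed with the others when row groups are evaluated. A minimal sketch of building such a list, using the OrcFilters predicate classes of flink-orc; the column names and literal are hypothetical:

```java
import java.util.Arrays;
import java.util.List;

import org.apache.flink.orc.OrcFilters;
import org.apache.hadoop.hive.ql.io.sarg.PredicateLeaf;

// Hypothetical filters: id = 42 AND name IS NOT NULL.
List<OrcFilters.Predicate> conjunctPredicates = Arrays.asList(
        new OrcFilters.Equals("id", PredicateLeaf.Type.LONG, 42L),
        new OrcFilters.Not(new OrcFilters.IsNull("name", PredicateLeaf.Type.STRING)));
```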
Method Detail

createReaderBatch

public AbstractOrcFileInputFormat.OrcReaderBatch<org.apache.flink.table.data.RowData,BatchT> createReaderBatch(SplitT split, OrcVectorizedBatchWrapper<BatchT> orcBatch, org.apache.flink.connector.file.src.util.Pool.Recycler<AbstractOrcFileInputFormat.OrcReaderBatch<org.apache.flink.table.data.RowData,BatchT>> recycler, int batchSize)
Creates the AbstractOrcFileInputFormat.OrcReaderBatch structure, which is responsible for holding the data structures that hold the batch data (column vectors, row arrays, ...) and the batch conversion from the ORC representation to the result format.

Specified by:
createReaderBatch in class AbstractOrcFileInputFormat<org.apache.flink.table.data.RowData,BatchT,SplitT extends org.apache.flink.connector.file.src.FileSourceSplit>

getProducedType

public org.apache.flink.api.common.typeinfo.TypeInformation<org.apache.flink.table.data.RowData> getProducedType()
Description copied from class: AbstractOrcFileInputFormat
Gets the type produced by this format.

Specified by:
getProducedType in interface org.apache.flink.api.java.typeutils.ResultTypeQueryable<org.apache.flink.table.data.RowData>
Specified by:
getProducedType in interface org.apache.flink.connector.file.src.reader.BulkFormat<org.apache.flink.table.data.RowData,SplitT extends org.apache.flink.connector.file.src.FileSourceSplit>
Overrides:
getProducedType in class AbstractOrcFileInputFormat<org.apache.flink.table.data.RowData,BatchT,SplitT extends org.apache.flink.connector.file.src.FileSourceSplit>

reportStatistics

public org.apache.flink.table.plan.stats.TableStats reportStatistics(List<org.apache.flink.core.fs.Path> files, org.apache.flink.table.types.DataType producedDataType)
Specified by:
reportStatistics in interface org.apache.flink.table.connector.format.FileBasedStatisticsReportableInputFormat

createPartitionedFormat

public static <SplitT extends org.apache.flink.connector.file.src.FileSourceSplit> OrcColumnarRowInputFormat<org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch,SplitT> createPartitionedFormat(OrcShim<org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch> shim, org.apache.hadoop.conf.Configuration hadoopConfig, org.apache.flink.table.types.logical.RowType tableType, List<String> partitionKeys, org.apache.flink.connector.file.table.PartitionFieldExtractor<SplitT> extractor, int[] selectedFields, List<OrcFilters.Predicate> conjunctPredicates, int batchSize, java.util.function.Function<org.apache.flink.table.types.logical.RowType,org.apache.flink.api.common.typeinfo.TypeInformation<org.apache.flink.table.data.RowData>> rowTypeInfoFactory)
Creates a partitioned OrcColumnarRowInputFormat whose partition columns can be generated from the split.
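For illustration, a hedged sketch of invoking this factory. The schema, partition key, default partition name, and the use of InternalTypeInfo::of as the rowTypeInfoFactory are assumptions made for the example, not requirements of the API:

```java
import java.util.Collections;

import org.apache.flink.connector.file.src.FileSourceSplit;
import org.apache.flink.connector.file.table.PartitionFieldExtractor;
import org.apache.flink.orc.OrcColumnarRowInputFormat;
import org.apache.flink.orc.shim.OrcShim;
import org.apache.flink.table.runtime.typeutils.InternalTypeInfo;
import org.apache.flink.table.types.logical.IntType;
import org.apache.flink.table.types.logical.LogicalType;
import org.apache.flink.table.types.logical.RowType;
import org.apache.flink.table.types.logical.VarCharType;
import org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch;

// Hypothetical table schema: two data columns plus one partition column "dt".
RowType tableType = RowType.of(
        new LogicalType[] {
                new IntType(),
                new VarCharType(VarCharType.MAX_LENGTH),
                new VarCharType(VarCharType.MAX_LENGTH)},
        new String[] {"id", "name", "dt"});

OrcColumnarRowInputFormat<VectorizedRowBatch, FileSourceSplit> format =
        OrcColumnarRowInputFormat.createPartitionedFormat(
                OrcShim.defaultShim(),                       // shim for the bundled ORC version
                new org.apache.hadoop.conf.Configuration(),  // Hadoop configuration
                tableType,
                Collections.singletonList("dt"),             // partition keys
                PartitionFieldExtractor.forFileSystem("__DEFAULT_PARTITION__"),
                new int[] {0, 1, 2},                         // read all three fields
                Collections.emptyList(),                     // no pushed-down predicates
                2048,                                        // rows per batch
                InternalTypeInfo::of);                       // rowTypeInfoFactory
```

The partition values for "dt" are extracted from the split path by the PartitionFieldExtractor rather than read from the ORC files, which is why getProducedType() covers the full tableType instead of only the physical ORC schema.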