public abstract class AbstractOrcFileInputFormat<T,BatchT,SplitT extends org.apache.flink.connector.file.src.FileSourceSplit> extends Object implements org.apache.flink.connector.file.src.reader.BulkFormat<T,SplitT>

Type Parameters:
T - The type of records produced by this reader format.

Base class for ORC readers for the FileSource.
Implements the reader initialization, vectorized reading, and pooling of column vector objects.
Subclasses implement the conversion to the specific result record(s) that they return by
extending AbstractOrcFileInputFormat.OrcReaderBatch.

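
The pooling mentioned above can be illustrated with a small, self-contained sketch. It does not use Flink classes: `SimplePool`, `Recycler`, and `LongBatch` are hypothetical stand-ins, assuming only that a batch object is taken from a fixed-size pool, filled, handed downstream, and later returned through a recycler callback so its arrays are reused rather than reallocated.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

/** Simplified illustration of pooling reusable batch objects; not Flink's implementation. */
public class BatchPoolSketch {

    /** Hypothetical callback that returns an object to its pool. */
    interface Recycler<B> {
        void recycle(B batch);
    }

    /** Hypothetical fixed-size pool backed by a bounded queue. */
    static final class SimplePool<B> {
        private final BlockingQueue<B> available;

        SimplePool(int capacity) {
            this.available = new ArrayBlockingQueue<>(capacity);
        }

        /** Registers a new object with the pool. */
        void add(B batch) {
            available.add(batch);
        }

        /** Returns a free object, or null if every pooled object is in use. */
        B tryPoll() {
            return available.poll();
        }

        /** The recycler puts objects back into this pool's queue. */
        Recycler<B> recycler() {
            return available::add;
        }
    }

    /** Hypothetical reusable batch: one long column plus the number of valid rows. */
    static final class LongBatch {
        final long[] column;
        int size;
        private final Recycler<LongBatch> recycler;

        LongBatch(int batchSize, Recycler<LongBatch> recycler) {
            this.column = new long[batchSize];
            this.recycler = recycler;
        }

        /** Resets the row count and hands the batch back for reuse. */
        void release() {
            size = 0;
            recycler.recycle(this);
        }
    }

    public static void main(String[] args) {
        SimplePool<LongBatch> pool = new SimplePool<>(1);
        pool.add(new LongBatch(4, pool.recycler()));

        LongBatch first = pool.tryPoll();
        System.out.println(pool.tryPoll() == null); // prints true: the only batch is in use

        first.release();
        LongBatch second = pool.tryPoll();
        System.out.println(first == second); // prints true: the same object is reused
    }
}
```

Bounding the pool also limits how many batches can be in flight at once, which is one reason a reader might prefer pooling over allocating a fresh batch per read.
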
| Modifier and Type | Class and Description |
|---|---|
| protected static class | AbstractOrcFileInputFormat.OrcReaderBatch<T,BatchT> - The OrcReaderBatch class holds the data structures containing the batch data (column vectors, row arrays, ...) and performs the batch conversion from the ORC representation to the result format. |
| protected static class | AbstractOrcFileInputFormat.OrcVectorizedReader<T,BatchT> - A vectorized ORC reader. |

| Modifier and Type | Field and Description |
|---|---|
| protected int | batchSize |
| protected List<OrcFilters.Predicate> | conjunctPredicates |
| protected SerializableHadoopConfigWrapper | hadoopConfigWrapper |
| protected org.apache.orc.TypeDescription | schema |
| protected int[] | selectedFields |
| protected OrcShim<BatchT> | shim |

| Modifier | Constructor and Description |
|---|---|
| protected | AbstractOrcFileInputFormat(OrcShim<BatchT> shim, org.apache.hadoop.conf.Configuration hadoopConfig, org.apache.orc.TypeDescription schema, int[] selectedFields, List<OrcFilters.Predicate> conjunctPredicates, int batchSize) |

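
To make the relationship between the `schema` and `selectedFields` constructor arguments concrete, here is a minimal sketch. It assumes that `selectedFields` holds zero-based indices into the top-level fields of the full schema, and it models the schema as a plain array of field names; the class `SelectedFieldsSketch`, the method `project`, and the example field names are hypothetical and not part of the Flink or ORC APIs.

```java
import java.util.Arrays;

/** Illustrates index-based field selection; a plain-Java stand-in, not the ORC API. */
public class SelectedFieldsSketch {

    /** Picks the entries of fullSchemaFields at the given indices, in the given order. */
    static String[] project(String[] fullSchemaFields, int[] selectedFields) {
        String[] projected = new String[selectedFields.length];
        for (int i = 0; i < selectedFields.length; i++) {
            projected[i] = fullSchemaFields[selectedFields[i]];
        }
        return projected;
    }

    public static void main(String[] args) {
        // Hypothetical full schema: struct<id:bigint,name:string,price:double>
        String[] fullSchemaFields = {"id", "name", "price"};

        // Assumption: indices refer to positions in the full schema.
        int[] selectedFields = {2, 0};

        System.out.println(Arrays.toString(project(fullSchemaFields, selectedFields)));
        // prints [price, id]
    }
}
```
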
| Modifier and Type | Method and Description |
|---|---|
| AbstractOrcFileInputFormat.OrcVectorizedReader<T,BatchT> | createReader(org.apache.flink.configuration.Configuration config, SplitT split) |
| abstract AbstractOrcFileInputFormat.OrcReaderBatch<T,BatchT> | createReaderBatch(SplitT split, OrcVectorizedBatchWrapper<BatchT> orcBatch, org.apache.flink.connector.file.src.util.Pool.Recycler<AbstractOrcFileInputFormat.OrcReaderBatch<T,BatchT>> recycler, int batchSize) - Creates the AbstractOrcFileInputFormat.OrcReaderBatch structure, which holds the batch data (column vectors, row arrays, ...) and performs the batch conversion from the ORC representation to the result format. |
| abstract org.apache.flink.api.common.typeinfo.TypeInformation<T> | getProducedType() - Gets the type produced by this format. |
| boolean | isSplittable() |
| AbstractOrcFileInputFormat.OrcVectorizedReader<T,BatchT> | restoreReader(org.apache.flink.configuration.Configuration config, SplitT split) |

protected final SerializableHadoopConfigWrapper hadoopConfigWrapper
protected final org.apache.orc.TypeDescription schema
protected final int[] selectedFields
protected final List<OrcFilters.Predicate> conjunctPredicates
protected final int batchSize
protected AbstractOrcFileInputFormat(OrcShim<BatchT> shim, org.apache.hadoop.conf.Configuration hadoopConfig, org.apache.orc.TypeDescription schema, int[] selectedFields, List<OrcFilters.Predicate> conjunctPredicates, int batchSize)
Parameters:
shim - the shim for the various ORC dependency versions. If you use the latest version, use OrcShim.defaultShim() directly.
hadoopConfig - the Hadoop configuration for the ORC reader.
schema - the full schema of the ORC format.
selectedFields - the fields of the ORC schema that are selected for reading.
conjunctPredicates - the filter predicates that can be evaluated.
batchSize - the batch size of the ORC reader.

public AbstractOrcFileInputFormat.OrcVectorizedReader<T,BatchT> createReader(org.apache.flink.configuration.Configuration config, SplitT split) throws IOException
Specified by:
createReader in interface org.apache.flink.connector.file.src.reader.BulkFormat<T,SplitT extends org.apache.flink.connector.file.src.FileSourceSplit>
Throws:
IOException

public AbstractOrcFileInputFormat.OrcVectorizedReader<T,BatchT> restoreReader(org.apache.flink.configuration.Configuration config, SplitT split) throws IOException
Specified by:
restoreReader in interface org.apache.flink.connector.file.src.reader.BulkFormat<T,SplitT extends org.apache.flink.connector.file.src.FileSourceSplit>
Throws:
IOException

public boolean isSplittable()

public abstract AbstractOrcFileInputFormat.OrcReaderBatch<T,BatchT> createReaderBatch(SplitT split, OrcVectorizedBatchWrapper<BatchT> orcBatch, org.apache.flink.connector.file.src.util.Pool.Recycler<AbstractOrcFileInputFormat.OrcReaderBatch<T,BatchT>> recycler, int batchSize)
Creates the AbstractOrcFileInputFormat.OrcReaderBatch structure, which holds the batch data (column vectors, row arrays, ...) and performs the batch conversion from the ORC representation to the result format.

public abstract org.apache.flink.api.common.typeinfo.TypeInformation<T> getProducedType()

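
As a rough illustration of the batch conversion that an OrcReaderBatch created by createReaderBatch is responsible for, the sketch below turns a columnar batch into row records. It is plain Java under stated assumptions: `ColumnBatch` and `toRows` are hypothetical names, the two columns are invented for the example, and real ORC column vectors carry more state (for example null flags) than is shown here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Illustrates column-to-row batch conversion; a simplified stand-in, not the ORC API. */
public class BatchConversionSketch {

    /** Hypothetical columnar batch: parallel arrays plus the number of valid rows. */
    static final class ColumnBatch {
        final long[] ids;
        final String[] names;
        int size;

        ColumnBatch(int batchSize) {
            this.ids = new long[batchSize];
            this.names = new String[batchSize];
        }
    }

    /** Converts only the first batch.size positions of each column into row arrays. */
    static List<Object[]> toRows(ColumnBatch batch) {
        List<Object[]> rows = new ArrayList<>(batch.size);
        for (int row = 0; row < batch.size; row++) {
            rows.add(new Object[] {batch.ids[row], batch.names[row]});
        }
        return rows;
    }

    public static void main(String[] args) {
        // Capacity of 4 rows, but only 2 rows are filled in this batch.
        ColumnBatch batch = new ColumnBatch(4);
        batch.ids[0] = 1L;
        batch.names[0] = "a";
        batch.ids[1] = 2L;
        batch.names[1] = "b";
        batch.size = 2;

        for (Object[] row : toRows(batch)) {
            System.out.println(Arrays.toString(row));
        }
        // prints [1, a] then [2, b]
    }
}
```

The column arrays are allocated once at the batch size and only the row count changes between reads, which is what makes such a structure a natural candidate for reuse.
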
Copyright © 2014–2023 The Apache Software Foundation. All rights reserved.