T - The type of records produced by this reader format.
public abstract class AbstractOrcFileInputFormat<T,BatchT,SplitT extends org.apache.flink.connector.file.src.FileSourceSplit> extends Object implements org.apache.flink.connector.file.src.reader.BulkFormat<T,SplitT>
The base for ORC readers for the FileSource.
Implements the reader initialization, vectorized reading, and pooling of column vector objects.
Subclasses implement the conversion to the specific result record(s) that they return by
extending AbstractOrcFileInputFormat.OrcReaderBatch.
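The pooling of batch objects mentioned above can be pictured with a minimal sketch. It uses only the JDK: `ReusableBatch` and `BatchPool` are hypothetical stand-ins invented for illustration, not Flink or ORC classes, and the real reader may manage its pool differently (for example by blocking until a batch is free instead of returning null).

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.function.Consumer;

class PoolingSketch {
    public static void main(String[] args) {
        // A pool with a single batch forces reuse of the same object.
        BatchPool pool = new BatchPool(1, 4);

        ReusableBatch first = pool.tryTake();
        first.values[0] = 42L;
        first.size = 1;
        first.release(); // hand the batch back once it has been consumed

        ReusableBatch second = pool.tryTake();
        System.out.println("same instance reused: " + (first == second));
    }
}

/** Hypothetical stand-in for a reusable batch; not a Flink or ORC class. */
final class ReusableBatch {
    final long[] values; // plays the role of a column vector
    int size;            // number of valid entries in 'values'
    private final Consumer<ReusableBatch> recycler;

    ReusableBatch(int capacity, Consumer<ReusableBatch> recycler) {
        this.values = new long[capacity];
        this.recycler = recycler;
    }

    /** Marks the batch as empty and returns it to the pool it came from. */
    void release() {
        size = 0;
        recycler.accept(this);
    }
}

/** Hypothetical fixed-size pool of pre-allocated batches. */
final class BatchPool {
    private final ArrayBlockingQueue<ReusableBatch> free;

    BatchPool(int poolSize, int batchCapacity) {
        free = new ArrayBlockingQueue<>(poolSize);
        for (int i = 0; i < poolSize; i++) {
            // Each batch carries a recycler that puts it back into this pool.
            free.add(new ReusableBatch(batchCapacity, free::add));
        }
    }

    /** Returns a free batch, or null if every batch is currently in use. */
    ReusableBatch tryTake() {
        return free.poll();
    }

    /** Number of batches currently available. */
    int available() {
        return free.size();
    }
}
```

The point of the pattern is that the arrays are allocated once and then cycled between the reader and the consumer, rather than being reallocated for every batch.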
| Modifier and Type | Class and Description |
|---|---|
| protected static class | AbstractOrcFileInputFormat.OrcReaderBatch<T,BatchT> The OrcReaderBatch class holds the data structures containing the batch data (column vectors, row arrays, ...) and performs the batch conversion from the ORC representation to the result format. |
| protected static class | AbstractOrcFileInputFormat.OrcVectorizedReader<T,BatchT> A vectorized ORC reader. |
| Modifier and Type | Field and Description |
|---|---|
| protected int | batchSize |
| protected List<OrcFilters.Predicate> | conjunctPredicates |
| protected SerializableHadoopConfigWrapper | hadoopConfigWrapper |
| protected org.apache.orc.TypeDescription | schema |
| protected int[] | selectedFields |
| protected OrcShim<BatchT> | shim |
| Modifier | Constructor and Description |
|---|---|
| protected | AbstractOrcFileInputFormat(OrcShim<BatchT> shim, org.apache.hadoop.conf.Configuration hadoopConfig, org.apache.orc.TypeDescription schema, int[] selectedFields, List<OrcFilters.Predicate> conjunctPredicates, int batchSize) |
| Modifier and Type | Method and Description |
|---|---|
| AbstractOrcFileInputFormat.OrcVectorizedReader<T,BatchT> | createReader(org.apache.flink.configuration.Configuration config, SplitT split) |
| abstract AbstractOrcFileInputFormat.OrcReaderBatch<T,BatchT> | createReaderBatch(SplitT split, OrcVectorizedBatchWrapper<BatchT> orcBatch, org.apache.flink.connector.file.src.util.Pool.Recycler<AbstractOrcFileInputFormat.OrcReaderBatch<T,BatchT>> recycler, int batchSize) Creates the AbstractOrcFileInputFormat.OrcReaderBatch structure, which is responsible for holding the data structures that hold the batch data (column vectors, row arrays, ...) and the batch conversion from the ORC representation to the result format. |
| abstract org.apache.flink.api.common.typeinfo.TypeInformation<T> | getProducedType() Gets the type produced by this format. |
| boolean | isSplittable() |
| AbstractOrcFileInputFormat.OrcVectorizedReader<T,BatchT> | restoreReader(org.apache.flink.configuration.Configuration config, SplitT split) |
protected final SerializableHadoopConfigWrapper hadoopConfigWrapper
protected final org.apache.orc.TypeDescription schema
protected final int[] selectedFields
protected final List<OrcFilters.Predicate> conjunctPredicates
protected final int batchSize
protected AbstractOrcFileInputFormat(OrcShim<BatchT> shim, org.apache.hadoop.conf.Configuration hadoopConfig, org.apache.orc.TypeDescription schema, int[] selectedFields, List<OrcFilters.Predicate> conjunctPredicates, int batchSize)
Parameters:
shim - the shim for the various ORC dependency versions. If you use the latest version, please use OrcShim.defaultShim() directly.
hadoopConfig - the Hadoop config for the ORC reader.
schema - the full schema of the ORC format.
selectedFields - the fields of the ORC format that are selected for reading.
conjunctPredicates - the filter predicates that can be evaluated.
batchSize - the batch size of the ORC reader.
public AbstractOrcFileInputFormat.OrcVectorizedReader<T,BatchT> createReader(org.apache.flink.configuration.Configuration config, SplitT split) throws IOException
Specified by: createReader in interface org.apache.flink.connector.file.src.reader.BulkFormat<T,SplitT extends org.apache.flink.connector.file.src.FileSourceSplit>
Throws: IOException
public AbstractOrcFileInputFormat.OrcVectorizedReader<T,BatchT> restoreReader(org.apache.flink.configuration.Configuration config, SplitT split) throws IOException
Specified by: restoreReader in interface org.apache.flink.connector.file.src.reader.BulkFormat<T,SplitT extends org.apache.flink.connector.file.src.FileSourceSplit>
Throws: IOException
public boolean isSplittable()
public abstract AbstractOrcFileInputFormat.OrcReaderBatch<T,BatchT> createReaderBatch(SplitT split, OrcVectorizedBatchWrapper<BatchT> orcBatch, org.apache.flink.connector.file.src.util.Pool.Recycler<AbstractOrcFileInputFormat.OrcReaderBatch<T,BatchT>> recycler, int batchSize)
Creates the AbstractOrcFileInputFormat.OrcReaderBatch structure, which is responsible for holding the data
structures that hold the batch data (column vectors, row arrays, ...) and the batch
conversion from the ORC representation to the result format.
public abstract org.apache.flink.api.common.typeinfo.TypeInformation<T> getProducedType()
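As a rough illustration of the batch conversion that an OrcReaderBatch is described as performing, the following sketch turns a columnar batch into row records while keeping only a selection of columns. It is a simplified model in plain Java: `ConversionSketch` and `toRows` are hypothetical names, plain object arrays stand in for ORC column vectors, and the batch is assumed to hold every column of the schema, which may not match how the real reader lays out its vectors.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ConversionSketch {

    /**
     * Converts a columnar batch to rows, keeping only the selected columns.
     * columns[c][r] is the value of column c in row r (assumed layout).
     * The order of selectedFields determines the order of fields in each row.
     */
    static List<Object[]> toRows(Object[][] columns, int[] selectedFields, int rowCount) {
        List<Object[]> rows = new ArrayList<>(rowCount);
        for (int r = 0; r < rowCount; r++) {
            Object[] row = new Object[selectedFields.length];
            for (int i = 0; i < selectedFields.length; i++) {
                row[i] = columns[selectedFields[i]][r];
            }
            rows.add(row);
        }
        return rows;
    }

    public static void main(String[] args) {
        // Three columns with three rows each, standing in for column vectors.
        Object[][] columns = {
            {1L, 2L, 3L},
            {"a", "b", "c"},
            {1.5, 2.5, 3.5}
        };
        // Project the third and first column, in that order.
        int[] selectedFields = {2, 0};

        for (Object[] row : toRows(columns, selectedFields, 3)) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```

A concrete subclass would perform the equivalent step on the actual ORC batch type and emit whatever record type its getProducedType() reports.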
Copyright © 2014–2023 The Apache Software Foundation. All rights reserved.