public abstract class HashJoinOperator extends TableStreamOperator<org.apache.flink.table.data.RowData> implements org.apache.flink.streaming.api.operators.TwoInputStreamOperator<org.apache.flink.table.data.RowData,org.apache.flink.table.data.RowData,org.apache.flink.table.data.RowData>, org.apache.flink.streaming.api.operators.BoundedMultiInput, org.apache.flink.streaming.api.operators.InputSelectable
This operator implements the runtime logic of a hash join. Internally it uses a
hybrid hash join to match records with equal keys. The build side of the hash table is
the first input of the match. It supports all join types in HashJoinType.
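The build/probe pattern described above can be illustrated with a minimal in-memory sketch. This is not the operator's implementation: it omits the partitioning and disk spilling that make a hash join "hybrid", and the `HashJoinSketch` class, its `Row` type, and the `innerJoin` method are hypothetical names introduced only for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Illustrative in-memory hash join; not the HashJoinOperator implementation. */
final class HashJoinSketch {

    /** Hypothetical row type for this sketch: a join key plus a payload. */
    static final class Row {
        final int key;
        final String value;

        Row(int key, String value) {
            this.key = key;
            this.value = value;
        }
    }

    /** Inner-joins the build side (first input) with the probe side (second input) on equal keys. */
    static List<String> innerJoin(List<Row> buildSide, List<Row> probeSide) {
        // Build phase: index every build row by its join key.
        Map<Integer, List<Row>> hashTable = new HashMap<>();
        for (Row build : buildSide) {
            hashTable.computeIfAbsent(build.key, k -> new ArrayList<>()).add(build);
        }
        // Probe phase: look up each probe row and emit one result per matching build row.
        List<String> output = new ArrayList<>();
        for (Row probe : probeSide) {
            for (Row build : hashTable.getOrDefault(probe.key, Collections.emptyList())) {
                output.add(build.value + "|" + probe.value);
            }
        }
        return output;
    }

    public static void main(String[] args) {
        List<Row> build = Arrays.asList(new Row(1, "a"), new Row(2, "b"), new Row(2, "c"));
        List<Row> probe = Arrays.asList(new Row(2, "x"), new Row(3, "y"));
        System.out.println(innerJoin(build, probe)); // prints [b|x, c|x]
    }
}
```

The whole build side must be indexed before the first probe row can be matched, which is why the build input is consumed first.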
Note: To cope with data skew, or with too much data in the hash table, the operator can fall back to a sort-merge join. By default, if some partitions are spilled to disk more than three times during the hash join, the operator falls back to sort-merge join to improve stability. A more flexible adaptive hash join strategy is planned for the future: for example, falling back to sort-merge join early if the amount of data written to disk while building the hash table reaches a certain threshold.
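The threshold rule in the note can be sketched as a per-partition spill counter. This is a minimal illustration under stated assumptions, not Flink's internal bookkeeping: `SpillFallbackSketch`, `Partition`, `recordSpill`, and `shouldFallBackToSortMerge` are hypothetical names, and the limit of three is taken from the note above.

```java
/** Illustrative fallback decision; names and structure are hypothetical, not Flink internals. */
final class SpillFallbackSketch {

    /** From the note above: fall back once a partition was spilled more than three times. */
    static final int MAX_SPILLS_BEFORE_FALLBACK = 3;

    /** Tracks how often one hash partition has been spilled to disk. */
    static final class Partition {
        private int spillCount;

        /** Call each time this partition is written to disk. */
        void recordSpill() {
            spillCount++;
        }

        /** True once the partition was spilled more than the allowed number of times. */
        boolean shouldFallBackToSortMerge() {
            return spillCount > MAX_SPILLS_BEFORE_FALLBACK;
        }
    }

    public static void main(String[] args) {
        Partition skewed = new Partition();
        for (int spill = 1; spill <= 4; spill++) {
            skewed.recordSpill();
            System.out.println("after spill " + spill + ": fallback=" + skewed.shouldFallBackToSortMerge());
        }
        // Only the last line reports fallback=true, because 4 > 3.
    }
}
```

A heavily skewed key is the typical trigger: rehashing cannot split rows that share one key, so the same partition keeps overflowing no matter how often it is repartitioned.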
Nested classes inherited from class TableStreamOperator: TableStreamOperator.ContextImpl

Fields inherited from class TableStreamOperator: ctx, currentWatermark

| Modifier and Type | Method and Description |
|---|---|
| void | close() |
| void | endInput(int inputId) |
| abstract void | join(RowIterator<org.apache.flink.table.data.binary.BinaryRowData> buildIter, org.apache.flink.table.data.RowData probeRow) |
| static HashJoinOperator | newHashJoinOperator(HashJoinType type, boolean leftIsBuild, boolean compressionEnable, int compressionBlockSize, GeneratedJoinCondition condFuncCode, boolean reverseJoinFunction, boolean[] filterNullKeys, GeneratedProjection buildProjectionCode, GeneratedProjection probeProjectionCode, boolean tryDistinctBuildRow, int buildRowSize, long buildRowCount, long probeRowCount, org.apache.flink.table.types.logical.RowType keyType, SortMergeJoinFunction sortMergeJoinFunction) |
| org.apache.flink.streaming.api.operators.InputSelection | nextSelection() |
| void | open() |
| void | processElement1(org.apache.flink.streaming.runtime.streamrecord.StreamRecord<org.apache.flink.table.data.RowData> element) |
| void | processElement2(org.apache.flink.streaming.runtime.streamrecord.StreamRecord<org.apache.flink.table.data.RowData> element) |
Methods inherited from class TableStreamOperator: computeMemorySize, processWatermark, useSplittableTimers

Methods inherited from class org.apache.flink.streaming.api.operators.AbstractStreamOperator: finish, getChainingStrategy, getContainingTask, getCurrentKey, getExecutionConfig, getInternalTimerService, getKeyedStateBackend, getKeyedStateStore, getMetricGroup, getOperatorConfig, getOperatorID, getOperatorName, getOperatorStateBackend, getOrCreateKeyedState, getPartitionedState, getPartitionedState, getProcessingTimeService, getRuntimeContext, getStateKeySelector1, getStateKeySelector2, getTimeServiceManager, getUserCodeClassloader, hasKeyContext1, hasKeyContext2, initializeState, initializeState, isUsingCustomRawKeyedState, notifyCheckpointAborted, notifyCheckpointComplete, prepareSnapshotPreBarrier, processLatencyMarker, processLatencyMarker1, processLatencyMarker2, processRecordAttributes, processRecordAttributes1, processRecordAttributes2, processWatermark1, processWatermark2, processWatermarkStatus, processWatermarkStatus1, processWatermarkStatus2, reportOrForwardLatencyMarker, setChainingStrategy, setCurrentKey, setKeyContextElement1, setKeyContextElement2, setMailboxExecutor, setProcessingTimeService, setup, snapshotState, snapshotState

Methods inherited from class java.lang.Object: clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Methods inherited from interface org.apache.flink.streaming.api.operators.TwoInputStreamOperator: processLatencyMarker1, processLatencyMarker2, processRecordAttributes1, processRecordAttributes2, processWatermark1, processWatermark2, processWatermarkStatus1, processWatermarkStatus2

Methods inherited from interface org.apache.flink.streaming.api.operators.StreamOperator: finish, getMetricGroup, getOperatorAttributes, getOperatorID, initializeState, prepareSnapshotPreBarrier, setKeyContextElement1, setKeyContextElement2, snapshotState

public void open()
throws Exception
Specified by: open in interface org.apache.flink.streaming.api.operators.StreamOperator<org.apache.flink.table.data.RowData>
Overrides: open in class TableStreamOperator<org.apache.flink.table.data.RowData>
Throws: Exception

public void processElement1(org.apache.flink.streaming.runtime.streamrecord.StreamRecord<org.apache.flink.table.data.RowData> element)
throws Exception
Specified by: processElement1 in interface org.apache.flink.streaming.api.operators.TwoInputStreamOperator<org.apache.flink.table.data.RowData,org.apache.flink.table.data.RowData,org.apache.flink.table.data.RowData>
Throws: Exception

public void processElement2(org.apache.flink.streaming.runtime.streamrecord.StreamRecord<org.apache.flink.table.data.RowData> element)
throws Exception
Specified by: processElement2 in interface org.apache.flink.streaming.api.operators.TwoInputStreamOperator<org.apache.flink.table.data.RowData,org.apache.flink.table.data.RowData,org.apache.flink.table.data.RowData>
Throws: Exception

public org.apache.flink.streaming.api.operators.InputSelection nextSelection()
Specified by: nextSelection in interface org.apache.flink.streaming.api.operators.InputSelectable

public void endInput(int inputId)
throws Exception
Specified by: endInput in interface org.apache.flink.streaming.api.operators.BoundedMultiInput
Throws: Exception

public abstract void join(RowIterator<org.apache.flink.table.data.binary.BinaryRowData> buildIter, org.apache.flink.table.data.RowData probeRow) throws Exception
Throws: Exception

public void close()
throws Exception
Specified by: close in interface org.apache.flink.streaming.api.operators.StreamOperator<org.apache.flink.table.data.RowData>
Overrides: close in class org.apache.flink.streaming.api.operators.AbstractStreamOperator<org.apache.flink.table.data.RowData>
Throws: Exception

public static HashJoinOperator newHashJoinOperator(HashJoinType type, boolean leftIsBuild, boolean compressionEnable, int compressionBlockSize, GeneratedJoinCondition condFuncCode, boolean reverseJoinFunction, boolean[] filterNullKeys, GeneratedProjection buildProjectionCode, GeneratedProjection probeProjectionCode, boolean tryDistinctBuildRow, int buildRowSize, long buildRowCount, long probeRowCount, org.apache.flink.table.types.logical.RowType keyType, SortMergeJoinFunction sortMergeJoinFunction)
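
Because the build side is the first input, nextSelection() and endInput(int) plausibly cooperate so that the first input is read to completion before the second is probed. The sketch below illustrates that pattern in plain Java. It is an assumption about the intended ordering, not a copy of the operator's code: `BuildThenProbeSketch` and its `Selection` enum are hypothetical stand-ins for the operator and for InputSelection, and input IDs are assumed to start at 1 as in BoundedMultiInput.

```java
/** Illustrative build-then-probe input ordering; hypothetical stand-in types, not Flink classes. */
final class BuildThenProbeSketch {

    /** Hypothetical stand-in for InputSelection: which input to read next. */
    enum Selection { FIRST, SECOND }

    /** Set once the build input (input 1) has delivered all of its records. */
    private boolean buildFinished;

    /** Reads the build input until it ends, then switches to the probe input. */
    Selection nextSelection() {
        return buildFinished ? Selection.SECOND : Selection.FIRST;
    }

    /** Signals that the given input (1 = build, 2 = probe) has no more records. */
    void endInput(int inputId) {
        if (inputId == 1) {
            buildFinished = true;
        }
    }

    public static void main(String[] args) {
        BuildThenProbeSketch operator = new BuildThenProbeSketch();
        System.out.println(operator.nextSelection()); // prints FIRST
        operator.endInput(1);
        System.out.println(operator.nextSelection()); // prints SECOND
    }
}
```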
Copyright © 2014–2024 The Apache Software Foundation. All rights reserved.