public abstract class HashJoinOperator extends TableStreamOperator<org.apache.flink.table.data.RowData> implements org.apache.flink.streaming.api.operators.TwoInputStreamOperator<org.apache.flink.table.data.RowData,org.apache.flink.table.data.RowData,org.apache.flink.table.data.RowData>, org.apache.flink.streaming.api.operators.BoundedMultiInput, org.apache.flink.streaming.api.operators.InputSelectable
This operator implements the runtime logic of a join. Internally it uses a
hybrid hash join to match records with equal keys. The build side of the hash table is
the first input of the join. It supports all join types in HashJoinType.
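The build/probe split described above can be illustrated with a minimal in-memory sketch. This is not Flink's hybrid hash join: it never spills to disk, and the `Row` class and `innerJoin` method are hypothetical names introduced only for this example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Illustrative only: an in-memory build/probe hash join, not Flink's implementation. */
final class SimpleHashJoinSketch {

    /** Hypothetical stand-in for a row: a join key plus a payload. */
    static final class Row {
        final int key;
        final String value;

        Row(int key, String value) {
            this.key = key;
            this.value = value;
        }
    }

    /** Inner-joins probe rows against build rows on equal keys. */
    static List<String> innerJoin(List<Row> build, List<Row> probe) {
        // Build phase: group the rows of the first (build) input by join key.
        Map<Integer, List<Row>> table = new HashMap<>();
        for (Row b : build) {
            table.computeIfAbsent(b.key, k -> new ArrayList<>()).add(b);
        }
        // Probe phase: look up each probe row and emit one result per match.
        List<String> out = new ArrayList<>();
        for (Row p : probe) {
            for (Row b : table.getOrDefault(p.key, List.of())) {
                out.add(b.value + "|" + p.value);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Row> build = List.of(new Row(1, "a"), new Row(2, "b"), new Row(2, "c"));
        List<Row> probe = List.of(new Row(2, "x"), new Row(3, "y"));
        System.out.println(innerJoin(build, probe));
    }
}
```

The smaller input is normally chosen as the build side because the whole hash table has to be held (or spilled) before probing starts.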
Note: To handle data skew, or too much data in the hash table, this operator can fall back to a sort-merge join. By default, if some partitions are spilled to disk more than three times during the hash join, the operator falls back to sort-merge join to improve stability. A more flexible adaptive hash join strategy is planned for the future; for example, falling back to sort-merge join early if the size of data written to disk reaches a certain threshold while the hash table is being built.
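The fallback rule in the note can be sketched as a per-partition spill counter. This is only an illustration of the "more than three times" condition; the class, the `MAX_SPILLS` constant, and the `recordSpill` method are assumptions made for this example and are not part of the Flink API.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative only: tracks how often each partition has spilled. */
final class SpillFallbackSketch {

    /** Hypothetical limit mirroring the "more than three times" rule in the note. */
    static final int MAX_SPILLS = 3;

    private final Map<Integer, Integer> spillCounts = new HashMap<>();

    /**
     * Records one spill of the given partition.
     *
     * @return true once the partition has spilled more than MAX_SPILLS times,
     *         meaning the caller should switch to sort-merge join
     */
    boolean recordSpill(int partitionId) {
        int count = spillCounts.merge(partitionId, 1, Integer::sum);
        return count > MAX_SPILLS;
    }

    public static void main(String[] args) {
        SpillFallbackSketch tracker = new SpillFallbackSketch();
        for (int i = 1; i <= 4; i++) {
            System.out.println("spill " + i + " -> fall back: " + tracker.recordSpill(7));
        }
    }
}
```

Counting per partition rather than globally keeps one skewed key range from being masked by many well-behaved partitions.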
Nested classes/interfaces inherited from class TableStreamOperator: TableStreamOperator.ContextImpl

Fields inherited from class TableStreamOperator: ctx, currentWatermark

| Modifier and Type | Method and Description |
|---|---|
| void | close() |
| void | endInput(int inputId) |
| abstract void | join(RowIterator<org.apache.flink.table.data.binary.BinaryRowData> buildIter, org.apache.flink.table.data.RowData probeRow) |
| static HashJoinOperator | newHashJoinOperator(HashJoinType type, boolean leftIsBuild, boolean compressionEnable, int compressionBlockSize, GeneratedJoinCondition condFuncCode, boolean reverseJoinFunction, boolean[] filterNullKeys, GeneratedProjection buildProjectionCode, GeneratedProjection probeProjectionCode, boolean tryDistinctBuildRow, int buildRowSize, long buildRowCount, long probeRowCount, org.apache.flink.table.types.logical.RowType keyType, SortMergeJoinFunction sortMergeJoinFunction) |
| org.apache.flink.streaming.api.operators.InputSelection | nextSelection() |
| void | open() |
| void | processElement1(org.apache.flink.streaming.runtime.streamrecord.StreamRecord<org.apache.flink.table.data.RowData> element) |
| void | processElement2(org.apache.flink.streaming.runtime.streamrecord.StreamRecord<org.apache.flink.table.data.RowData> element) |
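How the input-related methods in the table might cooperate can be sketched without any Flink dependency. The sketch assumes that the build input (input 1) is consumed completely before the probe input (input 2); the page only states that the build side is the first input, so this ordering is an assumption. The `Selection` enum is a hypothetical stand-in for `InputSelection`, and rows are reduced to plain strings.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: a two-phase input state machine, not the Flink operator. */
final class TwoPhaseInputSketch {

    /** Hypothetical stand-in for InputSelection. */
    enum Selection { FIRST, SECOND, NONE }

    private boolean buildFinished;
    private boolean probeFinished;
    final List<String> buildRows = new ArrayList<>();
    final List<String> results = new ArrayList<>();

    /** Asks for the build input until it ends, then for the probe input. */
    Selection nextSelection() {
        if (!buildFinished) {
            return Selection.FIRST;
        }
        return probeFinished ? Selection.NONE : Selection.SECOND;
    }

    /** Build side: collect rows. */
    void processElement1(String row) {
        buildRows.add(row);
    }

    /** Probe side: only valid once the build input has ended. */
    void processElement2(String row) {
        if (!buildFinished) {
            throw new IllegalStateException("probe row arrived before build input ended");
        }
        if (buildRows.contains(row)) {
            results.add(row);
        }
    }

    /** Marks input 1 (build) or input 2 (probe) as finished. */
    void endInput(int inputId) {
        if (inputId == 1) {
            buildFinished = true;
        } else if (inputId == 2) {
            probeFinished = true;
        } else {
            throw new IllegalArgumentException("unknown input: " + inputId);
        }
    }

    public static void main(String[] args) {
        TwoPhaseInputSketch op = new TwoPhaseInputSketch();
        op.processElement1("k1");
        op.endInput(1);
        op.processElement2("k1");
        op.processElement2("k2");
        op.endInput(2);
        System.out.println(op.results + " next=" + op.nextSelection());
    }
}
```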
Methods inherited from class TableStreamOperator: computeMemorySize, processWatermark

Methods inherited from class org.apache.flink.streaming.api.operators.AbstractStreamOperator: finish, getChainingStrategy, getContainingTask, getCurrentKey, getExecutionConfig, getInternalTimerService, getKeyedStateBackend, getKeyedStateStore, getMetricGroup, getOperatorConfig, getOperatorID, getOperatorName, getOperatorStateBackend, getOrCreateKeyedState, getPartitionedState, getPartitionedState, getProcessingTimeService, getRuntimeContext, getTimeServiceManager, getUserCodeClassloader, hasKeyContext1, hasKeyContext2, initializeState, initializeState, isUsingCustomRawKeyedState, notifyCheckpointAborted, notifyCheckpointComplete, prepareSnapshotPreBarrier, processLatencyMarker, processLatencyMarker1, processLatencyMarker2, processRecordAttributes, processRecordAttributes1, processRecordAttributes2, processWatermark1, processWatermark2, processWatermarkStatus, processWatermarkStatus1, processWatermarkStatus2, reportOrForwardLatencyMarker, setChainingStrategy, setCurrentKey, setKeyContextElement1, setKeyContextElement2, setProcessingTimeService, setup, snapshotState, snapshotState

Methods inherited from class java.lang.Object: clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Methods inherited from interface org.apache.flink.streaming.api.operators.TwoInputStreamOperator: processLatencyMarker1, processLatencyMarker2, processRecordAttributes1, processRecordAttributes2, processWatermark1, processWatermark2, processWatermarkStatus1, processWatermarkStatus2

Methods inherited from interface org.apache.flink.streaming.api.operators.StreamOperator: finish, getMetricGroup, getOperatorID, initializeState, prepareSnapshotPreBarrier, setKeyContextElement1, setKeyContextElement2, snapshotState

Methods inherited from interface CheckpointListener: notifyCheckpointAborted, notifyCheckpointComplete

public void open() throws Exception
Specified by: open in interface org.apache.flink.streaming.api.operators.StreamOperator<org.apache.flink.table.data.RowData>
Overrides: open in class TableStreamOperator<org.apache.flink.table.data.RowData>
Throws: Exception

public void processElement1(org.apache.flink.streaming.runtime.streamrecord.StreamRecord<org.apache.flink.table.data.RowData> element) throws Exception
Specified by: processElement1 in interface org.apache.flink.streaming.api.operators.TwoInputStreamOperator<org.apache.flink.table.data.RowData,org.apache.flink.table.data.RowData,org.apache.flink.table.data.RowData>
Throws: Exception

public void processElement2(org.apache.flink.streaming.runtime.streamrecord.StreamRecord<org.apache.flink.table.data.RowData> element) throws Exception
Specified by: processElement2 in interface org.apache.flink.streaming.api.operators.TwoInputStreamOperator<org.apache.flink.table.data.RowData,org.apache.flink.table.data.RowData,org.apache.flink.table.data.RowData>
Throws: Exception

public org.apache.flink.streaming.api.operators.InputSelection nextSelection()
Specified by: nextSelection in interface org.apache.flink.streaming.api.operators.InputSelectable

public void endInput(int inputId) throws Exception
Specified by: endInput in interface org.apache.flink.streaming.api.operators.BoundedMultiInput
Throws: Exception

public abstract void join(RowIterator<org.apache.flink.table.data.binary.BinaryRowData> buildIter, org.apache.flink.table.data.RowData probeRow) throws Exception
Throws: Exception

public void close() throws Exception
Specified by: close in interface org.apache.flink.streaming.api.operators.StreamOperator<org.apache.flink.table.data.RowData>
Overrides: close in class org.apache.flink.streaming.api.operators.AbstractStreamOperator<org.apache.flink.table.data.RowData>
Throws: Exception

public static HashJoinOperator newHashJoinOperator(HashJoinType type, boolean leftIsBuild, boolean compressionEnable, int compressionBlockSize, GeneratedJoinCondition condFuncCode, boolean reverseJoinFunction, boolean[] filterNullKeys, GeneratedProjection buildProjectionCode, GeneratedProjection probeProjectionCode, boolean tryDistinctBuildRow, int buildRowSize, long buildRowCount, long probeRowCount, org.apache.flink.table.types.logical.RowType keyType, SortMergeJoinFunction sortMergeJoinFunction)
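Because join(buildIter, probeRow) is abstract, it is plausible that each join type supplies its own implementation; the page does not say so explicitly, so treat this as an assumption. The sketch below shows what two such implementations could look like for an inner join and a probe-side outer join. The `JoinFunction` interface, the string rows, and the `out` list are hypothetical simplifications and do not match the real signature.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Illustrative only: per-join-type callbacks modeled loosely on join(buildIter, probeRow). */
final class JoinCallbackSketch {

    /** Hypothetical callback: receives the matching build rows for one probe row. */
    interface JoinFunction {
        void join(Iterator<String> buildIter, String probeRow, List<String> out);
    }

    /** Inner join: one result per matching build row, nothing when there is no match. */
    static final JoinFunction INNER = (buildIter, probeRow, out) -> {
        while (buildIter.hasNext()) {
            out.add(buildIter.next() + "|" + probeRow);
        }
    };

    /** Probe-side outer join: like inner, but pads with "null" when no build row matches. */
    static final JoinFunction PROBE_OUTER = (buildIter, probeRow, out) -> {
        if (!buildIter.hasNext()) {
            out.add("null|" + probeRow);
            return;
        }
        INNER.join(buildIter, probeRow, out);
    };

    public static void main(String[] args) {
        List<String> out = new ArrayList<>();
        INNER.join(List.of("a", "b").iterator(), "p1", out);
        PROBE_OUTER.join(List.<String>of().iterator(), "p2", out);
        System.out.println(out);
    }
}
```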
Copyright © 2014–2025 The Apache Software Foundation. All rights reserved.