Class IntervalJoinOperator<K,​T1,​T2,​OUT>

  • Type Parameters:
    K - The type of the key based on which we join elements.
    T1 - The type of the elements in the left stream.
    T2 - The type of the elements in the right stream.
    OUT - The output type created by the user-defined function.
    All Implemented Interfaces:
    Serializable, org.apache.flink.api.common.state.CheckpointListener, KeyContext, KeyContextHandler, org.apache.flink.streaming.api.operators.OutputTypeConfigurable<OUT>, StreamOperator<OUT>, StreamOperatorStateHandler.CheckpointedStreamOperator, Triggerable<K,​String>, TwoInputStreamOperator<T1,​T2,​OUT>, UserFunctionProvider<ProcessJoinFunction<T1,​T2,​OUT>>, YieldingOperator<OUT>

    @Internal
    public class IntervalJoinOperator<K,​T1,​T2,​OUT>
    extends AbstractUdfStreamOperator<OUT,​ProcessJoinFunction<T1,​T2,​OUT>>
    implements TwoInputStreamOperator<T1,​T2,​OUT>, Triggerable<K,​String>
    An operator to execute time-bounded stream inner joins.

    By using a configurable lower and upper bound, this operator emits exactly those pairs (T1, T2) where T2.ts ∈ [T1.ts + lowerBound, T1.ts + upperBound]. Both the lower and the upper bound can be configured to be either inclusive or exclusive.
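
    The interval condition can be sketched as a standalone check. This is a minimal illustration, not the operator's own code: the class IntervalBounds and the method inInterval are hypothetical names, and timestamps are assumed to be plain long values.

```java
/** Hypothetical helper illustrating the interval condition; not part of Flink. */
final class IntervalBounds {

    private IntervalBounds() {}

    /**
     * Returns true if rightTs lies between leftTs + lowerBound and
     * leftTs + upperBound, where each end of the interval is inclusive
     * or exclusive depending on the corresponding flag.
     */
    static boolean inInterval(
            long leftTs,
            long rightTs,
            long lowerBound,
            long upperBound,
            boolean lowerBoundInclusive,
            boolean upperBoundInclusive) {
        long lower = leftTs + lowerBound;
        long upper = leftTs + upperBound;
        boolean aboveLower = lowerBoundInclusive ? rightTs >= lower : rightTs > lower;
        boolean belowUpper = upperBoundInclusive ? rightTs <= upper : rightTs < upper;
        return aboveLower && belowUpper;
    }
}
```

    For example, with lowerBound = -2 and upperBound = 1 (both inclusive), a left element with timestamp 10 pairs with right elements whose timestamps fall in [8, 11].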

    As soon as elements are joined they are passed to a user-defined ProcessJoinFunction.

    The basic idea of this implementation is as follows: Whenever we receive an element at processElement1(StreamRecord) (a.k.a. the left side), we add it to the left buffer. We then check the right buffer to see whether there are any elements that can be joined. If there are, they are joined and passed to the aforementioned function. The same happens the other way around when receiving an element on the right side.

    Each emitted pair is assigned the larger of the two elements' timestamps.
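
    The buffering scheme and the timestamp assignment can be modeled in plain Java. This is a rough sketch for a single key with both bounds inclusive, not the operator's implementation: the class IntervalJoinSketch, its methods, and the use of in-memory TreeMaps in place of keyed state are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical single-key, in-memory model of the two-buffer join; not Flink code. */
final class IntervalJoinSketch {

    private final long lowerBound; // treated as inclusive in this sketch
    private final long upperBound; // treated as inclusive in this sketch

    // Buffers map a timestamp to all elements seen with that timestamp.
    private final TreeMap<Long, List<String>> leftBuffer = new TreeMap<>();
    private final TreeMap<Long, List<String>> rightBuffer = new TreeMap<>();

    // Joined pairs, rendered as "left+right@timestamp".
    private final List<String> output = new ArrayList<>();

    IntervalJoinSketch(long lowerBound, long upperBound) {
        if (lowerBound > upperBound) {
            throw new IllegalArgumentException("lowerBound must not exceed upperBound");
        }
        this.lowerBound = lowerBound;
        this.upperBound = upperBound;
    }

    /** Buffers a left element, then joins it with matching buffered right elements. */
    void processLeft(String value, long ts) {
        leftBuffer.computeIfAbsent(ts, k -> new ArrayList<>()).add(value);
        // Matching right timestamps lie in [ts + lowerBound, ts + upperBound].
        Map<Long, List<String>> candidates =
                rightBuffer.subMap(ts + lowerBound, true, ts + upperBound, true);
        for (Map.Entry<Long, List<String>> entry : candidates.entrySet()) {
            for (String right : entry.getValue()) {
                emit(value, right, Math.max(ts, entry.getKey()));
            }
        }
    }

    /** Buffers a right element, then joins it with matching buffered left elements. */
    void processRight(String value, long ts) {
        rightBuffer.computeIfAbsent(ts, k -> new ArrayList<>()).add(value);
        // Matching left timestamps lie in [ts - upperBound, ts - lowerBound].
        Map<Long, List<String>> candidates =
                leftBuffer.subMap(ts - upperBound, true, ts - lowerBound, true);
        for (Map.Entry<Long, List<String>> entry : candidates.entrySet()) {
            for (String left : entry.getValue()) {
                emit(left, value, Math.max(entry.getKey(), ts));
            }
        }
    }

    /** Records a joined pair with the larger of the two timestamps. */
    private void emit(String left, String right, long timestamp) {
        output.add(left + "+" + right + "@" + timestamp);
    }

    List<String> output() {
        return output;
    }
}
```

    With bounds -2 and 1, a left element at timestamp 10 followed by a right element at timestamp 11 produces one pair carrying timestamp 11. The sketch omits late-data handling and buffer cleanup.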

    To keep the element buffers from growing indefinitely, a cleanup timer is registered per element. This timer marks the point at which an element is no longer considered for joining and can be removed from state.
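
    One way to reason about when a buffered element becomes removable is to derive the cleanup time from the interval condition. The sketch below assumes event-time processing in which a watermark w means that no further non-late elements with timestamps at or below w will arrive. The class CleanupTimeSketch and its methods are hypothetical, and the operator's actual cleanup times may be computed differently.

```java
/** Hypothetical derivation of cleanup times from the join interval; not Flink code. */
final class CleanupTimeSketch {

    private CleanupTimeSketch() {}

    /**
     * A left element at leftTs can match right elements up to leftTs + upperBound.
     * If upperBound is not positive, every possible partner is no newer than the
     * element itself, so the element's own timestamp is used.
     */
    static long leftCleanupTime(long leftTs, long upperBound) {
        return upperBound > 0 ? leftTs + upperBound : leftTs;
    }

    /**
     * A right element at rightTs can match left elements up to rightTs - lowerBound.
     * If lowerBound is not negative, every possible partner is no newer than the
     * element itself, so the element's own timestamp is used.
     */
    static long rightCleanupTime(long rightTs, long lowerBound) {
        return lowerBound < 0 ? rightTs - lowerBound : rightTs;
    }

    /** Under the stated watermark assumption, the element is removable at this point. */
    static boolean canRemove(long cleanupTime, long watermark) {
        return watermark >= cleanupTime;
    }
}
```

    For bounds -2 and 1, a left element at timestamp 10 would be kept until the watermark reaches 11, and a right element at timestamp 10 until the watermark reaches 12.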

    See Also:
    Serialized Form
    • Constructor Detail

      • IntervalJoinOperator

        public IntervalJoinOperator​(long lowerBound,
                                    long upperBound,
                                    boolean lowerBoundInclusive,
                                    boolean upperBoundInclusive,
                                    org.apache.flink.util.OutputTag<T1> leftLateDataOutputTag,
                                    org.apache.flink.util.OutputTag<T2> rightLateDataOutputTag,
                                    org.apache.flink.api.common.typeutils.TypeSerializer<T1> leftTypeSerializer,
                                    org.apache.flink.api.common.typeutils.TypeSerializer<T2> rightTypeSerializer,
                                    ProcessJoinFunction<T1,​T2,​OUT> udf)
        Creates a new IntervalJoinOperator.
        Parameters:
        lowerBound - The lower bound for evaluating if elements should be joined
        upperBound - The upper bound for evaluating if elements should be joined
        lowerBoundInclusive - Whether or not to include elements where the timestamp matches the lower bound
        upperBoundInclusive - Whether or not to include elements where the timestamp matches the upper bound
        leftLateDataOutputTag - The OutputTag for late elements of the left stream
        rightLateDataOutputTag - The OutputTag for late elements of the right stream
        leftTypeSerializer - The TypeSerializer for elements of the left stream
        rightTypeSerializer - The TypeSerializer for elements of the right stream
        udf - A user-defined ProcessJoinFunction that gets called whenever two elements of T1 and T2 are joined