Class AbstractFetcher<T, KPH>

  • Type Parameters:
    T - The type of elements deserialized from Kafka's byte records, and emitted into the Flink data streams.
    KPH - The type of topic/partition identifier used by Kafka in the specific version.
    Direct Known Subclasses:
    KafkaFetcher

    @Internal
    public abstract class AbstractFetcher<T, KPH>
    extends Object
    Base class for all fetchers, which implement the connections to Kafka brokers and pull records from Kafka partitions.

    This fetcher base class implements the logic around emitting records and tracking offsets, as well as around the optional timestamp assignment and watermark generation.

    • Field Detail

      • sourceContext

        protected final org.apache.flink.streaming.api.functions.source.SourceFunction.SourceContext<T> sourceContext
        The source context to emit records and watermarks to.
      • watermarkOutput

        protected final org.apache.flink.api.common.eventtime.WatermarkOutput watermarkOutput
        Wrapper around our SourceContext for allowing the WatermarkGenerator to emit watermarks and mark idleness.
      • checkpointLock

        protected final Object checkpointLock
        The lock that guarantees that record emission and state updates are atomic, from the view of taking a checkpoint.
      • unassignedPartitionsQueue

        protected final ClosableBlockingQueue<KafkaTopicPartitionState<T, KPH>> unassignedPartitionsQueue
        Queue of partitions that are not yet assigned to any Kafka clients for consuming. Kafka version-specific implementations of runFetchLoop() should continuously poll this queue for unassigned partitions, and start consuming them accordingly.

        All partitions added to this queue are guaranteed to have been added to subscribedPartitionStates already.
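The pickup pattern described above can be sketched with plain JDK types. This is a simplified model, not Flink's implementation: `ClosableBlockingQueue` is Flink-internal, so a `LinkedBlockingQueue` stands in for it, and `String` stands in for `KafkaTopicPartitionState`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

public class PartitionPickup {
    public static void main(String[] args) throws InterruptedException {
        // Stand-in for unassignedPartitionsQueue (Flink's ClosableBlockingQueue
        // is internal); String stands in for KafkaTopicPartitionState.
        LinkedBlockingQueue<String> unassigned = new LinkedBlockingQueue<>();
        unassigned.add("topic-0");
        unassigned.add("topic-1");

        // A version-specific runFetchLoop() repeatedly polls the queue and
        // starts consuming whatever newly assigned partitions it finds.
        List<String> consuming = new ArrayList<>();
        String partition;
        while ((partition = unassigned.poll(10, TimeUnit.MILLISECONDS)) != null) {
            consuming.add(partition);
        }
        System.out.println(consuming); // [topic-0, topic-1]
    }
}
```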

    • Constructor Detail

      • AbstractFetcher

        protected AbstractFetcher(org.apache.flink.streaming.api.functions.source.SourceFunction.SourceContext<T> sourceContext,
                                  Map<KafkaTopicPartition, Long> seedPartitionsWithInitialOffsets,
                                  org.apache.flink.util.SerializedValue<org.apache.flink.api.common.eventtime.WatermarkStrategy<T>> watermarkStrategy,
                                  org.apache.flink.streaming.runtime.tasks.ProcessingTimeService processingTimeProvider,
                                  long autoWatermarkInterval,
                                  ClassLoader userCodeClassLoader,
                                  org.apache.flink.metrics.MetricGroup consumerMetricGroup,
                                  boolean useMetrics)
                           throws Exception
        Throws:
        Exception
    • Method Detail

      • addDiscoveredPartitions

        public void addDiscoveredPartitions(List<KafkaTopicPartition> newPartitions)
                                     throws IOException,
                                            ClassNotFoundException
        Adds a list of newly discovered partitions to the fetcher for consuming.

        This method creates the partition state holder for each new partition, using KafkaTopicPartitionStateSentinel.EARLIEST_OFFSET as the starting offset. It uses the earliest offset because there may be a delay between a partition being created (and starting to receive records) and the fetcher discovering it; starting from the earliest offset ensures those records are not skipped.

        After the state representation for a partition is created, it is added to the unassigned partitions queue to await consumption.

        Parameters:
        newPartitions - discovered partitions to add
        Throws:
        IOException
        ClassNotFoundException
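The discovery path can be sketched as follows. This is a hedged model with stand-in names: `PartitionState` and the `EARLIEST_OFFSET` value here are illustrative only (the real code uses `KafkaTopicPartitionState` and the `KafkaTopicPartitionStateSentinel.EARLIEST_OFFSET` sentinel), and the insertion order reflects the queue's documented contract.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.LinkedBlockingQueue;

public class DiscoveryModel {
    // Illustrative sentinel, not Flink's actual constant; the real code uses
    // KafkaTopicPartitionStateSentinel.EARLIEST_OFFSET.
    static final long EARLIEST_OFFSET = -2L;

    static class PartitionState {
        final String partition;
        final long offset;
        PartitionState(String partition, long offset) {
            this.partition = partition;
            this.offset = offset;
        }
    }

    final List<PartitionState> subscribedPartitionStates = new ArrayList<>();
    final LinkedBlockingQueue<PartitionState> unassignedPartitionsQueue = new LinkedBlockingQueue<>();

    void addDiscoveredPartitions(List<String> newPartitions) {
        for (String p : newPartitions) {
            PartitionState state = new PartitionState(p, EARLIEST_OFFSET);
            // Add to the subscribed list first: the queue's contract is that
            // every queued partition is already in subscribedPartitionStates.
            subscribedPartitionStates.add(state);
            unassignedPartitionsQueue.add(state);
        }
    }

    public static void main(String[] args) {
        DiscoveryModel m = new DiscoveryModel();
        m.addDiscoveredPartitions(List.of("topic-2"));
        System.out.println(m.subscribedPartitionStates.size() + " subscribed, "
                + m.unassignedPartitionsQueue.size() + " awaiting assignment");
        // prints: 1 subscribed, 1 awaiting assignment
    }
}
```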
      • subscribedPartitionStates

        protected final List<KafkaTopicPartitionState<T, KPH>> subscribedPartitionStates()
        Gets all partitions (with partition state) that this fetcher is subscribed to.
        Returns:
        All subscribed partitions.
      • cancel

        public abstract void cancel()
      • commitInternalOffsetsToKafka

        public final void commitInternalOffsetsToKafka(Map<KafkaTopicPartition, Long> offsets,
                                                       @Nonnull
                                                       KafkaCommitCallback commitCallback)
                                                throws Exception
        Commits the given partition offsets to the Kafka brokers (or to ZooKeeper for older Kafka versions). This method is only ever called when the offset commit mode of the consumer is OffsetCommitMode.ON_CHECKPOINTS.

        The given offsets are the internal checkpointed offsets, representing the last processed record of each partition. Version-specific implementations of this method must uphold the contract that the given offsets are incremented by 1 before being committed, so that the offsets committed to Kafka represent "the next record to process".

        Parameters:
        offsets - The offsets to commit to Kafka (implementations must increment offsets by 1 before committing).
        commitCallback - The callback that the user should trigger when a commit request completes or fails.
        Throws:
        Exception - This method forwards exceptions.
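The +1 contract above can be illustrated with plain maps. `String` keys stand in for `KafkaTopicPartition`, and `toKafkaCommitOffsets` is a hypothetical helper name, not part of the Flink API.

```java
import java.util.HashMap;
import java.util.Map;

public class OffsetCommitContract {
    // Checkpointed offsets mean "last processed record"; Kafka's committed
    // offsets mean "next record to process", hence the +1 adjustment that
    // version-specific implementations must apply before committing.
    static Map<String, Long> toKafkaCommitOffsets(Map<String, Long> checkpointed) {
        Map<String, Long> toCommit = new HashMap<>();
        checkpointed.forEach((partition, lastProcessed) -> toCommit.put(partition, lastProcessed + 1));
        return toCommit;
    }

    public static void main(String[] args) {
        Map<String, Long> checkpointed = Map.of("topic-0", 41L);
        System.out.println(toKafkaCommitOffsets(checkpointed)); // {topic-0=42}
    }
}
```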
      • createKafkaPartitionHandle

        protected abstract KPH createKafkaPartitionHandle(KafkaTopicPartition partition)
        Creates the Kafka version-specific representation of the given topic partition.
        Parameters:
        partition - The Flink representation of the Kafka topic partition.
        Returns:
        The version-specific Kafka representation of the Kafka topic partition.
      • snapshotCurrentState

        public HashMap<KafkaTopicPartition, Long> snapshotCurrentState()
        Takes a snapshot of the partition offsets.

        Important: This method must be called under the checkpoint lock.

        Returns:
        A map from partition to current offset.
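The locking requirement can be sketched with JDK types. This is a simplified stand-in, not Flink code: in the fetcher, the lock is the `checkpointLock` field and the per-partition offsets live in `KafkaTopicPartitionState` objects, while `emitRecord` and `takeCheckpoint` here are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;

public class SnapshotUnderLock {
    private final Object checkpointLock = new Object();
    private final Map<String, Long> offsets = new HashMap<>();

    Map<String, Long> snapshotCurrentState() {
        // Caller is assumed to hold checkpointLock.
        return new HashMap<>(offsets);
    }

    Map<String, Long> takeCheckpoint() {
        // Holding the lock makes the snapshot atomic with respect to record
        // emission, which advances offsets under the same lock.
        synchronized (checkpointLock) {
            return snapshotCurrentState();
        }
    }

    void emitRecord(String partition, long offset) {
        synchronized (checkpointLock) {
            offsets.put(partition, offset);
        }
    }

    public static void main(String[] args) {
        SnapshotUnderLock fetcher = new SnapshotUnderLock();
        fetcher.emitRecord("topic-0", 7L);
        System.out.println(fetcher.takeCheckpoint()); // {topic-0=7}
    }
}
```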
      • emitRecordsWithTimestamps

        protected void emitRecordsWithTimestamps(Queue<T> records,
                                                 KafkaTopicPartitionState<T, KPH> partitionState,
                                                 long offset,
                                                 long kafkaEventTimestamp)
        Emits the given records, attaching the given timestamp to each of them.
        Parameters:
        records - The records to emit
        partitionState - The state of the Kafka partition from which the record was fetched
        offset - The offset of the corresponding Kafka record
        kafkaEventTimestamp - The timestamp of the Kafka record
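A simplified model of this emission step, under stated assumptions: in the fetcher, records are pushed to the SourceContext with their timestamps and the partition state's offset is advanced, all under the checkpoint lock; here an `Emitted` list and a plain array slot stand in for those, and all names besides the parameter names are illustrative.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

public class EmitWithTimestamps {
    // Stand-in for emitting via SourceContext.collectWithTimestamp: pairs a
    // record with the timestamp it was emitted with.
    static class Emitted {
        final String value;
        final long timestamp;
        Emitted(String value, long timestamp) { this.value = value; this.timestamp = timestamp; }
    }

    public static void main(String[] args) {
        Queue<String> records = new ArrayDeque<>(List.of("a", "b"));
        long kafkaEventTimestamp = 1_700_000_000_000L;
        long offset = 42L;

        Object checkpointLock = new Object();
        List<Emitted> out = new ArrayList<>();
        long[] partitionOffset = {-1L};

        // Emission and the offset update happen atomically under the checkpoint
        // lock, so a concurrent snapshot never sees emitted records without the
        // matching offset advance.
        synchronized (checkpointLock) {
            String record;
            while ((record = records.poll()) != null) {
                out.add(new Emitted(record, kafkaEventTimestamp));
            }
            partitionOffset[0] = offset;
        }
        System.out.println(out.size() + " records emitted, offset=" + partitionOffset[0]);
        // prints: 2 records emitted, offset=42
    }
}
```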