public class FlinkKafkaConsumer<T>
extends org.apache.flink.streaming.api.functions.source.RichParallelSourceFunction<T>
implements org.apache.flink.streaming.api.checkpoint.CheckpointCommitter, org.apache.flink.streaming.api.checkpoint.CheckpointedAsynchronously<long[]>, org.apache.flink.api.java.typeutils.ResultTypeQueryable<T>
The Flink Kafka Consumer participates in checkpointing and guarantees that no data is lost during a failure, and that the computation processes elements "exactly once". (Note: These guarantees naturally assume that Kafka itself does not lose any data.)
To support a variety of Kafka brokers, protocol versions, and offset committing approaches, the Flink Kafka Consumer can be parametrized with a fetcher and an offset handler.
The fetcher is responsible for pulling data from Kafka. Because Kafka has undergone a change in protocols and APIs, there are currently two fetchers available:

- FlinkKafkaConsumer.FetcherType.NEW_HIGH_LEVEL: A fetcher based on the new Kafka consumer API. This fetcher is generally more robust, but works only with later versions of Kafka (> 0.8.2).
- FlinkKafkaConsumer.FetcherType.LEGACY_LOW_LEVEL: A fetcher based on the old low-level consumer API. This fetcher also works with older versions of Kafka (0.8.1). The fetcher interprets the old Kafka consumer properties.
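For illustration, a typical set of legacy consumer properties might look like the following. The property names are standard Kafka 0.8 consumer settings; the values are placeholders, not recommendations:

```properties
# ZooKeeper quorum used by the legacy consumer (Kafka 0.8.x)
zookeeper.connect=localhost:2181
# Consumer group under which offsets are tracked
group.id=flink-consumer-group
# Socket timeout, in milliseconds
socket.timeout.ms=30000
# Where to start when no committed offset exists: "largest" or "smallest"
auto.offset.reset=smallest
```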
Offsets whose records have been read and checkpointed are committed back to Kafka / ZooKeeper by the offset handler. In addition, when the streaming job is started, the offset handler determines the position from which the source initially starts reading.

Currently, two different offset handlers exist:
- FlinkKafkaConsumer.OffsetStore.KAFKA: Use this offset handler when the Kafka brokers manage the offsets, and hence offsets need to be committed to the Kafka brokers, rather than to ZooKeeper. Note that this offset handler works only with new versions of Kafka (0.8.2.x+) and with the FlinkKafkaConsumer.FetcherType.NEW_HIGH_LEVEL fetcher.
- FlinkKafkaConsumer.OffsetStore.FLINK_ZOOKEEPER: Use this offset handler when the offsets are managed by ZooKeeper, as in older versions of Kafka (0.8.1.x).

Please note that Flink snapshots the offsets internally as part of its distributed checkpoints. The offsets committed to Kafka / ZooKeeper are only there to bring the outside view of progress in sync with Flink's view of the progress. That way, monitoring and other jobs can get a view of how far the Flink Kafka consumer has consumed a topic.
NOTE: The implementation currently accesses partition metadata when the consumer is constructed. That means that the client that submits the program needs to be able to reach the Kafka brokers or ZooKeeper.
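The interplay between internal snapshots and external offset commits described above can be sketched with a simplified, hypothetical model (this is not the actual implementation; class and method names merely mirror the consumer's API): offsets are snapshotted per checkpoint, kept in a bounded pending map, and only published externally once the checkpoint completes.

```java
import java.util.LinkedHashMap;

// Simplified sketch of the checkpoint/commit lifecycle.
// NOT the real FlinkKafkaConsumer implementation.
class OffsetCheckpointSketch {
    static final int MAX_NUM_PENDING_CHECKPOINTS = 100; // bound to avoid memory leaks

    private long[] currentOffsets = new long[0];
    // checkpointId -> offsets snapshotted for that checkpoint (insertion ordered)
    private final LinkedHashMap<Long, long[]> pendingCheckpoints = new LinkedHashMap<>();
    private long[] lastCommitted;

    void setCurrentOffsets(long[] offsets) {
        currentOffsets = offsets.clone();
    }

    // Called when a checkpoint is triggered: snapshot the current read positions.
    long[] snapshotState(long checkpointId) {
        long[] snapshot = currentOffsets.clone();
        pendingCheckpoints.put(checkpointId, snapshot);
        // drop the oldest pending checkpoint if we track too many
        if (pendingCheckpoints.size() > MAX_NUM_PENDING_CHECKPOINTS) {
            Long oldest = pendingCheckpoints.keySet().iterator().next();
            pendingCheckpoints.remove(oldest);
        }
        return snapshot;
    }

    // Called when the checkpoint completes: publish offsets to Kafka/ZooKeeper
    // purely for external visibility; recovery uses Flink's own snapshots.
    void commitCheckpoint(long checkpointId) {
        long[] offsets = pendingCheckpoints.remove(checkpointId);
        if (offsets != null) {
            lastCommitted = offsets;
        }
    }

    long[] lastCommittedOffsets() {
        return lastCommitted;
    }
}
```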
| Modifier and Type | Class and Description |
|---|---|
| static class | FlinkKafkaConsumer.FetcherType: The fetcher type defines which code paths to use to pull data from the Kafka broker. |
| static class | FlinkKafkaConsumer.OffsetStore: The offset store defines how acknowledged offsets are committed back to Kafka. |
| Modifier and Type | Field and Description |
|---|---|
| static int | DEFAULT_GET_PARTITIONS_RETRIES: Default number of retries for getting the partition info. |
| static String | GET_PARTITIONS_RETRIES_KEY: Configuration key for the number of retries for getting the partition info. |
| static int | MAX_NUM_PENDING_CHECKPOINTS: The maximum number of pending non-committed checkpoints to track, to avoid memory leaks. |
| static long | OFFSET_NOT_SET: Magic number to define an unset offset. |
| Constructor and Description |
|---|
| FlinkKafkaConsumer(String topic, org.apache.flink.streaming.util.serialization.DeserializationSchema<T> valueDeserializer, Properties props, FlinkKafkaConsumer.OffsetStore offsetStore, FlinkKafkaConsumer.FetcherType fetcherType): Creates a new Flink Kafka Consumer, using the given type of fetcher and offset handler. |
| Modifier and Type | Method and Description |
|---|---|
| protected static List<org.apache.kafka.common.TopicPartition> | assignPartitions(int[] partitions, String topicName, int numConsumers, int consumerIndex) |
| void | cancel() |
| void | close() |
| void | commitCheckpoint(long checkpointId) |
| static List<org.apache.kafka.common.PartitionInfo> | getPartitionsForTopic(String topic, Properties properties): Send request to Kafka to get partitions for topic. |
| org.apache.flink.api.common.typeinfo.TypeInformation<T> | getProducedType() |
| void | open(org.apache.flink.configuration.Configuration parameters) |
| void | restoreState(long[] restoredOffsets) |
| void | run(org.apache.flink.streaming.api.functions.source.SourceFunction.SourceContext<T> sourceContext) |
| long[] | snapshotState(long checkpointId, long checkpointTimestamp) |
| protected static void | validateZooKeeperConfig(Properties props) |
public static final long OFFSET_NOT_SET
public static final int MAX_NUM_PENDING_CHECKPOINTS
public static final String GET_PARTITIONS_RETRIES_KEY
public static final int DEFAULT_GET_PARTITIONS_RETRIES
public FlinkKafkaConsumer(String topic, org.apache.flink.streaming.util.serialization.DeserializationSchema<T> valueDeserializer, Properties props, FlinkKafkaConsumer.OffsetStore offsetStore, FlinkKafkaConsumer.FetcherType fetcherType)
To determine which kind of fetcher and offset handler to use, please refer to the docs at the beginning of this class.
Parameters:
topic - The Kafka topic to read from.
valueDeserializer - The deserializer to turn raw byte messages into Java/Scala objects.
props - The properties that are used to configure both the fetcher and the offset handler.
offsetStore - The type of offset store to use (Kafka / ZooKeeper).
fetcherType - The type of fetcher to use (new high-level API, old low-level API).

public void open(org.apache.flink.configuration.Configuration parameters)
throws Exception
open in interface org.apache.flink.api.common.functions.RichFunction
open in class org.apache.flink.api.common.functions.AbstractRichFunction
Throws:
Exception

public void run(org.apache.flink.streaming.api.functions.source.SourceFunction.SourceContext<T> sourceContext) throws Exception
public void cancel()
cancel in interface org.apache.flink.streaming.api.functions.source.SourceFunction<T>

public void close()
throws Exception
close in interface org.apache.flink.api.common.functions.RichFunction
close in class org.apache.flink.api.common.functions.AbstractRichFunction
Throws:
Exception

public org.apache.flink.api.common.typeinfo.TypeInformation<T> getProducedType()
getProducedType in interface org.apache.flink.api.java.typeutils.ResultTypeQueryable<T>

public long[] snapshotState(long checkpointId, long checkpointTimestamp)
throws Exception
snapshotState in interface org.apache.flink.streaming.api.checkpoint.Checkpointed<long[]>
Throws:
Exception

public void restoreState(long[] restoredOffsets)
restoreState in interface org.apache.flink.streaming.api.checkpoint.Checkpointed<long[]>

public void commitCheckpoint(long checkpointId)
commitCheckpoint in interface org.apache.flink.streaming.api.checkpoint.CheckpointCommitter

protected static List<org.apache.kafka.common.TopicPartition> assignPartitions(int[] partitions, String topicName, int numConsumers, int consumerIndex)
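How topic partitions might be distributed over the parallel consumer instances can be illustrated with a simplified round-robin sketch. Plain ints stand in for Kafka's TopicPartition, and the exact distribution scheme is an assumption, not the documented behavior of assignPartitions:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of distributing topic partitions over parallel consumers.
// The real method returns List<TopicPartition>; plain ints are used here instead.
class PartitionAssignmentSketch {
    static List<Integer> assignPartitions(int[] partitions, int numConsumers, int consumerIndex) {
        List<Integer> assigned = new ArrayList<>();
        for (int i = 0; i < partitions.length; i++) {
            // round-robin: every numConsumers-th partition goes to this consumer
            if (i % numConsumers == consumerIndex) {
                assigned.add(partitions[i]);
            }
        }
        return assigned;
    }
}
```

Each parallel subtask calls this with its own consumerIndex, so the partitions are split disjointly and every partition is read by exactly one subtask.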
public static List<org.apache.kafka.common.PartitionInfo> getPartitionsForTopic(String topic, Properties properties)
Parameters:
topic - The name of the topic.
properties - The properties for the Kafka Consumer that is used to query the partitions for the topic.

protected static void validateZooKeeperConfig(Properties props)
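The retry behavior governed by GET_PARTITIONS_RETRIES_KEY and DEFAULT_GET_PARTITIONS_RETRIES can be sketched as follows. The helper, its property-key value, and its wiring are hypothetical; only the constants' roles come from this class:

```java
import java.util.Properties;
import java.util.function.Supplier;

// Hypothetical sketch of retrying a partition-metadata request,
// as governed by GET_PARTITIONS_RETRIES_KEY / DEFAULT_GET_PARTITIONS_RETRIES.
class PartitionQuerySketch {
    static final String GET_PARTITIONS_RETRIES_KEY = "flink.get-partitions.retries"; // assumed key
    static final int DEFAULT_GET_PARTITIONS_RETRIES = 3;

    static <T> T withRetries(Properties props, Supplier<T> request) {
        int retries = Integer.parseInt(
            props.getProperty(GET_PARTITIONS_RETRIES_KEY,
                              String.valueOf(DEFAULT_GET_PARTITIONS_RETRIES)));
        RuntimeException last = null;
        for (int attempt = 0; attempt < retries; attempt++) {
            try {
                return request.get();   // e.g. ask a broker for partition metadata
            } catch (RuntimeException e) {
                last = e;               // remember the failure and retry
            }
        }
        throw last;                     // all attempts failed
    }
}
```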
Copyright © 2014–2015 The Apache Software Foundation. All rights reserved.