Class ContinuousFileReaderOperator<OUT, T extends TimestampedInputSplit>

  • All Implemented Interfaces:
    Serializable, org.apache.flink.api.common.state.CheckpointListener, Input<T>, KeyContext, KeyContextHandler, OneInputStreamOperator<T, OUT>, org.apache.flink.streaming.api.operators.OutputTypeConfigurable<OUT>, StreamOperator<OUT>, StreamOperatorStateHandler.CheckpointedStreamOperator, YieldingOperator<OUT>

    @Internal
    public class ContinuousFileReaderOperator<OUT, T extends TimestampedInputSplit>
    extends AbstractStreamOperator<OUT>
    implements OneInputStreamOperator<T, OUT>, org.apache.flink.streaming.api.operators.OutputTypeConfigurable<OUT>
    The operator that reads the splits received from the preceding ContinuousFileMonitoringFunction. Unlike the ContinuousFileMonitoringFunction, which has a parallelism of 1, this operator can have a degree of parallelism (DOP) greater than 1.

    This implementation uses a MailboxExecutor to execute each action and follows a state-machine approach. The workflow is as follows:

    1. start in IDLE
    2. upon receiving a split, add it to the queue, switch to OPENING, and enqueue a mail to process it
    3. open file, switch to READING, read one record, re-enqueue self
    4. if no more records or splits available, switch back to IDLE

    On close:

    1. if IDLE then close immediately
    2. otherwise switch to CLOSING and call yield() in a loop until the state is CLOSED
    3. yield() causes remaining records (and splits) to be processed in the same way as above

    Using a MailboxExecutor avoids the need for explicit synchronization. At most one mail should be enqueued at any given time.

    Using a finite-state-machine (FSM) approach makes it possible to define the states explicitly and to enforce the transitions between them.
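
    The workflow and the close sequence described above can be illustrated with a minimal, Flink-free sketch. It is not the operator's actual code: the class SplitReaderSketch, its methods, and the use of a plain in-memory queue as a stand-in for MailboxExecutor are assumptions made for illustration, and a "split" is modeled as a list of strings.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Queue;

// Hypothetical, simplified model of the reader's state machine (not Flink code).
class SplitReaderSketch {
    enum State { IDLE, OPENING, READING, CLOSING, CLOSED }

    // Stand-in for MailboxExecutor: mails run one at a time on a single thread.
    private final Queue<Runnable> mailbox = new ArrayDeque<>();
    private final Queue<List<String>> splits = new ArrayDeque<>();
    private final List<String> output = new ArrayList<>();
    private Iterator<String> current; // the split currently being read, if any
    private State state = State.IDLE;

    State state() { return state; }
    List<String> output() { return List.copyOf(output); }

    // Step 2: queue the split; if IDLE, switch to OPENING and enqueue one mail.
    void processSplit(List<String> split) {
        if (state == State.CLOSING || state == State.CLOSED) {
            throw new IllegalStateException("operator is closing");
        }
        splits.add(split);
        if (state == State.IDLE) {
            state = State.OPENING;
            mailbox.add(this::step);
        }
    }

    // Steps 3 and 4: open a split if needed, read at most one record, re-enqueue self.
    private void step() {
        if (current == null) {
            List<String> next = splits.poll();
            if (next == null) {
                // No more records or splits: back to IDLE, or CLOSED when closing.
                state = (state == State.CLOSING) ? State.CLOSED : State.IDLE;
                return; // no re-enqueue, so the mailbox is empty again
            }
            current = next.iterator();
            if (state == State.OPENING) {
                state = State.READING;
            }
        }
        if (current.hasNext()) {
            output.add(current.next()); // "emit" exactly one record per mail
        } else {
            current = null; // split exhausted; the next mail opens the next split
        }
        mailbox.add(this::step); // at most one mail is pending at any time
    }

    // Runs one pending mail; returns false if the mailbox was empty.
    boolean runOneMail() {
        Runnable mail = mailbox.poll();
        if (mail == null) {
            return false;
        }
        mail.run();
        return true;
    }

    // Close sequence: immediate if IDLE, otherwise drain the mailbox until CLOSED.
    void close() {
        if (state == State.CLOSED) {
            return;
        }
        if (state == State.IDLE) {
            state = State.CLOSED;
            return;
        }
        state = State.CLOSING;
        while (state != State.CLOSED) {
            if (!runOneMail()) { // plays the role of yield()
                throw new IllegalStateException("no pending mail while closing");
            }
        }
    }

    public static void main(String[] args) {
        SplitReaderSketch reader = new SplitReaderSketch();
        reader.processSplit(List.of("a", "b"));
        reader.runOneMail();               // opens the split and reads "a"
        reader.processSplit(List.of("c")); // queued while READING, no new mail
        reader.close();                    // drains "b" and "c", then CLOSED
        System.out.println(reader.output() + " " + reader.state()); // prints "[a, b, c] CLOSED"
    }
}
```

    Because every state change happens inside a mail or inside a call made from the same thread, the sketch needs no locks, which mirrors the reason given above for using a mailbox.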

    See Also:
    Serialized Form
    • Method Detail

      • open

        public void open()
                  throws Exception
        Description copied from class: AbstractStreamOperator
        This method is called immediately before any elements are processed. It should contain the operator's initialization logic, e.g. state initialization.

        The default implementation does nothing.

        Specified by:
        open in interface StreamOperator<OUT>
        Overrides:
        open in class AbstractStreamOperator<OUT>
        Throws:
        Exception - An exception in this method causes the operator to fail.
      • finish

        public void finish()
                    throws Exception
        Description copied from interface: StreamOperator
        This method is called at the end of data processing.

        The method is expected to flush all remaining buffered data. Exceptions during this flushing of buffered data should be propagated, in order to cause the operation to be recognized as failed, because the last data items are not processed properly.

        After this method is called, no more records can be produced for the downstream operators.

        WARNING: It is not safe to use this method to commit any transactions or other side effects! You can use this method to flush any buffered data that can later be committed, e.g. in CheckpointListener.notifyCheckpointComplete(long).

        NOTE: This method does not need to close any resources. You should release external resources in the StreamOperator.close() method.

        Specified by:
        finish in interface StreamOperator<OUT>
        Overrides:
        finish in class AbstractStreamOperator<OUT>
        Throws:
        Exception - An exception in this method causes the operator to fail.
      • close

        public void close()
                   throws Exception
        Description copied from interface: StreamOperator
        This method is called at the very end of the operator's life, both in the case of a successful completion of the operation, and in the case of a failure and canceling.

        This method is expected to make a thorough effort to release all resources that the operator has acquired.

        NOTE: It cannot emit any records! If you need to emit records at the end of processing, do so in the StreamOperator.finish() method.

        Specified by:
        close in interface StreamOperator<OUT>
        Overrides:
        close in class AbstractStreamOperator<OUT>
        Throws:
        Exception
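
        The division of work between open(), finish(), notifyCheckpointComplete(long), and close() described above can be sketched without any Flink dependency. The class LifecycleSketch below and all of its fields are hypothetical and only mirror the method names; it assumes a simple operator that buffers strings, flushes them in finish(), commits them only once a checkpoint completes, and releases its resource in close().

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of the lifecycle contract (not a Flink operator).
class LifecycleSketch {
    private final List<String> buffer = new ArrayList<>();    // not yet flushed
    private final List<String> flushed = new ArrayList<>();   // flushed, not yet committed
    private final List<String> committed = new ArrayList<>(); // durable side effects
    private boolean resourceOpen;

    // Initialization logic runs before any element is processed.
    void open() {
        resourceOpen = true;
    }

    void processElement(String value) {
        buffer.add(value);
    }

    // Flush remaining buffered data, but do not commit anything here.
    void finish() {
        flushed.addAll(buffer);
        buffer.clear();
    }

    // Committing is deferred until a checkpoint is known to be complete.
    void notifyCheckpointComplete(long checkpointId) {
        committed.addAll(flushed);
        flushed.clear();
    }

    // Release resources on success, failure, or cancellation; emit nothing.
    void close() {
        resourceOpen = false;
        buffer.clear();
    }

    boolean isResourceOpen() { return resourceOpen; }
    List<String> committed() { return List.copyOf(committed); }

    public static void main(String[] args) {
        LifecycleSketch op = new LifecycleSketch();
        op.open();
        op.processElement("x");
        op.processElement("y");
        op.finish();                     // flushed, still uncommitted
        op.notifyCheckpointComplete(1L); // now committed
        op.close();
        System.out.println(op.committed()); // prints "[x, y]"
    }
}
```

        Keeping the commit out of finish() means that data flushed at the end of input becomes a durable side effect only after a checkpoint covering it has completed.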
      • setOutputType

        public void setOutputType(org.apache.flink.api.common.typeinfo.TypeInformation<OUT> outTypeInfo,
                                  org.apache.flink.api.common.ExecutionConfig executionConfig)
        Specified by:
        setOutputType in interface org.apache.flink.streaming.api.operators.OutputTypeConfigurable<OUT>