Class JdbcInputFormat

  • All Implemented Interfaces:
    Serializable, org.apache.flink.api.common.io.InputFormat<org.apache.flink.types.Row, org.apache.flink.core.io.InputSplit>, org.apache.flink.api.java.typeutils.ResultTypeQueryable<org.apache.flink.types.Row>, org.apache.flink.core.io.InputSplitSource<org.apache.flink.core.io.InputSplit>

    @Experimental
    public class JdbcInputFormat
    extends org.apache.flink.api.common.io.RichInputFormat<org.apache.flink.types.Row, org.apache.flink.core.io.InputSplit>
    implements org.apache.flink.api.java.typeutils.ResultTypeQueryable<org.apache.flink.types.Row>
    InputFormat that reads data from a database over JDBC and generates Rows. The InputFormat has to be configured using the supplied JdbcInputFormatBuilder (obtained from buildJdbcInputFormat()). A valid RowTypeInfo describing the rows returned by the query must be set in the builder, e.g.:
    
     TypeInformation<?>[] fieldTypes = new TypeInformation<?>[] {
            BasicTypeInfo.INT_TYPE_INFO,
            BasicTypeInfo.STRING_TYPE_INFO,
            BasicTypeInfo.STRING_TYPE_INFO,
            BasicTypeInfo.DOUBLE_TYPE_INFO,
            BasicTypeInfo.INT_TYPE_INFO
     };
    
     RowTypeInfo rowTypeInfo = new RowTypeInfo(fieldTypes);
    
     JdbcInputFormat jdbcInputFormat = JdbcInputFormat.buildJdbcInputFormat()
                            .setDrivername("org.apache.derby.jdbc.EmbeddedDriver")
                            .setDBUrl("jdbc:derby:memory:ebookshop")
                            .setQuery("select * from books")
                            .setRowTypeInfo(rowTypeInfo)
                            .finish();
     

    To query the JDBC source in parallel, provide a parameterized query template (i.e. a valid PreparedStatement) and a JdbcParameterValuesProvider that supplies the binding values for the query parameters, e.g.:

    
    
     Serializable[][] queryParameters = new String[2][1];
     queryParameters[0] = new String[]{"Kumar"};
     queryParameters[1] = new String[]{"Tan Ah Teck"};
    
     JdbcInputFormat jdbcInputFormat = JdbcInputFormat.buildJdbcInputFormat()
                            .setDrivername("org.apache.derby.jdbc.EmbeddedDriver")
                            .setDBUrl("jdbc:derby:memory:ebookshop")
                            .setQuery("select * from books WHERE author = ?")
                            .setRowTypeInfo(rowTypeInfo)
                            .setParametersProvider(new JdbcGenericParameterValuesProvider(queryParameters))
                            .finish();
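
The parameter matrix in the example above is written by hand. For numeric keys it can be generated instead. The following is a minimal sketch in plain Java, not part of this class: the names RangeParameters and betweenParameters are hypothetical, and it assumes a query template with two placeholders such as "select * from books WHERE id BETWEEN ? AND ?" (the id column is also an assumption). Each returned row holds one inclusive {start, end} pair that could be passed to JdbcGenericParameterValuesProvider.

```java
import java.io.Serializable;
import java.util.Arrays;

public class RangeParameters {

    /**
     * Hypothetical helper: cuts the inclusive range [min, max] into consecutive
     * chunks of at most batchSize values and returns one {start, end} pair per chunk.
     */
    static Serializable[][] betweenParameters(long min, long max, long batchSize) {
        if (min > max || batchSize <= 0) {
            throw new IllegalArgumentException("need min <= max and batchSize > 0");
        }
        long count = max - min + 1;
        // Ceiling division: the last chunk may be smaller than batchSize.
        int batches = (int) ((count + batchSize - 1) / batchSize);
        Serializable[][] parameters = new Serializable[batches][];
        long start = min;
        for (int i = 0; i < batches; i++) {
            long end = Math.min(start + batchSize - 1, max);
            parameters[i] = new Long[] {start, end};
            start = end + 1;
        }
        return parameters;
    }

    public static void main(String[] args) {
        // Prints [1, 4], [5, 8] and [9, 10], one pair per line.
        for (Serializable[] row : betweenParameters(1, 10, 4)) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```

The sketch does not guard against overflow near Long.MAX_VALUE; that is left out to keep it short.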
     
    See Also:
    Row, JdbcParameterValuesProvider, PreparedStatement, DriverManager, Serialized Form
    • Field Detail

      • LOG

        protected static final org.slf4j.Logger LOG
      • queryTemplate

        protected String queryTemplate
      • resultSetType

        protected int resultSetType
      • resultSetConcurrency

        protected int resultSetConcurrency
      • rowTypeInfo

        protected org.apache.flink.api.java.typeutils.RowTypeInfo rowTypeInfo
      • resultSet

        protected transient ResultSet resultSet
      • fetchSize

        protected int fetchSize
      • autoCommit

        protected Boolean autoCommit
      • hasNext

        protected boolean hasNext
      • parameterValues

        protected Object[][] parameterValues
    • Constructor Detail

      • JdbcInputFormat

        public JdbcInputFormat()
    • Method Detail

      • getProducedType

        public org.apache.flink.api.java.typeutils.RowTypeInfo getProducedType()
        Specified by:
        getProducedType in interface org.apache.flink.api.java.typeutils.ResultTypeQueryable<org.apache.flink.types.Row>
      • configure

        public void configure(org.apache.flink.configuration.Configuration parameters)
        Specified by:
        configure in interface org.apache.flink.api.common.io.InputFormat<org.apache.flink.types.Row, org.apache.flink.core.io.InputSplit>
      • openInputFormat

        public void openInputFormat()
        Overrides:
        openInputFormat in class org.apache.flink.api.common.io.RichInputFormat<org.apache.flink.types.Row, org.apache.flink.core.io.InputSplit>
      • closeInputFormat

        public void closeInputFormat()
        Overrides:
        closeInputFormat in class org.apache.flink.api.common.io.RichInputFormat<org.apache.flink.types.Row, org.apache.flink.core.io.InputSplit>
      • open

        public void open(org.apache.flink.core.io.InputSplit inputSplit)
                  throws IOException
        Connects to the source database and executes the query. Execution is parallel if this InputFormat was built with a parameterized query (i.e. a PreparedStatement) and a proper JdbcParameterValuesProvider; otherwise it is non-parallel.
        Specified by:
        open in interface org.apache.flink.api.common.io.InputFormat<org.apache.flink.types.Row, org.apache.flink.core.io.InputSplit>
        Parameters:
        inputSplit - ignored if this InputFormat is executed as a non-parallel source; otherwise a "hook" to the query parameters, selected by its splitNumber
        Throws:
        IOException - if there's an error during the execution of the query
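
The relationship between a split and its query parameters can be sketched in plain Java. This is an illustration of the behavior described above, not the class's actual code: SplitBinding, splitCount and bindingsFor are hypothetical names, and the rule "one split per row of parameter values, a single split otherwise" is an assumption consistent with the description of inputSplit.

```java
public class SplitBinding {

    /** Assumed rule: one split per parameter row, or a single split without parameters. */
    static int splitCount(Object[][] parameterValues) {
        return parameterValues == null ? 1 : parameterValues.length;
    }

    /** Binding values for the given split; empty when the query has no parameters. */
    static Object[] bindingsFor(Object[][] parameterValues, int splitNumber) {
        if (parameterValues == null) {
            // Non-parallel source: the split number is ignored.
            return new Object[0];
        }
        // Parallel source: the split number indexes the parameter matrix.
        return parameterValues[splitNumber];
    }

    public static void main(String[] args) {
        Object[][] parameterValues = {{"Kumar"}, {"Tan Ah Teck"}};
        for (int split = 0; split < splitCount(parameterValues); split++) {
            Object author = bindingsFor(parameterValues, split)[0];
            System.out.println("split " + split + " binds author = " + author);
        }
    }
}
```

In a real reader, the values returned for a split would then be bound to the placeholders of the PreparedStatement before the query is executed.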
      • close

        public void close()
                   throws IOException
        Closes all resources used.
        Specified by:
        close in interface org.apache.flink.api.common.io.InputFormat<org.apache.flink.types.Row, org.apache.flink.core.io.InputSplit>
        Throws:
        IOException - Indicates that a resource could not be closed.
      • reachedEnd

        public boolean reachedEnd()
                           throws IOException
        Checks whether all data has been read.
        Specified by:
        reachedEnd in interface org.apache.flink.api.common.io.InputFormat<org.apache.flink.types.Row, org.apache.flink.core.io.InputSplit>
        Returns:
        boolean value indicating whether all data has been read.
        Throws:
        IOException
      • nextRecord

        public org.apache.flink.types.Row nextRecord(org.apache.flink.types.Row reuse)
                                              throws IOException
        Stores the next resultSet row in a Row.
        Specified by:
        nextRecord in interface org.apache.flink.api.common.io.InputFormat<org.apache.flink.types.Row, org.apache.flink.core.io.InputSplit>
        Parameters:
        reuse - row to be reused.
        Returns:
        row containing the next resultSet row
        Throws:
        IOException
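
The calling pattern implied by reachedEnd, nextRecord and close can be sketched without a database. Everything in this block is a hypothetical stand-in: RecordSource mirrors only the three method shapes documented here, Object[] takes the place of Row, and the sample rows are made up. The point it illustrates is that the reuse object is overwritten on each call, so a caller should consume or copy a record before requesting the next one.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class ReadLoopSketch {

    /** Hypothetical stand-in for the three reading methods documented here. */
    interface RecordSource {
        boolean reachedEnd() throws IOException;
        Object[] nextRecord(Object[] reuse) throws IOException;
        void close() throws IOException;
    }

    /** In-memory source; copies each stored row into the caller's reuse array. */
    static final class InMemorySource implements RecordSource {
        private final Iterator<Object[]> rows;

        InMemorySource(List<Object[]> rows) {
            this.rows = rows.iterator();
        }

        @Override
        public boolean reachedEnd() {
            return !rows.hasNext();
        }

        @Override
        public Object[] nextRecord(Object[] reuse) {
            Object[] next = rows.next();
            System.arraycopy(next, 0, reuse, 0, next.length);
            return reuse;
        }

        @Override
        public void close() {}
    }

    /** Drains the source, formatting each row before the reuse array is overwritten. */
    static List<String> readAll(RecordSource source, int arity) {
        List<String> out = new ArrayList<>();
        Object[] reuse = new Object[arity];
        try {
            try {
                while (!source.reachedEnd()) {
                    Object[] row = source.nextRecord(reuse);
                    out.add(Arrays.toString(row));
                }
            } finally {
                // Release resources even if reading fails.
                source.close();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        // Made-up sample rows, for illustration only.
        List<Object[]> rows = Arrays.asList(
                new Object[] {1001, "first title"},
                new Object[] {1002, "second title"});
        readAll(new InMemorySource(rows), 2).forEach(System.out::println);
    }
}
```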
      • getStatistics

        public org.apache.flink.api.common.io.statistics.BaseStatistics getStatistics(org.apache.flink.api.common.io.statistics.BaseStatistics cachedStatistics)
                                                                               throws IOException
        Specified by:
        getStatistics in interface org.apache.flink.api.common.io.InputFormat<org.apache.flink.types.Row, org.apache.flink.core.io.InputSplit>
        Throws:
        IOException
      • createInputSplits

        public org.apache.flink.core.io.InputSplit[] createInputSplits(int minNumSplits)
                                                                throws IOException
        Specified by:
        createInputSplits in interface org.apache.flink.api.common.io.InputFormat<org.apache.flink.types.Row, org.apache.flink.core.io.InputSplit>
        Specified by:
        createInputSplits in interface org.apache.flink.core.io.InputSplitSource<org.apache.flink.core.io.InputSplit>
        Throws:
        IOException
      • getInputSplitAssigner

        public org.apache.flink.core.io.InputSplitAssigner getInputSplitAssigner(org.apache.flink.core.io.InputSplit[] inputSplits)
        Specified by:
        getInputSplitAssigner in interface org.apache.flink.api.common.io.InputFormat<org.apache.flink.types.Row, org.apache.flink.core.io.InputSplit>
        Specified by:
        getInputSplitAssigner in interface org.apache.flink.core.io.InputSplitSource<org.apache.flink.core.io.InputSplit>
      • getDbConn

        @VisibleForTesting
        protected Connection getDbConn()
      • buildJdbcInputFormat

        public static JdbcInputFormat.JdbcInputFormatBuilder buildJdbcInputFormat()
        Creates a builder used to set the parameters of the input format's configuration in a fluent way.
        Returns:
        builder