Class AnalyzeLocal


  • public class AnalyzeLocal
    extends Object
    • Constructor Detail

      • AnalyzeLocal

        public AnalyzeLocal()
    • Method Detail

      • analyze

        public static DataAnalysis analyze(Schema schema,
                                           RecordReader rr)
        Analyze the specified data. Returns a DataAnalysis object with summary information about each column.
        Parameters:
        schema - Schema for data
        rr - Data to analyze
        Returns:
        DataAnalysis for data
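        To show the kind of per-column summary this method produces, here is a conceptual sketch in plain Java. It is NOT DataVec's implementation and does not use the DataVec API: one numeric column is modelled as a List<Double>, and the class name ColumnSummarySketch and its fields (count, min, max, mean) are hypothetical, chosen only for illustration.

```java
import java.util.List;

// Conceptual sketch only, NOT DataVec's implementation. It models a single
// numeric column as a List<Double> and computes a few summary statistics of
// the kind a per-column analysis typically reports.
final class ColumnSummarySketch {
    final long count;
    final double min;
    final double max;
    final double mean;

    private ColumnSummarySketch(long count, double min, double max, double mean) {
        this.count = count;
        this.min = min;
        this.max = max;
        this.mean = mean;
    }

    /** Single pass over the column values; requires at least one value. */
    static ColumnSummarySketch of(List<Double> values) {
        if (values.isEmpty()) {
            throw new IllegalArgumentException("column has no values");
        }
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;
        double sum = 0.0;
        for (double v : values) {  // auto-unboxing from Double
            min = Math.min(min, v);
            max = Math.max(max, v);
            sum += v;
        }
        return new ColumnSummarySketch(values.size(), min, max, sum / values.size());
    }

    public static void main(String[] args) {
        ColumnSummarySketch s = of(List.of(2.0, 4.0, 9.0));
        // prints "3 2.0 9.0 5.0"
        System.out.println(s.count + " " + s.min + " " + s.max + " " + s.mean);
    }
}
```

        A single pass is used so that each value is visited once, which matters when the underlying reader streams a large dataset.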
      • analyze

        public static DataAnalysis analyze(Schema schema,
                                           RecordReader rr,
                                           int maxHistogramBuckets)
        Analyze the specified data. Returns a DataAnalysis object with summary information about each column.
        Parameters:
        schema - Schema for data
        rr - Data to analyze
        maxHistogramBuckets - Maximum number of buckets to use for histograms
        Returns:
        DataAnalysis for data
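        The maxHistogramBuckets parameter caps the number of histogram buckets. The following plain-Java sketch illustrates one common way such a cap can be applied. It ASSUMES equal-width buckets spanning the column's [min, max] range, which may not match how DataVec actually builds its histograms; HistogramSketch and histogram(...) are hypothetical names.

```java
// Conceptual sketch, NOT DataVec's implementation. ASSUMPTION: buckets are
// equal-width and span [min, max] of the column; the real analysis may
// choose bucket boundaries differently.
final class HistogramSketch {
    /**
     * Counts values into at most maxBuckets equal-width buckets over [min, max].
     * The maximum value is placed in the last bucket instead of overflowing.
     */
    static long[] histogram(double[] values, int maxBuckets) {
        if (maxBuckets < 1) {
            throw new IllegalArgumentException("maxBuckets must be >= 1");
        }
        if (values.length == 0) {
            return new long[0];
        }
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;
        for (double v : values) {
            min = Math.min(min, v);
            max = Math.max(max, v);
        }
        // A constant column collapses to a single bucket.
        int buckets = (max == min) ? 1 : maxBuckets;
        long[] counts = new long[buckets];
        double width = (max - min) / buckets;
        for (double v : values) {
            int idx = (width == 0.0) ? 0 : (int) ((v - min) / width);
            if (idx >= buckets) {
                idx = buckets - 1;  // v == max lands in the last bucket
            }
            counts[idx]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        double[] data = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
        // prints "[2, 2, 2, 2, 3]"
        System.out.println(java.util.Arrays.toString(histogram(data, 5)));
    }
}
```

        Fewer buckets give a coarser but cheaper summary; a larger cap gives finer resolution at the cost of a larger DataAnalysis object.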
      • analyzeQualitySequence

        public static DataQualityAnalysis analyzeQualitySequence(Schema schema,
                                                                 SequenceRecordReader data)
        Analyze the quality of sequence data. Provides a report on missing values, values that don't comply with the schema, etc.
        Parameters:
        schema - Schema for data
        data - Data to analyze
        Returns:
        DataQualityAnalysis object
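        For sequence data, every time step of every sequence is a record to be checked. The plain-Java sketch below illustrates that idea for a single column. It is NOT DataVec's implementation: sequences are modelled as nested lists of strings, the checked column is ASSUMED to be declared as an integer column, and SequenceQualitySketch with its counters is a hypothetical name.

```java
import java.util.List;

// Conceptual sketch, NOT DataVec's implementation. Sequences are modelled as
// List<sequence> -> List<time step> -> List<String> column values, and the
// checked column is ASSUMED to be declared as an integer column.
final class SequenceQualitySketch {
    long total;    // number of time steps inspected
    long missing;  // null or empty values
    long invalid;  // present but not parseable as an int

    static SequenceQualitySketch checkIntegerColumn(List<List<List<String>>> sequences,
                                                    int columnIndex) {
        SequenceQualitySketch q = new SequenceQualitySketch();
        for (List<List<String>> sequence : sequences) {
            for (List<String> step : sequence) {  // every time step is checked
                q.total++;
                String v = step.get(columnIndex);
                if (v == null || v.isEmpty()) {
                    q.missing++;
                    continue;
                }
                try {
                    Integer.parseInt(v);
                } catch (NumberFormatException e) {
                    q.invalid++;
                }
            }
        }
        return q;
    }

    public static void main(String[] args) {
        List<List<List<String>>> data = List.of(
                List.of(List.of("a", "1"), List.of("b", "")),
                List.of(List.of("c", "x"), List.of("d", "42")));
        SequenceQualitySketch q = checkIntegerColumn(data, 1);
        // prints "4 1 1"
        System.out.println(q.total + " " + q.missing + " " + q.invalid);
    }
}
```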
      • analyzeQuality

        public static DataQualityAnalysis analyzeQuality(Schema schema,
                                                         RecordReader data)
        Analyze the quality of the data. Provides a report on missing values, values that don't comply with the schema, etc.
        Parameters:
        schema - Schema for data
        data - Data to analyze
        Returns:
        DataQualityAnalysis object
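        "Values that don't comply with the schema" can mean more than a type mismatch. The plain-Java sketch below separates three failure kinds for one column of flat records: missing, not an integer, and outside an allowed range. It is NOT DataVec's implementation; the records-as-List<String> model, the [minAllowed, maxAllowed] restriction, and the name RecordQualitySketch are all ASSUMPTIONS made for illustration.

```java
import java.util.List;

// Conceptual sketch, NOT DataVec's implementation. Each record is a
// List<String>; the checked column is ASSUMED to be declared as an integer
// column restricted to [minAllowed, maxAllowed].
final class RecordQualitySketch {
    long missing;     // null or empty values
    long notInteger;  // present but not parseable as an int
    long outOfRange;  // parseable but outside [minAllowed, maxAllowed]

    static RecordQualitySketch checkIntegerColumn(List<List<String>> records, int columnIndex,
                                                  int minAllowed, int maxAllowed) {
        RecordQualitySketch q = new RecordQualitySketch();
        for (List<String> record : records) {
            String v = record.get(columnIndex);
            if (v == null || v.isEmpty()) {
                q.missing++;
                continue;
            }
            int parsed;
            try {
                parsed = Integer.parseInt(v);
            } catch (NumberFormatException e) {
                q.notInteger++;
                continue;
            }
            if (parsed < minAllowed || parsed > maxAllowed) {
                q.outOfRange++;
            }
        }
        return q;
    }

    public static void main(String[] args) {
        List<List<String>> records = List.of(
                List.of("5"), List.of(""), List.of("abc"), List.of("200"));
        RecordQualitySketch q = checkIntegerColumn(records, 0, 0, 100);
        // prints "1 1 1"
        System.out.println(q.missing + " " + q.notInteger + " " + q.outOfRange);
    }
}
```

        Counting each failure kind separately, rather than a single "invalid" total, makes the report more useful for deciding how to clean the data.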
      • getUniqueSequence

        public static Set<Writable> getUniqueSequence(String columnName,
                                                      Schema schema,
                                                      SequenceRecordReader sequenceData)
        Get the set of unique values in the specified column of the sequence data
        Parameters:
        columnName - Name of the column to get unique values from
        schema - Data schema
        sequenceData - Sequence data to get unique values from
        Returns:
        Set of unique values in the specified column
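        Conceptually, the column name is resolved to an index via the schema, and the value at that index is collected from every time step of every sequence into a set. The plain-Java sketch below illustrates this. It is NOT DataVec's implementation: the schema is reduced to an ordered list of column names, values are plain strings instead of Writable objects, and UniqueSequenceSketch is a hypothetical name.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Conceptual sketch, NOT DataVec's implementation. The schema is reduced to an
// ordered list of column names; sequences are List<sequence> -> List<time step>
// -> List<String> values (standing in for Writable).
final class UniqueSequenceSketch {
    static Set<String> uniqueValues(String columnName, List<String> schemaColumns,
                                    List<List<List<String>>> sequences) {
        int idx = schemaColumns.indexOf(columnName);
        if (idx < 0) {
            throw new IllegalArgumentException("Unknown column: " + columnName);
        }
        Set<String> unique = new HashSet<>();
        for (List<List<String>> sequence : sequences) {
            for (List<String> step : sequence) {
                // Duplicates across time steps and across sequences collapse.
                unique.add(step.get(idx));
            }
        }
        return unique;
    }

    public static void main(String[] args) {
        List<String> columns = List.of("id", "state");
        List<List<List<String>>> data = List.of(
                List.of(List.of("1", "on"), List.of("1", "off")),
                List.of(List.of("2", "on")));
        // prints the two distinct states (set iteration order is unspecified)
        System.out.println(uniqueValues("state", columns, data));
    }
}
```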
      • getUniqueSequence

        public static Map<String,Set<Writable>> getUniqueSequence(List<String> columnNames,
                                                                        Schema schema,
                                                                        SequenceRecordReader sequenceData)
        Get the set of unique values in each of the specified columns of the sequence data
        Parameters:
        columnNames - Names of the columns to get unique values from
        schema - Data schema
        sequenceData - Sequence data to get unique values from
        Returns:
        Map from each column name to the set of unique values in that column
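        The multi-column variant returns one set per requested column. One way to build such a map is a single pass over the data that fills every column's set at each time step, rather than re-reading the data once per column; whether DataVec works this way is not stated here, so treat this as an ASSUMPTION. The plain-Java sketch below is NOT DataVec's implementation: the schema is an ordered list of column names, values are strings instead of Writable objects, and UniqueSequenceMultiSketch is a hypothetical name.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Conceptual sketch, NOT DataVec's implementation. Builds a map from each
// requested column name to its set of unique values in a single pass.
final class UniqueSequenceMultiSketch {
    static Map<String, Set<String>> uniqueValues(List<String> columnNames,
                                                 List<String> schemaColumns,
                                                 List<List<List<String>>> sequences) {
        // Resolve every requested column to its index once, up front.
        Map<String, Integer> indices = new LinkedHashMap<>();
        for (String name : columnNames) {
            int idx = schemaColumns.indexOf(name);
            if (idx < 0) {
                throw new IllegalArgumentException("Unknown column: " + name);
            }
            indices.put(name, idx);
        }
        Map<String, Set<String>> result = new LinkedHashMap<>();
        for (String name : indices.keySet()) {
            result.put(name, new HashSet<>());
        }
        // Single pass over the data, filling every requested column's set.
        for (List<List<String>> sequence : sequences) {
            for (List<String> step : sequence) {
                for (Map.Entry<String, Integer> e : indices.entrySet()) {
                    result.get(e.getKey()).add(step.get(e.getValue()));
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> columns = List.of("id", "state", "level");
        List<List<List<String>>> data = List.of(
                List.of(List.of("1", "on", "3"), List.of("1", "off", "3")),
                List.of(List.of("2", "on", "5")));
        System.out.println(uniqueValues(List.of("id", "state"), columns, data));
    }
}
```

        A single pass matters when the reader streams data from disk, since each extra pass would re-read the whole dataset.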