Package org.datavec.local.transforms
Class AnalyzeLocal
java.lang.Object
    org.datavec.local.transforms.AnalyzeLocal
-
public class AnalyzeLocal extends Object
-
-
Constructor Summary
Constructors
AnalyzeLocal()
-
Method Summary
All methods are static and concrete.
static DataAnalysis analyze(Schema schema, RecordReader rr)
    Analyse the specified data - returns a DataAnalysis object with summary information about each column
static DataAnalysis analyze(Schema schema, RecordReader rr, int maxHistogramBuckets)
    Analyse the specified data - returns a DataAnalysis object with summary information about each column
static DataQualityAnalysis analyzeQuality(Schema schema, RecordReader data)
    Analyze the data quality of data - provides a report on missing values, values that don't comply with schema, etc
static DataQualityAnalysis analyzeQualitySequence(Schema schema, SequenceRecordReader data)
    Analyze the data quality of sequence data - provides a report on missing values, values that don't comply with schema, etc
static Set<Writable> getUnique(String columnName, Schema schema, RecordReader data)
    Get the set of unique values from the specified column.
static Map<String,Set<Writable>> getUnique(List<String> columnNames, Schema schema, RecordReader data)
    Get the set of unique values from each of the specified columns.
static Set<Writable> getUniqueSequence(String columnName, Schema schema, SequenceRecordReader sequenceData)
    Get the set of unique values from the specified column of a sequence
static Map<String,Set<Writable>> getUniqueSequence(List<String> columnNames, Schema schema, SequenceRecordReader sequenceData)
    Get the set of unique values from each of the specified columns of a sequence
-
-
-
Method Detail
-
analyze
public static DataAnalysis analyze(Schema schema, RecordReader rr)
Analyse the specified data - returns a DataAnalysis object with summary information about each column
Parameters:
schema - Schema for data
rr - Data to analyze
Returns:
DataAnalysis for data
-
analyze
public static DataAnalysis analyze(Schema schema, RecordReader rr, int maxHistogramBuckets)
Analyse the specified data - returns a DataAnalysis object with summary information about each column
Parameters:
schema - Schema for data
rr - Data to analyze
maxHistogramBuckets - Maximum number of histogram buckets to use in the analysis
Returns:
DataAnalysis for data
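As a rough illustration of what a per-column summary with a bounded histogram involves, here is a plain-Java sketch. It does not call DataVec: the class ColumnSummarySketch and its methods are hypothetical, equal-width bucketing is an assumption, and the bucketing AnalyzeLocal actually performs may differ.

```java
import java.util.Arrays;

// Hypothetical illustration only; not part of DataVec.
public class ColumnSummarySketch {

    /** Counts values into at most maxBuckets equal-width buckets spanning [min, max]. */
    public static long[] histogram(double[] values, int maxBuckets) {
        if (values.length == 0 || maxBuckets <= 0) {
            return new long[0];
        }
        double min = Arrays.stream(values).min().getAsDouble();
        double max = Arrays.stream(values).max().getAsDouble();
        if (min == max) {
            // All values are identical, so a single bucket is enough
            return new long[] { values.length };
        }
        long[] counts = new long[maxBuckets];
        double width = (max - min) / maxBuckets;
        for (double v : values) {
            int idx = (int) ((v - min) / width);
            if (idx >= maxBuckets) {
                idx = maxBuckets - 1; // the maximum value belongs to the last bucket
            }
            counts[idx]++;
        }
        return counts;
    }

    /** Arithmetic mean of a non-empty array. */
    public static double mean(double[] values) {
        return Arrays.stream(values).sum() / values.length;
    }

    public static void main(String[] args) {
        double[] ages = { 21, 35, 35, 48, 60 };
        System.out.println(Arrays.toString(histogram(ages, 3))); // prints [1, 2, 2]
        System.out.println(mean(ages));
    }
}
```

In this sketch a smaller bucket limit gives a coarser histogram, and a column with a single distinct value collapses to one bucket.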
-
analyzeQualitySequence
public static DataQualityAnalysis analyzeQualitySequence(Schema schema, SequenceRecordReader data)
Analyze the data quality of sequence data - provides a report on missing values, values that don't comply with schema, etc
Parameters:
schema - Schema for data
data - Data to analyze
Returns:
DataQualityAnalysis object
-
analyzeQuality
public static DataQualityAnalysis analyzeQuality(Schema schema, RecordReader data)
Analyze the data quality of data - provides a report on missing values, values that don't comply with schema, etc
Parameters:
schema - Schema for data
data - Data to analyze
Returns:
DataQualityAnalysis object
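To make "missing values" and "values that don't comply with schema" concrete, the following plain-Java sketch classifies the entries of a single column that is expected to hold integers. It is an illustration under assumptions, not DataVec code: QualitySketch and integerColumnQuality are hypothetical names, and treating null or blank strings as missing is a choice made for this example.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not part of DataVec.
public class QualitySketch {

    /** Returns {valid, missing, invalid} counts for a column expected to hold integers. */
    public static int[] integerColumnQuality(List<String> values) {
        int valid = 0;
        int missing = 0;
        int invalid = 0;
        for (String v : values) {
            if (v == null || v.trim().isEmpty()) {
                missing++; // assumption: null or blank counts as missing
            } else {
                try {
                    Integer.parseInt(v.trim());
                    valid++;
                } catch (NumberFormatException e) {
                    invalid++; // present, but does not match the expected type
                }
            }
        }
        return new int[] { valid, missing, invalid };
    }

    public static void main(String[] args) {
        List<String> ageColumn = Arrays.asList("12", "", "abc", null, "7");
        System.out.println(Arrays.toString(integerColumnQuality(ageColumn))); // prints [2, 2, 1]
    }
}
```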
-
getUnique
public static Set<Writable> getUnique(String columnName, Schema schema, RecordReader data)
Get the set of unique values from the specified column. For sequence data, use getUniqueSequence(String, Schema, SequenceRecordReader)
Parameters:
columnName - Name of the column to get unique values from
schema - Data schema
data - Data to get unique values from
Returns:
Set of unique values
-
getUnique
public static Map<String,Set<Writable>> getUnique(List<String> columnNames, Schema schema, RecordReader data)
Get the set of unique values from each of the specified columns. For sequence data, use getUniqueSequence(List, Schema, SequenceRecordReader)
Parameters:
columnNames - Names of the columns to get unique values from
schema - Data schema
data - Data to get unique values from
Returns:
Map from column name to the set of unique values in that column
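The shape of the result of getUnique(List<String> columnNames, Schema schema, RecordReader data) can be pictured with a plain-Java sketch that collects distinct values per named column. It uses strings in place of Writable and a list of column names in place of Schema; UniqueValuesSketch and uniqueValues are hypothetical names, and this is not DataVec's implementation.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only; not part of DataVec.
public class UniqueValuesSketch {

    /**
     * Collects the distinct values of each requested column.
     * schemaColumns gives the column order of every row (a stand-in for Schema).
     */
    public static Map<String, Set<String>> uniqueValues(List<String> columnNames,
                                                        List<String> schemaColumns,
                                                        List<List<String>> rows) {
        Map<String, Set<String>> result = new LinkedHashMap<>();
        for (String name : columnNames) {
            int idx = schemaColumns.indexOf(name);
            if (idx < 0) {
                throw new IllegalArgumentException("Unknown column: " + name);
            }
            Set<String> unique = new LinkedHashSet<>(); // keeps first-seen order
            for (List<String> row : rows) {
                unique.add(row.get(idx));
            }
            result.put(name, unique);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> schemaColumns = Arrays.asList("letter", "digit");
        List<List<String>> rows = Arrays.asList(
                Arrays.asList("a", "1"),
                Arrays.asList("b", "1"),
                Arrays.asList("a", "2"));
        // prints {letter=[a, b], digit=[1, 2]}
        System.out.println(uniqueValues(schemaColumns, schemaColumns, rows));
    }
}
```

The single-column overload corresponds to looking up one entry of this map.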
-
getUniqueSequence
public static Set<Writable> getUniqueSequence(String columnName, Schema schema, SequenceRecordReader sequenceData)
Get the set of unique values from the specified column of a sequence
Parameters:
columnName - Name of the column to get unique values from
schema - Data schema
sequenceData - Sequence data to get unique values from
Returns:
Set of unique values
-
getUniqueSequence
public static Map<String,Set<Writable>> getUniqueSequence(List<String> columnNames, Schema schema, SequenceRecordReader sequenceData)
Get the set of unique values from each of the specified columns of a sequence
Parameters:
columnNames - Names of the columns to get unique values from
schema - Data schema
sequenceData - Sequence data to get unique values from
Returns:
Map from column name to the set of unique values in that column
-
-