public class RegexSequenceRecordReader extends FileRecordReader implements SequenceRecordReader
Reads a file as a sequence, one line at a time, using Pattern and Matcher to do the splitting into groups.
Example: Data in format "2016-01-01 23:59:59.001 1 DEBUG First entry message!"
Error handling is configured via RegexSequenceRecordReader.LineErrorHandling. Invalid lines that don't match the provided regex can result in an exception (FailOnInvalid), can be skipped silently (SkipInvalid), or can be skipped with a logged warning (SkipInvalidWithWarning).

| Modifier and Type | Class and Description |
|---|---|
| static class | RegexSequenceRecordReader.LineErrorHandling: Error handling mode, i.e. how invalid lines (those that don't match the provided regex) are handled. FailOnInvalid: throw an IllegalStateException when an invalid line is found. SkipInvalid: skip invalid lines quietly, with no warning. SkipInvalidWithWarning: skip invalid lines, but log a warning. |

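
The group splitting and the three error handling modes described above can be sketched in plain Java. This is a minimal illustration built only on java.util.regex, not the DataVec implementation: the class RegexLineSplitSketch, its local Mode enum, and the regex in SAMPLE_REGEX are assumptions made for this example. Only the sample line and the mode names come from the documentation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch; this is not the DataVec implementation.
final class RegexLineSplitSketch {

    // Mirrors the three documented mode names; this enum is local to the sketch.
    enum Mode { FailOnInvalid, SkipInvalid, SkipInvalidWithWarning }

    // Assumed regex for the sample log format: timestamp, number, level, message.
    static final String SAMPLE_REGEX =
            "(\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2}\\.\\d{3}) (\\d+) ([A-Z]+) (.*)";

    // Splits each line into its capture groups; one inner list per matching line.
    static List<List<String>> split(List<String> lines, String regex, Mode mode) {
        Pattern pattern = Pattern.compile(regex);
        List<List<String>> sequence = new ArrayList<>();
        for (String line : lines) {
            Matcher m = pattern.matcher(line);
            if (m.matches()) {
                List<String> fields = new ArrayList<>();
                for (int g = 1; g <= m.groupCount(); g++) {
                    fields.add(m.group(g)); // group 0 is the whole match, so start at 1
                }
                sequence.add(fields);
            } else if (mode == Mode.FailOnInvalid) {
                throw new IllegalStateException("Invalid line: " + line);
            } else if (mode == Mode.SkipInvalidWithWarning) {
                System.err.println("Skipping invalid line: " + line);
            }
            // Mode.SkipInvalid: the line is dropped without any message.
        }
        return sequence;
    }
}
```

With the sample line from the class description and the assumed regex, the line is split into four fields: the timestamp, "1", "DEBUG", and the message text. A line that does not match is either dropped or causes an IllegalStateException, depending on the mode.
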
| Modifier and Type | Field and Description |
|---|---|
| static Charset | DEFAULT_CHARSET |
| static RegexSequenceRecordReader.LineErrorHandling | DEFAULT_ERROR_HANDLING |
| static org.slf4j.Logger | LOG |
| static String | SKIP_NUM_LINES |

Fields inherited from class FileRecordReader: appendLabel, conf, currentUri, labels, locationsIterator
Fields inherited from class BaseRecordReader: inputSplit, listeners, streamCreatorFn
Fields inherited from interface RecordReader: APPEND_LABEL, LABELS, NAME_SPACE

| Constructor and Description |
|---|
| RegexSequenceRecordReader(String regex, int skipNumLines) |
| RegexSequenceRecordReader(String regex, int skipNumLines, Charset encoding, RegexSequenceRecordReader.LineErrorHandling errorHandling) |

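
The skipNumLines and encoding constructor parameters can be illustrated with a small standalone sketch. It assumes, based only on the parameter names, that skipNumLines is the number of leading lines dropped before any regex matching and that encoding is the Charset used to decode the raw bytes. The class SkipLinesSketch and its readLines method are hypothetical and are not part of DataVec.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the assumed meaning of skipNumLines and encoding.
final class SkipLinesSketch {

    // Decodes the bytes with the given charset and drops the first skipNumLines lines.
    static List<String> readLines(byte[] data, int skipNumLines, Charset encoding) {
        List<String> kept = new ArrayList<>();
        String text = new String(data, encoding);
        try (BufferedReader reader = new BufferedReader(new StringReader(text))) {
            String line;
            int index = 0;
            while ((line = reader.readLine()) != null) {
                if (index >= skipNumLines) {
                    kept.add(line); // only lines after the skipped prefix are kept
                }
                index++;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return kept;
    }
}
```

Under these assumptions, a file with one header line would be read with skipNumLines set to 1, so that the header is never tested against the regex.
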
| Modifier and Type | Method and Description |
|---|---|
| void | initialize(Configuration conf, InputSplit split): Called once at initialization. |
| List<SequenceRecord> | loadSequenceFromMetaData(List<RecordMetaData> recordMetaDatas): Load multiple sequence records from the given list of RecordMetaData instances. |
| SequenceRecord | loadSequenceFromMetaData(RecordMetaData recordMetaData): Load a single sequence record from the given RecordMetaData instance. Note: for data that isn't splittable (i.e., text data that needs to be scanned/split), it is more efficient to load multiple records at once using SequenceRecordReader.loadSequenceFromMetaData(List). |
| SequenceRecord | nextSequence(): Similar to SequenceRecordReader.sequenceRecord(), but returns a SequenceRecord object that may include metadata such as the source of the data. |
| void | reset(): Reset record reader iterator. |
| List<List<Writable>> | sequenceRecord(): Returns a sequence record. |
| List<List<Writable>> | sequenceRecord(URI uri, DataInputStream dataInputStream): Load a sequence record from the given DataInputStream. Unlike RecordReader.next(), the internal state of the RecordReader is not modified. Implementations of this method should not close the DataInputStream. |
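
The requirement that sequenceRecord(URI, DataInputStream) should not close the stream it is given can be sketched as follows. This is an illustration of the contract only, assuming line-oriented text input. The class NonClosingReadSketch and its readAllLines method are hypothetical and do not reflect how DataVec reads the stream internally.

```java
import java.io.BufferedReader;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: read every line from a caller-owned stream without closing it.
final class NonClosingReadSketch {

    static List<String> readAllLines(DataInputStream in, Charset encoding) {
        List<String> lines = new ArrayList<>();
        // Deliberately no try-with-resources: closing the reader would also
        // close 'in', and the stream belongs to the caller.
        BufferedReader reader = new BufferedReader(new InputStreamReader(in, encoding));
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }
}
```

The caller that opened the stream remains responsible for closing it once the sequence has been loaded.
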

Methods inherited from class FileRecordReader: close, doInitialize, getConf, getCurrentLabel, getLabel, getLabels, hasNext, initialize, loadFromMetaData, loadFromMetaData, next, next, nextRecord, record, resetSupported, setConf, setLabels
Methods inherited from class BaseRecordReader: batchesSupported, getListeners, invokeListeners, setListeners, setListeners
Methods inherited from class Object: clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface RecordReader: batchesSupported, getLabels, getListeners, hasNext, initialize, loadFromMetaData, loadFromMetaData, next, next, nextRecord, record, resetSupported, setListeners, setListeners
Methods inherited from interface Configurable: getConf, setConf

public static final String SKIP_NUM_LINES
public static final Charset DEFAULT_CHARSET
public static final RegexSequenceRecordReader.LineErrorHandling DEFAULT_ERROR_HANDLING
public static final org.slf4j.Logger LOG
public RegexSequenceRecordReader(String regex, int skipNumLines)
public RegexSequenceRecordReader(String regex, int skipNumLines, Charset encoding, RegexSequenceRecordReader.LineErrorHandling errorHandling)
public void initialize(Configuration conf, InputSplit split) throws IOException, InterruptedException
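Description copied from interface: RecordReader
Called once at initialization.
Specified by: initialize in interface RecordReader
Overrides: initialize in class FileRecordReader
Parameters: conf - a configuration for initialization; split - the split that defines the range of records to read
Throws: IOException, InterruptedException

public List<List<Writable>> sequenceRecord()
Description copied from interface: SequenceRecordReader
Returns a sequence record.
Specified by: sequenceRecord in interface SequenceRecordReader

public List<List<Writable>> sequenceRecord(URI uri, DataInputStream dataInputStream) throws IOException
Description copied from interface: SequenceRecordReader
Load a sequence record from the given DataInputStream. Unlike RecordReader.next(), the internal state of the RecordReader is not modified. Implementations of this method should not close the DataInputStream.
Specified by: sequenceRecord in interface SequenceRecordReader
Throws: IOException - if an error occurs while reading from the input stream

public void reset()
Description copied from interface: RecordReader
Reset record reader iterator.
Specified by: reset in interface RecordReader
Overrides: reset in class FileRecordReader

public SequenceRecord nextSequence()
Description copied from interface: SequenceRecordReader
Similar to SequenceRecordReader.sequenceRecord(), but returns a SequenceRecord object that may include metadata such as the source of the data.
Specified by: nextSequence in interface SequenceRecordReader

public SequenceRecord loadSequenceFromMetaData(RecordMetaData recordMetaData) throws IOException
Description copied from interface: SequenceRecordReader
Load a single sequence record from the given RecordMetaData instance. Note: for data that isn't splittable (i.e., text data that needs to be scanned/split), it is more efficient to load multiple records at once using SequenceRecordReader.loadSequenceFromMetaData(List).
Specified by: loadSequenceFromMetaData in interface SequenceRecordReader
Parameters: recordMetaData - Metadata for the sequence record that we want to load from
Throws: IOException - If I/O error occurs during loading

public List<SequenceRecord> loadSequenceFromMetaData(List<RecordMetaData> recordMetaDatas) throws IOException
Description copied from interface: SequenceRecordReader
Load multiple sequence records from the given list of RecordMetaData instances.
Specified by: loadSequenceFromMetaData in interface SequenceRecordReader
Parameters: recordMetaDatas - Metadata for the records that we want to load from
Throws: IOException - If I/O error occurs during loading

Copyright © 2019. All rights reserved.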