public class AvroFlattener
extends Object
This class provides methods to flatten an Avro Schema so that it works better with ORC
(Hive does not support predicate pushdown for ORC with nested fields).
The behavior of Avro Schema un-nesting is listed below:
1. A Record within a Record (and so on, recursively) is flattened into the parent Record
Record R1 {
fields: {[
{
Record R2 {
fields: {[
{
Record R3 {
fields: {[
{
String S2
}
]}
}
}, {
String S3
}
]}
}
}, {
String S1
}
]}
}
will be flattened to:
Record R1 {
fields: {[
{
String S1
}, {
String S2
}, {
String S3
}
]}
}
2. All fields un-nested from a Record within an Option (i.e. a Union of the type [null, Record] or [Record, null])
within a Record are moved to the parent Record as a list of Option fields
Record R1 {
fields : {[
{
Union : [
null,
Record R2 {
fields : {[
{
String S1
}, {
String S2
}
]}
}
]
}
]}
}
will be flattened to:
Record R1 {
fields : {[
{
Union : [ null, String S1]
}, {
Union : [ null, String S2]
}
]}
}
3. An Array or Map is not un-nested; however, Records within it are un-nested as described above
4. All un-nested fields are decorated with a new property "flatten_source", which is a dot-separated string
concatenation of the parent field names. Similarly, un-nested fields are renamed to a double-underscore string
concatenation of the parent field names
5. Primitive Types are not un-nested
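
The rules above can be illustrated with a small, self-contained sketch. It is a toy model only: it does not use the Avro API and is not the AvroFlattener implementation. The names FlattenSketch, Node, FlatField and flatten are hypothetical. It also assumes that the un-nested field's own name is appended to the parent field names in both the renamed field and "flatten_source", and that fields are emitted depth-first in declaration order; the description above does not state either point. Arrays and Maps (rule 3) are not modelled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy model of the un-nesting rules. Hypothetical names; does not use the Avro API. */
public class FlattenSketch {

    /** A named field that is either a primitive or a (possibly optional) record. */
    public static final class Node {
        public final String name;
        public final String type;        // "record" for records, otherwise a primitive type name
        public final boolean optional;   // true models a [null, X] union
        public final List<Node> fields;  // child fields; empty for primitives

        private Node(String name, String type, boolean optional, Node... fields) {
            this.name = name;
            this.type = type;
            this.optional = optional;
            this.fields = Arrays.asList(fields);
        }

        public static Node primitive(String name, String type) {
            return new Node(name, type, false);
        }

        public static Node record(String name, Node... fields) {
            return new Node(name, "record", false, fields);
        }

        public static Node optionalRecord(String name, Node... fields) {
            return new Node(name, "record", true, fields);
        }

        boolean isRecord() {
            return "record".equals(type);
        }
    }

    /** One field of the flattened record. flattenSource is null for fields that were never nested. */
    public static final class FlatField {
        public final String name;
        public final String type;
        public final boolean optional;
        public final String flattenSource;

        FlatField(String name, String type, boolean optional, String flattenSource) {
            this.name = name;
            this.type = type;
            this.optional = optional;
            this.flattenSource = flattenSource;
        }

        @Override
        public String toString() {
            String shownType = optional ? "[null, " + type + "]" : type;
            String source = flattenSource == null ? "" : " (flatten_source=" + flattenSource + ")";
            return name + ": " + shownType + source;
        }
    }

    /** Flattens the fields of a top-level record into a single list of primitive fields. */
    public static List<FlatField> flatten(Node root) {
        List<FlatField> out = new ArrayList<>();
        for (Node field : root.fields) {
            collect(field, new ArrayList<>(), false, out);
        }
        return out;
    }

    private static void collect(Node node, List<String> parents, boolean optional, List<FlatField> out) {
        boolean nowOptional = optional || node.optional;  // rule 2: optionality is inherited
        List<String> path = new ArrayList<>(parents);
        path.add(node.name);
        if (node.isRecord()) {                            // rule 1: recurse into nested records
            for (Node child : node.fields) {
                collect(child, path, nowOptional, out);
            }
        } else if (parents.isEmpty()) {                   // rule 5: top-level primitives are unchanged
            out.add(new FlatField(node.name, node.type, nowOptional, null));
        } else {                                          // rule 4: rename and record the source path
            out.add(new FlatField(String.join("__", path), node.type, nowOptional, String.join(".", path)));
        }
    }

    public static void main(String[] args) {
        Node r1 = Node.record("R1",
            Node.record("r2",
                Node.record("r3", Node.primitive("s2", "string")),
                Node.primitive("s3", "string")),
            Node.primitive("s1", "string"));
        flatten(r1).forEach(System.out::println);
    }
}
```

With the record built in main, the sketch yields the field names r2__r3__s2, r2__s3 and s1. Replacing Node.record("r2", ...) with Node.optionalRecord("r2", ...) marks every field un-nested from r2 as optional, which mirrors the [null, String] unions in the second example.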