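public interface HoodieData<T> extends Serializable

Type Parameters: T - type of object

An abstraction over a container holding a collection of objects of type T, allowing to perform common transformations on it.

This abstraction provides a common API implemented by in-memory implementations (HoodieListData, HoodieListPairData), where all objects are held in memory by the executing process, and by RDD-based implementations (HoodieJavaRDD, etc).

Intermediate operations (map, filter, etc) return a new collection, while terminal operations (collectAsList, count) return a result to the caller.

| Modifier and Type | Interface and Description |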
|---|---|
| `static class` | `HoodieData.HoodieDataCacheKey` The key used in a caching map to identify a HoodieData. |

| Modifier and Type | Method and Description |
|---|---|
| `List<T>` | `collectAsList()` Collects results of the underlying collection into a List. This is a terminal operation. |
| `long` | `count()` Returns number of objects held in the collection. |
| `HoodieData<T>` | `distinct()` Returns new HoodieData collection holding only distinct objects of the original one. This is a stateful intermediate operation. |
| `HoodieData<T>` | `distinct(int parallelism)` Returns new HoodieData collection holding only distinct objects of the original one. This is a stateful intermediate operation. |
| `default <O> HoodieData<T>` | `distinctWithKey(SerializableFunction<T,O> keyGetter, int parallelism)` |
| `HoodieData<T>` | `filter(SerializableFunction<T,Boolean> filterFunc)` Returns new instance of HoodieData collection only containing elements matching provided filterFunc (i.e., ones it returns true on). |
| `<O> HoodieData<O>` | `flatMap(SerializableFunction<T,Iterator<O>> func)` Maps every element in the collection into a collection of the new elements using provided mapping func, subsequently flattening the result (by concatenating) into a single collection. This is an intermediate operation. |
| `<K,V> HoodiePairData<K,V>` | `flatMapToPair(SerializableFunction<T,Iterator<? extends Pair<K,V>>> func)` Maps every element in the collection into a collection of Pairs of new elements using provided mapping func, subsequently flattening the result (by concatenating) into a single collection. NOTE: This operation converts the container from HoodieData to HoodiePairData. This is an intermediate operation. |
| `int` | `getId()` Gets the HoodieData's unique non-negative identifier. |
| `int` | `getNumPartitions()` |
| `boolean` | `isEmpty()` Returns whether the collection is empty. |
| `<O> HoodieData<O>` | `map(SerializableFunction<T,O> func)` Maps every element in the collection using provided mapping func. |
| `<O> HoodieData<O>` | `mapPartitions(SerializableFunction<Iterator<T>,Iterator<O>> func, boolean preservesPartitioning)` Maps every element in the collection's partition (if applicable) by applying provided mapping func to every collection's partition. This is an intermediate operation. |
| `<K,V> HoodiePairData<K,V>` | `mapToPair(SerializablePairFunction<T,K,V> func)` Maps every element in the collection using provided mapping func into a Pair of elements K and V. |
| `void` | `persist(String level)` Persists the data with the provided level (if applicable). |
| `void` | `persist(String level, HoodieEngineContext engineContext, HoodieData.HoodieDataCacheKey cacheKey)` Persists the data with the provided level (if applicable), and caches the data's ids within the engineContext. |
| `HoodieData<T>` | `repartition(int parallelism)` Re-partitions underlying collection (if applicable) making sure new HoodieData has exactly parallelism partitions. |
| `HoodieData<T>` | `union(HoodieData<T> other)` Unions HoodieData with another instance of HoodieData. |
| `void` | `unpersist()` Un-persists the data (if previously persisted). |

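
The terms used in the table above (intermediate, stateful intermediate, terminal) are the same vocabulary the JDK uses for java.util.stream, so a stream pipeline can serve as a rough mental model. The sketch below uses only JDK streams, not Hudi classes. StreamAnalogy and its two methods are hypothetical names, and the analogy says nothing about how a given HoodieData implementation schedules or executes the work.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical illustration built on java.util.stream; not part of Hudi.
final class StreamAnalogy {

  // Chains intermediate steps and ends with a terminal collect,
  // loosely analogous to map -> filter -> distinct -> collectAsList.
  static List<Integer> distinctEvenSquares(List<Integer> input) {
    return input.stream()
        .map(x -> x * x)               // intermediate: transform each element
        .filter(x -> x % 2 == 0)       // intermediate: keep elements the predicate accepts
        .distinct()                    // stateful intermediate: drop duplicates
        .collect(Collectors.toList()); // terminal: materialize into a List
  }

  // flatMap expands every element into several and concatenates the results;
  // count is a terminal operation.
  static long countWords(List<String> lines) {
    return lines.stream()
        .flatMap(line -> Arrays.stream(line.split(" ")))
        .count();
  }

  public static void main(String[] args) {
    System.out.println(distinctEvenSquares(Arrays.asList(1, 2, -2, 3, 4))); // prints [4, 16]
    System.out.println(countWords(Arrays.asList("a b", "c")));              // prints 3
  }
}
```

Here map, filter, distinct, and flatMap stand in for the like-named HoodieData methods, collect for collectAsList, and the stream count for count.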
`int getId()`

Gets the HoodieData's unique non-negative identifier. -1 indicates an invalid id.

`void persist(String level)`

Persists the data with the provided level (if applicable). Use this method only when you call unpersist() at some later point for the same HoodieData. Otherwise, use persist(String, HoodieEngineContext, HoodieDataCacheKey) instead for auto-unpersist at the end of a client write operation.

`void persist(String level, HoodieEngineContext engineContext, HoodieData.HoodieDataCacheKey cacheKey)`

Persists the data with the provided level (if applicable), and caches the data's ids within the engineContext.

`void unpersist()`

Un-persists the data (if previously persisted).

`boolean isEmpty()`

Returns whether the collection is empty.

`long count()`

Returns number of objects held in the collection.

NOTE: This is a terminal operation
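
The guidance on persist(String) above pairs every call with a later unpersist(). One way to make that pairing hard to miss is a try/finally block. The sketch below is self-contained: FakeData is a hypothetical stand-in that only mimics the four method names involved, not a Hudi class, and the level string is an assumed placeholder rather than a documented value.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of the persist/unpersist pairing; not part of Hudi.
final class PersistSketch {

  // Minimal stand-in that only tracks whether it is currently persisted.
  static final class FakeData {
    private final List<String> items;
    private boolean persisted = false;

    FakeData(List<String> items) {
      this.items = items;
    }

    void persist(String level) { persisted = true; }
    void unpersist() { persisted = false; }
    boolean isPersisted() { return persisted; }
    boolean isEmpty() { return items.isEmpty(); }
    long count() { return items.size(); }
  }

  // Persists once, runs more than one operation against the data,
  // and releases the cache in finally so it happens even on failure.
  static long countIfNonEmpty(FakeData data) {
    data.persist("MEMORY_AND_DISK"); // assumed level string, for illustration only
    try {
      return data.isEmpty() ? 0L : data.count();
    } finally {
      data.unpersist();
    }
  }

  public static void main(String[] args) {
    FakeData data = new FakeData(Arrays.asList("a", "b", "c"));
    System.out.println(countIfNonEmpty(data)); // prints 3
    System.out.println(data.isPersisted());    // prints false
  }
}
```

When the data should instead be released automatically at the end of a client write operation, the three-argument persist overload described above is the documented alternative.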
`int getNumPartitions()`

`<O> HoodieData<O> map(SerializableFunction<T,O> func)`

Maps every element in the collection using provided mapping func.

This is an intermediate operation

Type Parameters: O - output object type

Parameters: func - serializable map function

Returns: HoodieData holding mapped elements

`<O> HoodieData<O> mapPartitions(SerializableFunction<Iterator<T>,Iterator<O>> func, boolean preservesPartitioning)`

Maps every element in the collection's partition (if applicable) by applying provided mapping func to every collection's partition.

This is an intermediate operation

Type Parameters: O - output object type

Parameters: func - serializable map function accepting Iterator of a single partition's elements and returning a new Iterator mapping every element of the partition into a new one; preservesPartitioning - whether to preserve partitioning in the resulting collection

Returns: HoodieData holding mapped elements

`<O> HoodieData<O> flatMap(SerializableFunction<T,Iterator<O>> func)`

Maps every element in the collection into a collection of the new elements using provided mapping func, subsequently flattening the result (by concatenating) into a single collection.

This is an intermediate operation

Type Parameters: O - output object type

Parameters: func - serializable function mapping every element T into Iterator<O>

Returns: HoodieData holding mapped elements

`<K,V> HoodiePairData<K,V> flatMapToPair(SerializableFunction<T,Iterator<? extends Pair<K,V>>> func)`

Maps every element in the collection into a collection of Pairs of new elements using provided mapping func, subsequently flattening the result (by concatenating) into a single collection.

NOTE: This operation converts the container from HoodieData to HoodiePairData.

This is an intermediate operation

`<K,V> HoodiePairData<K,V> mapToPair(SerializablePairFunction<T,K,V> func)`

Maps every element in the collection using provided mapping func into a Pair of elements K and V.

This is an intermediate operation

Type Parameters: K - key type of the pair; V - value type of the pair

Parameters: func - serializable map function

Returns: HoodiePairData holding mapped elements

`HoodieData<T> distinct()`

Returns new HoodieData collection holding only distinct objects of the original one.

This is a stateful intermediate operation

`HoodieData<T> distinct(int parallelism)`

Returns new HoodieData collection holding only distinct objects of the original one.

This is a stateful intermediate operation

`HoodieData<T> filter(SerializableFunction<T,Boolean> filterFunc)`

Returns new instance of HoodieData collection only containing elements matching provided filterFunc (i.e., ones it returns true on).

Parameters: filterFunc - filtering func either accepting or rejecting the elements

Returns: HoodieData holding filtered elements

`HoodieData<T> union(HoodieData<T> other)`

Unions HoodieData with another instance of HoodieData. Note that it can only union instances backed by the same underlying collection implementation.

This is a stateful intermediate operation

Parameters: other - HoodieData collection

Returns: HoodieData holding superset of elements of this and other collections

`List<T> collectAsList()`

Collects results of the underlying collection into a List.

This is a terminal operation

`HoodieData<T> repartition(int parallelism)`

Re-partitions underlying collection (if applicable) making sure new HoodieData has exactly parallelism partitions.

Parameters: parallelism - target number of partitions in the underlying collection

Returns: HoodieData holding re-partitioned collection

`default <O> HoodieData<T> distinctWithKey(SerializableFunction<T,O> keyGetter, int parallelism)`
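
The per-partition contract of mapPartitions described above (func receives an Iterator over one partition's elements and returns an Iterator of outputs) can be illustrated without any engine. In the sketch below, a partitioned collection is modeled as a List of Lists. MapPartitionsSketch, its mapPartitions helper, and runningTotals are hypothetical names, and the preservesPartitioning flag is left out because it has no meaning for plain lists.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Function;

// Hypothetical illustration of a per-partition mapping function; not part of Hudi.
final class MapPartitionsSketch {

  // Applies func once per partition: it receives an Iterator over that
  // partition's elements and returns an Iterator over the partition's outputs.
  static <T, O> List<List<O>> mapPartitions(
      List<List<T>> partitions, Function<Iterator<T>, Iterator<O>> func) {
    List<List<O>> result = new ArrayList<>();
    for (List<T> partition : partitions) {
      List<O> out = new ArrayList<>();
      Iterator<O> mapped = func.apply(partition.iterator());
      while (mapped.hasNext()) {
        out.add(mapped.next());
      }
      result.add(out);
    }
    return result;
  }

  // Example function: emits one running total per input element.
  // The sum restarts for every partition because func is invoked per partition.
  static Iterator<Integer> runningTotals(Iterator<Integer> in) {
    List<Integer> out = new ArrayList<>();
    int sum = 0;
    while (in.hasNext()) {
      sum += in.next();
      out.add(sum);
    }
    return out.iterator();
  }

  public static void main(String[] args) {
    List<List<Integer>> partitions =
        Arrays.asList(Arrays.asList(1, 2, 3), Arrays.asList(10, 20));
    // prints [[1, 3, 6], [10, 30]]
    System.out.println(mapPartitions(partitions, MapPartitionsSketch::runningTotals));
  }
}
```

Unlike a function passed to map, runningTotals can carry state across the elements of one partition, which is the main reason to reach for a per-partition function.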
Copyright © 2024 The Apache Software Foundation. All rights reserved.