public class WriterUtils extends Object
Utility class for use with the DataWriter class.

| Modifier and Type | Class and Description |
|---|---|
| static class | WriterUtils.WriterFilePathType |

| Modifier and Type | Field and Description |
|---|---|
| static com.typesafe.config.Config | NO_RETRY_CONFIG |
| static String | WRITER_ENCRYPTED_CONFIG_PATH |

| Constructor and Description |
|---|
| WriterUtils() |

| Modifier and Type | Method and Description |
|---|---|
| static org.apache.avro.file.CodecFactory | getCodecFactory(com.google.common.base.Optional<String> codecName, com.google.common.base.Optional<String> deflateLevel) Creates a CodecFactory based on the specified codec name and deflate level. |
| static org.apache.hadoop.fs.Path | getDataPublisherFinalDir(State state, int numBranches, int branchId) Get the Path corresponding to the directory where a given BaseDataPublisher should commit its output data. |
| static org.apache.hadoop.fs.Path | getDefaultWriterFilePath(State state, int numBranches, int branchId) Creates the default Path for the ConfigurationKeys.WRITER_FILE_PATH key. |
| static org.apache.hadoop.conf.Configuration | getFsConfiguration(State state) |
| static org.apache.hadoop.fs.Path | getNamespaceTableWriterFilePath(State state) Creates the Path for case WriterUtils.WriterFilePathType.NAMESPACE_TABLE with configurations ConfigurationKeys.EXTRACT_NAMESPACE_NAME_KEY and ConfigurationKeys.EXTRACT_TABLE_NAME_KEY. |
| static org.apache.hadoop.fs.Path | getTableNameWriterFilePath(State state) Creates the Path for the ConfigurationKeys.WRITER_FILE_PATH key according to ConfigurationKeys.EXTRACT_TABLE_NAME_KEY. |
| static String | getWriterFileName(State state, int numBranches, int branchId, String writerId, String formatExtension) Get the value of ConfigurationKeys.WRITER_FILE_NAME for a given DataWriter. |
| static org.apache.hadoop.fs.Path | getWriterFilePath(State state, int numBranches, int branchId) Get the Path corresponding to the relative file path for a given DataWriter. |
| static org.apache.hadoop.fs.FileSystem | getWriterFs(State state) |
| static org.apache.hadoop.fs.FileSystem | getWriterFS(State state, int numBranches, int branchId) |
| static URI | getWriterFsUri(State state, int numBranches, int branchId) |
| static org.apache.hadoop.fs.Path | getWriterOutputDir(State state, int numBranches, int branchId) Get the Path corresponding to the directory where a given DataWriter should write its output data. |
| static org.apache.hadoop.fs.Path | getWriterStagingDir(State state, int numBranches, int branchId) Get the Path corresponding to the directory where a given DataWriter should write its staging data. |
| static org.apache.hadoop.fs.Path | getWriterStagingDir(State state, int numBranches, int branchId, String attemptId) Get the staging Path for a DataWriter that has attemptId in the path. |
| static void | mkdirsWithRecursivePermission(org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.fs.Path path, org.apache.hadoop.fs.permission.FsPermission perm) Create the given dir as well as all missing ancestor dirs. |
| static void | mkdirsWithRecursivePermissionWithRetry(org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.fs.Path path, org.apache.hadoop.fs.permission.FsPermission perm, com.typesafe.config.Config retrierConfig) |

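The codec selection documented for getCodecFactory can be pictured with a small stand-alone model. This is only a sketch under stated assumptions, not the WriterUtils implementation: it uses java.util.Optional instead of Guava's Optional, returns a descriptive String instead of an Avro CodecFactory, and the class name, method name, and fallback level of 6 are hypothetical. It also assumes that a supplied deflate level is applied when the codec name is absent, which this page does not state.

```java
import java.util.Optional;

final class CodecSelectionSketch {

    // Hypothetical fallback level; the default actually used by WriterUtils is not given on this page.
    static final int ASSUMED_DEFAULT_DEFLATE_LEVEL = 6;

    /**
     * Describes which CodecFactory call the documented rules would lead to.
     * An absent codec name falls back to deflate; any other name is passed through by name.
     */
    static String describeCodec(Optional<String> codecName, Optional<String> deflateLevel) {
        if (!codecName.isPresent() || codecName.get().equalsIgnoreCase("deflate")) {
            // The deflate level is only parsed and validated on the deflate branch.
            int level = deflateLevel.map(Integer::parseInt).orElse(ASSUMED_DEFAULT_DEFLATE_LEVEL);
            if (level < 0 || level > 9) {
                throw new IllegalArgumentException("deflate level must be in [0-9]: " + level);
            }
            return "deflateCodec(" + level + ")";
        }
        // Every other codec name would be resolved by name, as CodecFactory.fromString(String) does.
        return "fromString(" + codecName.get() + ")";
    }
}
```

In this model a deflate level supplied together with a non-deflate codec name such as snappy is simply ignored, matching the note that deflateLevel is only applicable to "deflate".
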
public static final String WRITER_ENCRYPTED_CONFIG_PATH
public static final com.typesafe.config.Config NO_RETRY_CONFIG
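The directory getters in this class (getWriterStagingDir, getWriterOutputDir, getDataPublisherFinalDir) are documented as combining a base directory key with ConfigurationKeys.WRITER_FILE_PATH. The following is a minimal sketch of that combination, assuming a java.util.Properties object in place of State and plain Strings in place of Hadoop Path objects. The key strings and the rule that a branch id suffix is appended only when there is more than one branch are assumptions made for illustration, and all class and method names are hypothetical.

```java
import java.util.Properties;

final class WriterDirSketch {

    // Assumed key strings standing in for the ConfigurationKeys constants.
    static final String STAGING_DIR_KEY = "writer.staging.dir";
    static final String OUTPUT_DIR_KEY = "writer.output.dir";
    static final String FILE_PATH_KEY = "writer.file.path";

    /** Assumed branch rule: suffix the key with the branch id only when several branches exist. */
    static String keyForBranch(String key, int numBranches, int branchId) {
        return numBranches > 1 ? key + "." + branchId : key;
    }

    /** Joins two path fragments with exactly one '/' between them. */
    static String join(String parent, String child) {
        if (child.isEmpty()) {
            return parent;
        }
        String p = parent.endsWith("/") ? parent.substring(0, parent.length() - 1) : parent;
        String c = child.startsWith("/") ? child.substring(1) : child;
        return p + "/" + c;
    }

    /** Resolves base directory + relative writer file path for one branch. */
    static String resolveDir(Properties state, String baseDirKey, int numBranches, int branchId) {
        String base = state.getProperty(keyForBranch(baseDirKey, numBranches, branchId));
        if (base == null) {
            throw new IllegalArgumentException("Missing required property: " + baseDirKey);
        }
        // An empty relative path is used here when the file path is not configured;
        // the real class builds a default value instead.
        String filePath = state.getProperty(keyForBranch(FILE_PATH_KEY, numBranches, branchId), "");
        return join(base, filePath);
    }
}
```

Passing STAGING_DIR_KEY or OUTPUT_DIR_KEY as baseDirKey selects which base directory is combined with the writer file path.
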
public static org.apache.hadoop.fs.Path getWriterStagingDir(State state, int numBranches, int branchId)
Get the Path corresponding to the directory where a given DataWriter should write its staging data. The staging data directory is determined by combining the ConfigurationKeys.WRITER_STAGING_DIR and the ConfigurationKeys.WRITER_FILE_PATH.

Parameters:
state - is the State corresponding to a specific DataWriter.
numBranches - is the total number of branches for the given State.
branchId - is the id for the specific branch that the DataWriter will write to.

Returns:
a Path specifying the directory where the DataWriter will write.

public static org.apache.hadoop.fs.Path getWriterStagingDir(State state, int numBranches, int branchId, String attemptId)
Get the staging Path for a DataWriter that has attemptId in the path.

public static org.apache.hadoop.fs.Path getWriterOutputDir(State state, int numBranches, int branchId)
Get the Path corresponding to the directory where a given DataWriter should write its output data. The output data directory is determined by combining the ConfigurationKeys.WRITER_OUTPUT_DIR and the ConfigurationKeys.WRITER_FILE_PATH.

Parameters:
state - is the State corresponding to a specific DataWriter.
numBranches - is the total number of branches for the given State.
branchId - is the id for the specific branch that the DataWriter will write to.

Returns:
a Path specifying the directory where the DataWriter will write.

public static org.apache.hadoop.fs.Path getDataPublisherFinalDir(State state, int numBranches, int branchId)
Get the Path corresponding to the directory where a given BaseDataPublisher should commit its output data. The final output data directory is determined by combining the ConfigurationKeys.DATA_PUBLISHER_FINAL_DIR and the ConfigurationKeys.WRITER_FILE_PATH.

Parameters:
state - is the State corresponding to a specific DataWriter.
numBranches - is the total number of branches for the given State.
branchId - is the id for the specific branch that the BaseDataPublisher will publish.

Returns:
a Path specifying the directory where the BaseDataPublisher will publish.

public static org.apache.hadoop.fs.Path getWriterFilePath(State state, int numBranches, int branchId)
Get the Path corresponding to the relative file path for a given DataWriter. This method retrieves the value of ConfigurationKeys.WRITER_FILE_PATH from the given State. It also constructs the default value of the ConfigurationKeys.WRITER_FILE_PATH if it is not specified in the given State.

Parameters:
state - is the State corresponding to a specific DataWriter.
numBranches - is the total number of branches for the given State.
branchId - is the id for the specific branch that the DataWriter will write to.

Returns:
a Path specifying the relative directory where the DataWriter will write.

public static org.apache.hadoop.fs.Path getNamespaceTableWriterFilePath(State state)
Creates the Path for case WriterUtils.WriterFilePathType.NAMESPACE_TABLE with configurations ConfigurationKeys.EXTRACT_NAMESPACE_NAME_KEY and ConfigurationKeys.EXTRACT_TABLE_NAME_KEY.

Parameters:
state -

public static org.apache.hadoop.fs.Path getTableNameWriterFilePath(State state)
Creates the Path for the ConfigurationKeys.WRITER_FILE_PATH key according to ConfigurationKeys.EXTRACT_TABLE_NAME_KEY.

Parameters:
state -

public static org.apache.hadoop.fs.Path getDefaultWriterFilePath(State state, int numBranches, int branchId)
Creates the default Path for the ConfigurationKeys.WRITER_FILE_PATH key.

Parameters:
numBranches - is the total number of branches for the given State.
branchId - is the id for the specific branch that the DataWriter will write to.

Returns:
a Path specifying the directory where the DataWriter will write.

public static String getWriterFileName(State state, int numBranches, int branchId, String writerId, String formatExtension)
Get the value of ConfigurationKeys.WRITER_FILE_NAME for a given DataWriter. The method also constructs the default value of the ConfigurationKeys.WRITER_FILE_NAME if it is not set in the State.

Parameters:
state - is the State corresponding to a specific DataWriter.
numBranches - is the total number of branches for the given State.
branchId - is the id for the specific branch that the DataWriter will write to.
writerId - is the id for a specific DataWriter.
formatExtension - is the format extension for the file (e.g. ".avro").

Returns:
a String representation of the file name.

public static org.apache.avro.file.CodecFactory getCodecFactory(com.google.common.base.Optional<String> codecName, com.google.common.base.Optional<String> deflateLevel)
Creates a CodecFactory based on the specified codec name and deflate level. If codecName is absent, then a CodecFactory.deflateCodec(int) is returned. Otherwise the codecName is converted into a CodecFactory via the CodecFactory.fromString(String) method.

Parameters:
codecName - the name of the codec to use (e.g. deflate, snappy, xz, etc.).
deflateLevel - must be an integer from [0-9], and is only applicable if the codecName is "deflate".

Returns:
a CodecFactory.

public static void mkdirsWithRecursivePermission(org.apache.hadoop.fs.FileSystem fs,
org.apache.hadoop.fs.Path path,
org.apache.hadoop.fs.permission.FsPermission perm)
throws IOException
Create the given dir as well as all missing ancestor dirs. Use this instead of FileSystem.mkdirs(Path, FsPermission), since that method only sets the permission for the given dir, and not recursively for the ancestor dirs.

Parameters:
fs - FileSystem
path - The dir to be created
perm - The permission to be set

Throws:
IOException - if failing to create dir or set permission.

public static void mkdirsWithRecursivePermissionWithRetry(org.apache.hadoop.fs.FileSystem fs,
org.apache.hadoop.fs.Path path,
org.apache.hadoop.fs.permission.FsPermission perm,
com.typesafe.config.Config retrierConfig)
throws IOException
Throws:
IOException

public static org.apache.hadoop.fs.FileSystem getWriterFS(State state, int numBranches, int branchId) throws IOException

Throws:
IOException

public static org.apache.hadoop.fs.FileSystem getWriterFs(State state) throws IOException

Throws:
IOException

public static org.apache.hadoop.conf.Configuration getFsConfiguration(State state)
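
The behaviour described for mkdirsWithRecursivePermission (create the directory and every missing ancestor, applying the permission to each one) can be illustrated on a local POSIX file system. The sketch below uses java.nio.file rather than Hadoop's FileSystem and FsPermission, so it is an analogy and not the Gobblin implementation. The class and method names are hypothetical, and it assumes the permission grants the owner write and execute access so that child directories can still be created.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.PosixFilePermission;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Set;

final class RecursiveMkdirsSketch {

    /** Creates dir and all missing ancestors, setting perm on every directory it creates. */
    static void mkdirsWithPermission(Path dir, Set<PosixFilePermission> perm) throws IOException {
        // Walk upwards and remember every directory that does not exist yet.
        // push() adds to the front, so the shallowest missing directory ends up first.
        Deque<Path> missing = new ArrayDeque<>();
        for (Path p = dir.toAbsolutePath(); p != null && !Files.exists(p); p = p.getParent()) {
            missing.push(p);
        }
        // Create from the shallowest to the deepest directory.
        for (Path p : missing) {
            Files.createDirectory(p);
            // Set the permission explicitly so the process umask does not narrow it.
            Files.setPosixFilePermissions(p, perm);
        }
    }
}
```

Directories that already exist are left untouched in this sketch; only newly created ones receive the permission.
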