Class UnicodeRegex
java.lang.Object
org.graalvm.shadowed.com.ibm.icu.impl.UnicodeRegex
- All Implemented Interfaces:
Cloneable, Freezable<UnicodeRegex>, StringTransform, Transform<String,String>
public class UnicodeRegex
extends Object
implements Cloneable, Freezable<UnicodeRegex>, StringTransform
Contains utilities to supplement the JDK Regex, since it doesn't handle
Unicode well.
TODO: Move to org.graalvm.shadowed.com.ibm.icu.dev.somewhere. 2015-sep-03: This is used there, and also in CLDR and in UnicodeTools.
-
Constructor Summary
Constructors -
Method Summary
Modifier and Type, Method, Description:
static List<String> appendLines(List<String> result, InputStream inputStream, String encoding): Utility for loading lines from a UTF8 file.
static List<String> appendLines(List<String> result, String file, String encoding): Utility for loading lines from a file.
cloneAsThawed: Provides for the clone operation.
static Pattern compile: Compile a regex string, after processing by fix(...).
static Pattern compile: Compile a regex string, after processing by fix(...).
compileBnf(String bnfLines): Compile a composed string from a set of BNF lines; see the List version for more information.
compileBnf(List<String> lines): Compile a composed string from a set of BNF lines, such as for composing a regex expression.
static String fix: Convenience static function, using standard parameters.
freeze(): Freezes the object.
getSymbolTable: Get the symbol table for internal processing.
boolean isFrozen(): Determines whether the object has been frozen or not.
void setBnfCommentString(String bnfCommentString)
void setBnfLineSeparator(String bnfLineSeparator)
void setBnfVariableInfix(String bnfVariableInfix)
setSymbolTable(SymbolTable symbolTable): Set the symbol table for internal processing.
transform: Adds full Unicode property support, with the latest version of Unicode, to Java Regex, bringing it up to Level 1 (see http://www.unicode.org/reports/tr18/).
-
Constructor Details
-
UnicodeRegex
public UnicodeRegex()
-
-
Method Details
-
getSymbolTable
Get the symbol table for internal processing
-
setSymbolTable
Set the symbol table for internal processing
-
transform
Adds full Unicode property support, with the latest version of Unicode, to Java Regex, bringing it up to Level 1 (see http://www.unicode.org/reports/tr18/). It does this by preprocessing the regex pattern string and interpreting the character classes (\p{...}, \P{...}, [...]) according to their syntax and meaning in UnicodeSet. With this utility, Java regex expressions can be updated to work with the latest version of Unicode, and with all Unicode properties. Note, however, that the UnicodeSet syntax has not yet been updated to be completely consistent with Java regex, so be careful of the differences.
Not thread-safe; create a separate copy for different threads.
In the future, we may extend this to support other regex packages.
- Specified by:
transform in interface StringTransform
- Specified by:
transform in interface Transform<String,String>
- Parameters:
regex - A modified Java regex pattern, as in the input to Pattern.compile(), except that all "character classes" are processed as if they were UnicodeSet patterns. Example: "abc[:bc=N:]". See UnicodeSet for the differences in syntax.
- Returns:
A processed Java regex pattern, suitable for input to Pattern.compile().
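The preprocessing idea described for transform can be illustrated with a toy rewrite pass that uses only the JDK. This is a minimal sketch under stated assumptions, not the UnicodeRegex implementation: the class name ClassRewriteSketch, the rewrite method, and the two-entry lookup table are hypothetical, and the sketch maps names to classes the JDK already knows rather than resolving them through UnicodeSet.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only; it does not use UnicodeSet.
final class ClassRewriteSketch {
    // Assumed lookup table: POSIX-style names mapped to classes the JDK understands.
    private static final Map<String, String> JDK_EQUIVALENTS = new HashMap<>();
    static {
        JDK_EQUIVALENTS.put("ascii", "\\p{ASCII}");
        JDK_EQUIVALENTS.put("alphabetic", "\\p{IsAlphabetic}");
    }

    // Matches tokens such as [:ascii:] anywhere in the pattern string.
    private static final Pattern POSIX_CLASS = Pattern.compile("\\[:([A-Za-z_]+):\\]");

    // Rewrites every [:name:] token before the string is handed to Pattern.compile().
    static String rewrite(String regex) {
        Matcher m = POSIX_CLASS.matcher(regex);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String name = m.group(1).toLowerCase(Locale.ROOT);
            String replacement = JDK_EQUIVALENTS.get(name);
            if (replacement == null) {
                throw new IllegalArgumentException("Unknown class name: " + name);
            }
            // quoteReplacement keeps the backslashes in the replacement literal.
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String processed = rewrite("[[:ascii:][:alphabetic:]]+");
        System.out.println(processed);
        System.out.println(Pattern.compile(processed).matcher("h\u00e9llo").matches());
    }
}
```

The real method interprets \p{...}, \P{...}, and [...] with UnicodeSet semantics, so its output for the same input may differ from this sketch.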
-
fix
-
compile
-
compile
-
compileBnf
-
compileBnf
Compile a composed string from a set of BNF lines, such as for composing a regex expression. The lines can be in any order, but there must not be any cycles. The result can be used as input for fix().
Example:
  uri = (?: (scheme) \\:)? (host) (?: \\? (query))? (?: \\u0023 (fragment))?;
  scheme = reserved+;
  host = // reserved+;
  query = [\\=reserved]+;
  fragment = reserved+;
  reserved = [[:ascii:][:alphabetic:]];
Caveats: at this point the parsing is simple; for example, # cannot be quoted (use \\u0023 instead). You can set the comment string to null to disable comment handling. The equality sign and a few others can be reset with the setBnfX() methods (setBnfCommentString, setBnfVariableInfix, setBnfLineSeparator).
- Parameters:
lines - Series of lines that represent a BNF expression. The lines contain a series of statements of the form x=y;. A statement can take multiple lines, but there can't be multiple statements on a line. A hash (#) starts a comment that runs to the end of the line.
- Returns:
Pattern
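The composition rules above (statements of the form x=y;, comments after #, any order, no cycles) can be sketched in plain Java. This is a hedged illustration, not the UnicodeRegex source: the class BnfSketch and its methods parse, expand, and compose are hypothetical names, the root variable is passed explicitly as an assumption, and variables are recognized as simple identifier tokens.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of BNF-style composition; not the library's implementation.
final class BnfSketch {
    private static final Pattern IDENTIFIER = Pattern.compile("[A-Za-z_][A-Za-z0-9_]*");

    // Collects "name = value;" statements. A statement may span several lines;
    // everything after '#' on a line is treated as a comment.
    static Map<String, String> parse(List<String> lines) {
        Map<String, String> vars = new LinkedHashMap<>();
        StringBuilder statement = new StringBuilder();
        for (String raw : lines) {
            int hash = raw.indexOf('#');
            String line = (hash >= 0 ? raw.substring(0, hash) : raw).trim();
            if (line.isEmpty()) continue;
            statement.append(line);
            if (!line.endsWith(";")) { statement.append(' '); continue; }
            String text = statement.substring(0, statement.length() - 1);
            statement.setLength(0);
            int eq = text.indexOf('=');
            if (eq < 0) throw new IllegalArgumentException("Expected x=y; but got: " + text);
            vars.put(text.substring(0, eq).trim(), text.substring(eq + 1).trim());
        }
        if (statement.length() > 0) {
            throw new IllegalArgumentException("Unterminated statement: " + statement);
        }
        return vars;
    }

    // Substitutes variables recursively; a variable seen twice on the current path is a cycle.
    static String expand(String name, Map<String, String> vars, Set<String> inProgress) {
        if (!vars.containsKey(name)) throw new IllegalArgumentException("Undefined variable: " + name);
        if (!inProgress.add(name)) throw new IllegalArgumentException("Cycle detected at: " + name);
        Matcher m = IDENTIFIER.matcher(vars.get(name));
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String token = m.group();
            String replacement = vars.containsKey(token) ? expand(token, vars, inProgress) : token;
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        inProgress.remove(name);
        return out.toString();
    }

    static String compose(List<String> lines, String root) {
        return expand(root, parse(lines), new HashSet<>());
    }

    public static void main(String[] args) {
        List<String> lines = java.util.Arrays.asList(
                "greeting = (word) , # first half of the statement",
                "  (word);",
                "word = [a-z]+;");
        System.out.println(compose(lines, "greeting")); // prints ([a-z]+) , ([a-z]+)
    }
}
```

Because the sketch matches whole identifier tokens, a variable named like a property (for example ascii) would also be replaced inside [:ascii:], which is in line with the caveat that the parsing is simple.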
-
getBnfCommentString
-
setBnfCommentString
-
getBnfVariableInfix
-
setBnfVariableInfix
-
getBnfLineSeparator
-
setBnfLineSeparator
-
appendLines
public static List<String> appendLines(List<String> result, String file, String encoding) throws IOException
Utility for loading lines from a file.
- Parameters:
result - The result of the appended lines.
file - The file to have an input stream.
encoding - if null, then UTF-8
- Returns:
filled list
- Throws:
IOException - If there were problems opening the file for input stream.
-
appendLines
public static List<String> appendLines(List<String> result, InputStream inputStream, String encoding) throws UnsupportedEncodingException, IOException
Utility for loading lines from a UTF8 file.
- Parameters:
result - The result of the appended lines.
inputStream - The input stream.
encoding - if null, then UTF-8
- Returns:
filled list
- Throws:
IOException - If there were problems opening the input stream for reading.
UnsupportedEncodingException
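The documented contract (append each line to result, treat a null encoding as UTF-8, return the filled list) can be sketched with JDK classes alone. This is an assumption-laden illustration rather than the library code: AppendLinesSketch and demo are hypothetical names, and the sketch leaves closing the stream to the caller.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in that mirrors the documented contract of appendLines.
final class AppendLinesSketch {
    static List<String> appendLines(List<String> result, InputStream inputStream, String encoding)
            throws IOException {
        // A null encoding falls back to UTF-8, as the parameter description states.
        String charsetName = (encoding == null) ? "UTF-8" : encoding;
        // An unknown charset name surfaces as UnsupportedEncodingException, a subclass of IOException.
        BufferedReader reader = new BufferedReader(new InputStreamReader(inputStream, charsetName));
        String line;
        while ((line = reader.readLine()) != null) {
            result.add(line);
        }
        return result; // The caller remains responsible for closing inputStream in this sketch.
    }

    // Small driver that reads two lines from an in-memory UTF-8 stream.
    static List<String> demo() {
        byte[] bytes = "x = y;\nna\u00efve\n".getBytes(StandardCharsets.UTF_8);
        try {
            return appendLines(new ArrayList<>(), new ByteArrayInputStream(bytes), null);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

A list filled this way has the shape that compileBnf(List<String> lines) expects as input.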
-
cloneAsThawed
Description copied from interface: Freezable
Provides for the clone operation. Any clone is initially unfrozen.
- Specified by:
cloneAsThawed in interface Freezable<UnicodeRegex>
-
freeze
Description copied from interface: Freezable
Freezes the object.
- Specified by:
freeze in interface Freezable<UnicodeRegex>
- Returns:
the object itself.
-
isFrozen
public boolean isFrozen()
Description copied from interface: Freezable
Determines whether the object has been frozen or not.
- Specified by:
isFrozen in interface Freezable<UnicodeRegex>
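The freeze, isFrozen, and cloneAsThawed contract can be sketched on a small stand-in class. This is a minimal illustration, assuming that a frozen object rejects modification with UnsupportedOperationException; that behavior, the class FreezableSketch, and its commentString field are assumptions made for the example and are not taken from this page.

```java
// Hypothetical class illustrating the freeze / thaw pattern; not UnicodeRegex itself.
final class FreezableSketch {
    private String commentString = "#";
    private boolean frozen = false;

    // Freezes the object and returns the object itself, so calls can be chained.
    FreezableSketch freeze() {
        frozen = true;
        return this;
    }

    boolean isFrozen() {
        return frozen;
    }

    // Any clone is initially unfrozen, so it can be modified independently.
    FreezableSketch cloneAsThawed() {
        FreezableSketch copy = new FreezableSketch();
        copy.commentString = commentString;
        return copy;
    }

    // Assumption: setters on a frozen object fail instead of silently mutating shared state.
    void setCommentString(String value) {
        if (frozen) {
            throw new UnsupportedOperationException("Attempt to modify a frozen object");
        }
        commentString = value;
    }

    String getCommentString() {
        return commentString;
    }

    public static void main(String[] args) {
        FreezableSketch shared = new FreezableSketch().freeze();
        FreezableSketch perThread = shared.cloneAsThawed();
        perThread.setCommentString("//");
        System.out.println(shared.isFrozen() + " " + perThread.isFrozen());
    }
}
```

A thawed clone per thread is one way to follow the advice given for transform to create a separate copy for different threads.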
-