Class UCharacterProperty
Internal class used for Unicode character property database.
This class stores binary data read from uprops.icu. It cannot parse the data into higher-level information; it only returns bytes of information when required.
Because of the form most commonly used for retrieval, the binary data is stored in an array of char.
UCharacterPropertyDB also contains information on accessing indexes to significant points in the binary data.
Responsibility for molding the binary data into a more meaningful form lies with UCharacter.
- Since:
- release 2.1, February 1st 2002
-
Field Summary
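Fields
Modifier and Type | Field | Description
static final UCharacterProperty | INSTANCE |
static final char | LATIN_CAPITAL_LETTER_I_WITH_DOT_ABOVE_ | Latin capital letter I with dot above
static final char | LATIN_SMALL_LETTER_DOTLESS_I_ | Latin small letter dotless i
static final char | LATIN_SMALL_LETTER_I_ | Latin lowercase i
char[] | m_scriptExtensions_ | Script_Extensions data
 | m_trie_ | Trie data
 | m_unicodeVersion_ | Unicode version
static final int | MAX_SCRIPT |
static final int | SCRIPT_HIGH_MASK |
static final int | SCRIPT_HIGH_SHIFT |
static final int | SCRIPT_LOW_MASK | Integer properties mask and shift values for scripts.
static final int | SCRIPT_X_MASK | Script_Extensions: mask includes Script
static final int | SCRIPT_X_WITH_COMMON |
static final int | SCRIPT_X_WITH_INHERITED |
static final int | SCRIPT_X_WITH_OTHER |
static final int | SRC_BIDI | From ubidi_props.c/ubidi.icu
static final int | SRC_CASE | From ucase.c/ucase.icu
static final int | SRC_CASE_AND_NORM | From ucase.c/ucase.icu as well as unorm.cpp/unorm.icu
static final int | SRC_CHAR | From uchar.c/uprops.icu main trie
static final int | SRC_CHAR_AND_PROPSVEC | From uchar.c/uprops.icu main trie as well as properties vectors trie
static final int | SRC_COUNT | One more than the highest UPropertySource (SRC_) constant.
static final int | SRC_EMOJI |
static final int | SRC_ID_COMPAT_MATH |
static final int | SRC_IDSU |
static final int | SRC_INPC |
static final int | SRC_INSC |
static final int | SRC_NAMES | From unames.c/unames.icu
static final int | SRC_NFC | From normalizer2impl.cpp/nfc.nrm
static final int | SRC_NFC_CANON_ITER | From normalizer2impl.cpp/nfc.nrm canonical iterator data
static final int | SRC_NFKC | From normalizer2impl.cpp/nfkc.nrm
static final int | SRC_NFKC_CF | From normalizer2impl.cpp/nfkc_cf.nrm
static final int | SRC_NONE | No source, not a supported property.
static final int | SRC_PROPSVEC | From uchar.c/uprops.icu properties vectors trie
static final int | SRC_VO |
static final int | TYPE_MASK | Character type mask
-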
Method Summary
Modifier and Type | Method | Description
int | digit(int c) |
int | getAdditional(int codepoint, int column) | Gets the Unicode additional properties.
 | getAge(int codepoint) | Get the "age" of the code point.
static int | getEuropeanDigit(int ch) | Returns the digit values of characters like 'A' - 'Z', normal, half-width and full-width.
int | getIntPropertyMaxValue(int which) |
int | getIntPropertyValue(int c, int which) |
static final int | getMask(int type) | Gets the type mask
int | getMaxValues(int column) | Get the maximum values for some enum/int properties.
int | getNumericValue(int c) |
final int | getProperty(int ch) | Gets the main property value for code point ch.
int | getType(int c) |
double | getUnicodeNumericValue(int c) |
boolean | hasBinaryProperty(int c, int which) |
static final int | mergeScriptCodeOrIndex(int scriptX) |
void | upropsvec_addPropertyStarts |
-
Field Details
-
INSTANCE
-
m_trie_
Trie data -
m_unicodeVersion_
Unicode version -
LATIN_CAPITAL_LETTER_I_WITH_DOT_ABOVE_
public static final char LATIN_CAPITAL_LETTER_I_WITH_DOT_ABOVE_
Latin capital letter I with dot above
-
LATIN_SMALL_LETTER_DOTLESS_I_
public static final char LATIN_SMALL_LETTER_DOTLESS_I_
Latin small letter dotless i
-
LATIN_SMALL_LETTER_I_
public static final char LATIN_SMALL_LETTER_I_
Latin lowercase i
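These three characters matter mainly for Turkic case mapping, where dotted and dotless i are distinct letters. The sketch below uses only the JDK, not this internal class, and assumes the constants correspond to U+0130, U+0131 and U+0069 (this page does not list their values). The class and method names are illustrative.

```java
import java.util.Locale;

public class DottedIExample {
    // Assumed code points; this page does not list the constant values.
    static final char CAPITAL_I_WITH_DOT_ABOVE = '\u0130';
    static final char SMALL_DOTLESS_I = '\u0131';
    static final char SMALL_I = '\u0069';

    static final Locale TURKISH = Locale.forLanguageTag("tr");

    /** Lowercases with Turkish rules: 'I' maps to dotless i. */
    static String lowerTurkish(String s) {
        return s.toLowerCase(TURKISH);
    }

    /** Uppercases with Turkish rules: 'i' maps to I with dot above. */
    static String upperTurkish(String s) {
        return s.toUpperCase(TURKISH);
    }

    public static void main(String[] args) {
        System.out.println(lowerTurkish("I").charAt(0) == SMALL_DOTLESS_I);
        System.out.println(upperTurkish("i").charAt(0) == CAPITAL_I_WITH_DOT_ABOVE);
        // Locale-independent lowercasing keeps the ordinary small i.
        System.out.println("I".toLowerCase(Locale.ROOT).charAt(0) == SMALL_I);
    }
}
```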
-
TYPE_MASK
public static final int TYPE_MASK
Character type mask
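One plausible reading of a "character type mask" is that the character type sits in the low bits of the packed property word returned by getProperty. The sketch below assumes a 5-bit mask (0x1F) and uses a made-up property word; neither value comes from this page, and typeOf is a hypothetical helper.

```java
public class TypeMaskSketch {
    // Assumption: the type occupies the low 5 bits of the property word.
    static final int TYPE_MASK = 0x1F;

    /** Extracts the type bits from a packed 32-bit property word. */
    static int typeOf(int propertyWord) {
        return propertyWord & TYPE_MASK;
    }

    public static void main(String[] args) {
        // Hypothetical word: upper bits carry other data, low bits the type.
        int word = (0x2A << 5) | Character.UPPERCASE_LETTER;
        System.out.println(typeOf(word) == Character.UPPERCASE_LETTER);
    }
}
```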
-
SRC_NONE
public static final int SRC_NONE
No source, not a supported property.
-
SRC_CHAR
public static final int SRC_CHAR
From uchar.c/uprops.icu main trie
-
SRC_PROPSVEC
public static final int SRC_PROPSVEC
From uchar.c/uprops.icu properties vectors trie
-
SRC_NAMES
public static final int SRC_NAMES
From unames.c/unames.icu
-
SRC_CASE
public static final int SRC_CASE
From ucase.c/ucase.icu
-
SRC_BIDI
public static final int SRC_BIDI
From ubidi_props.c/ubidi.icu
-
SRC_CHAR_AND_PROPSVEC
public static final int SRC_CHAR_AND_PROPSVEC
From uchar.c/uprops.icu main trie as well as properties vectors trie
-
SRC_CASE_AND_NORM
public static final int SRC_CASE_AND_NORM
From ucase.c/ucase.icu as well as unorm.cpp/unorm.icu
-
SRC_NFC
public static final int SRC_NFC
From normalizer2impl.cpp/nfc.nrm
-
SRC_NFKC
public static final int SRC_NFKC
From normalizer2impl.cpp/nfkc.nrm
-
SRC_NFKC_CF
public static final int SRC_NFKC_CF
From normalizer2impl.cpp/nfkc_cf.nrm
-
SRC_NFC_CANON_ITER
public static final int SRC_NFC_CANON_ITER
From normalizer2impl.cpp/nfc.nrm canonical iterator data
-
SRC_INPC
public static final int SRC_INPC
-
SRC_INSC
public static final int SRC_INSC
-
SRC_VO
public static final int SRC_VO
-
SRC_EMOJI
public static final int SRC_EMOJI
-
SRC_IDSU
public static final int SRC_IDSU
-
SRC_ID_COMPAT_MATH
public static final int SRC_ID_COMPAT_MATH
-
SRC_COUNT
public static final int SRC_COUNT
One more than the highest UPropertySource (SRC_) constant.
-
m_scriptExtensions_
public char[] m_scriptExtensions_
Script_Extensions data -
SCRIPT_X_MASK
public static final int SCRIPT_X_MASK
Script_Extensions: mask includes Script
-
SCRIPT_HIGH_MASK
public static final int SCRIPT_HIGH_MASK
-
SCRIPT_HIGH_SHIFT
public static final int SCRIPT_HIGH_SHIFT
-
MAX_SCRIPT
public static final int MAX_SCRIPT
-
SCRIPT_LOW_MASK
public static final int SCRIPT_LOW_MASK
Integer properties mask and shift values for scripts. Equivalent to icu4c UPROPS_SHIFT_LOW_MASK.
-
SCRIPT_X_WITH_COMMON
public static final int SCRIPT_X_WITH_COMMON
-
SCRIPT_X_WITH_INHERITED
public static final int SCRIPT_X_WITH_INHERITED
-
SCRIPT_X_WITH_OTHER
public static final int SCRIPT_X_WITH_OTHER
-
-
Method Details
-
getProperty
public final int getProperty(int ch)
Gets the main property value for code point ch.
- Parameters:
- ch - code point whose property value is to be retrieved
- Returns:
- property value of code point
-
getAdditional
public int getAdditional(int codepoint, int column)
Gets the Unicode additional properties. Java version of C u_getUnicodeProperties().
- Parameters:
- codepoint - code point whose additional properties are to be retrieved
- column - The column index.
- Returns:
- Unicode properties
-
getAge
Get the "age" of the code point.
The "age" is the Unicode version when the code point was first designated (as a non-character or for Private Use) or assigned a character.
This can be useful to avoid emitting code points to receiving processes that do not accept newer characters.
The data is from the UCD file DerivedAge.txt.
This API does not check the validity of the codepoint.
- Parameters:
- codepoint - The code point.
- Returns:
- the Unicode version number
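As a rough illustration of the filtering use case described above, the sketch below drops code points that are newer than the Unicode version a receiver supports. It does not call this class: the AGE table is a tiny hand-written stand-in for getAge, and acceptedBy and filter are hypothetical helpers. The three ages listed are the values given in DerivedAge.txt for those characters.

```java
import java.util.Map;

public class AgeFilterSketch {
    // Hypothetical stand-in for getAge(): {major, minor} per code point.
    // Real data would come from DerivedAge.txt.
    static final Map<Integer, int[]> AGE = Map.of(
        0x0041, new int[] {1, 1},    // LATIN CAPITAL LETTER A
        0x20AC, new int[] {2, 1},    // EURO SIGN
        0x1F600, new int[] {6, 1});  // GRINNING FACE

    /** True if the code point's age is at most the receiver's version. */
    static boolean acceptedBy(int codePoint, int major, int minor) {
        int[] age = AGE.get(codePoint);
        if (age == null) {
            return false; // not in this toy table
        }
        return age[0] < major || (age[0] == major && age[1] <= minor);
    }

    /** Drops code points newer than the given Unicode version. */
    static String filter(String text, int major, int minor) {
        StringBuilder out = new StringBuilder();
        text.codePoints()
            .filter(cp -> acceptedBy(cp, major, minor))
            .forEach(out::appendCodePoint);
        return out.toString();
    }

    public static void main(String[] args) {
        // A receiver limited to Unicode 3.0 keeps 'A' and the euro sign only.
        System.out.println(filter("A\u20AC\uD83D\uDE00", 3, 0).length());
    }
}
```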
-
hasBinaryProperty
public boolean hasBinaryProperty(int c, int which) -
getType
public int getType(int c) -
getIntPropertyValue
public int getIntPropertyValue(int c, int which) -
getIntPropertyMaxValue
public int getIntPropertyMaxValue(int which) -
getMaxValues
public int getMaxValues(int column)
Get the maximum values for some enum/int properties.
- Returns:
- maximum values for the integer properties.
-
getMask
public static final int getMask(int type)
Gets the type mask
- Parameters:
- type - character type
- Returns:
- mask
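A type mask is typically useful for testing a character against several types at once. The sketch below assumes the mask is a single bit at position type; that behaviour is not stated on this page. It uses java.lang.Character type codes in place of this class's data, and isUpperOrLower is a hypothetical helper.

```java
public class TypeMaskSetSketch {
    /** Assumed behaviour: one bit per character type, at position 'type'. */
    static int getMask(int type) {
        return 1 << type;
    }

    // Combined mask for cased letters, built from java.lang.Character types.
    static final int LETTER_MASK =
        getMask(Character.UPPERCASE_LETTER) | getMask(Character.LOWERCASE_LETTER);

    /** Tests a code point's type against the combined mask in one step. */
    static boolean isUpperOrLower(int codePoint) {
        return (getMask(Character.getType(codePoint)) & LETTER_MASK) != 0;
    }

    public static void main(String[] args) {
        System.out.println(isUpperOrLower('a'));
        System.out.println(isUpperOrLower('5'));
    }
}
```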
-
getEuropeanDigit
public static int getEuropeanDigit(int ch)
Returns the digit values of characters like 'A' - 'Z', normal, half-width and full-width. This method assumes that the other digit characters are checked by the calling method.
- Parameters:
- ch - character to test
- Returns:
- -1 if ch is not a character of the form 'A' - 'Z', otherwise its corresponding digit will be returned.
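The mapping described here can be sketched as follows. This is an independent illustration, not the library's code: it assumes letters map to the values 10 through 35 regardless of case (as with radix-36 digits) and that "full-width" refers to U+FF21..U+FF3A and U+FF41..U+FF5A. The method name europeanDigit is hypothetical.

```java
public class EuropeanDigitSketch {
    /**
     * Maps ASCII and full-width Latin letters to digit values 10..35,
     * ignoring case; returns -1 for anything else. Decimal digits are
     * assumed to be handled by the caller.
     */
    static int europeanDigit(int ch) {
        if (ch >= 'A' && ch <= 'Z') return ch - 'A' + 10;
        if (ch >= 'a' && ch <= 'z') return ch - 'a' + 10;
        if (ch >= 0xFF21 && ch <= 0xFF3A) return ch - 0xFF21 + 10; // full-width A-Z
        if (ch >= 0xFF41 && ch <= 0xFF5A) return ch - 0xFF41 + 10; // full-width a-z
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(europeanDigit('F'));
        System.out.println(europeanDigit('0'));
    }
}
```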
-
digit
public int digit(int c) -
getNumericValue
public int getNumericValue(int c) -
getUnicodeNumericValue
public double getUnicodeNumericValue(int c) -
mergeScriptCodeOrIndex
public static final int mergeScriptCodeOrIndex(int scriptX) -
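The SCRIPT_HIGH_MASK, SCRIPT_HIGH_SHIFT and SCRIPT_LOW_MASK fields suggest that a script code (or Script_Extensions index) is stored in two separate bit fields, which this method joins into one value. The sketch below shows that idea under assumed constant values; this page does not list them, so treat the numbers as illustrative.

```java
public class ScriptBitsSketch {
    // Assumed layout: 8 low bits plus 2 high bits of a 10-bit value.
    static final int SCRIPT_LOW_MASK = 0x000000FF;
    static final int SCRIPT_HIGH_MASK = 0x00300000;
    static final int SCRIPT_HIGH_SHIFT = 12;
    static final int MAX_SCRIPT = 0x3FF;

    /** Joins the high and low bit fields into one contiguous value. */
    static int mergeScriptCodeOrIndex(int scriptX) {
        return ((scriptX & SCRIPT_HIGH_MASK) >> SCRIPT_HIGH_SHIFT)
            | (scriptX & SCRIPT_LOW_MASK);
    }

    public static void main(String[] args) {
        // Both fields fully set should give the largest representable value.
        System.out.println(mergeScriptCodeOrIndex(0x003000FF) == MAX_SCRIPT);
    }
}
```

With these assumed values, bits outside the two fields are ignored by the merge.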
addPropertyStarts
-
upropsvec_addPropertyStarts
-