au.id.jericho.lib.html

Class NumericCharacterReference

Implemented Interfaces:
CharSequence, Comparable

public class NumericCharacterReference
extends CharacterReference

Represents an HTML Numeric Character Reference.

A numeric character reference can be one of two types:

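Decimal Character Reference

A numeric character reference that specifies the Unicode code point in decimal notation, for example &#62;
Hexadecimal Character Reference

A numeric character reference that specifies the Unicode code point in hexadecimal notation, indicated by an 'x' after the '#', for example &#x3e;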
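
As a rough illustration of the two forms, the sketch below builds both references for a single code point. The class and method names are hypothetical and are not part of this library; the lower-case hexadecimal digits are simply what Integer.toHexString produces, not a documented library behaviour.

```java
// Illustrative sketch only; NumericReferenceForms is a hypothetical class,
// not part of the au.id.jericho.lib.html package.
final class NumericReferenceForms {

    // Decimal form: "&#", the code point in base 10, then ";".
    static String decimalForm(int codePoint) {
        return "&#" + codePoint + ";";
    }

    // Hexadecimal form: "&#x", the code point in base 16, then ";".
    static String hexadecimalForm(int codePoint) {
        return "&#x" + Integer.toHexString(codePoint) + ";";
    }

    public static void main(String[] args) {
        int greaterThan = '>'; // U+003E, decimal 62
        System.out.println(decimalForm(greaterThan));     // &#62;
        System.out.println(hexadecimalForm(greaterThan)); // &#x3e;
    }
}
```

Both strings denote the same character; only the radix of the digits differs.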

Static methods to encode and decode strings and single characters can be found in the CharacterReference superclass.

NumericCharacterReference instances are obtained from methods that parse or locate character references, such as CharacterReference.parse and Segment.findAllCharacterReferences.

See Also:
CharacterReference, CharacterEntityReference

Field Summary

Fields inherited from class au.id.jericho.lib.html.CharacterReference

INVALID_CODE_POINT

Method Summary

static String
encode(CharSequence unencodedText)
Encodes the specified text, escaping special characters into numeric character references.
static String
encodeDecimal(CharSequence unencodedText)
Encodes the specified text, escaping special characters into decimal character references.
static String
encodeHexadecimal(CharSequence unencodedText)
Encodes the specified text, escaping special characters into hexadecimal character references.
String
getCharacterReferenceString()
Returns the correct encoded form of this numeric character reference.
static String
getCharacterReferenceString(int codePoint)
Returns the numeric character reference encoded form of the specified Unicode code point.
String
getDebugInfo()
Returns a string representation of this object useful for debugging purposes.
boolean
isDecimal()
Indicates whether this numeric character reference specifies the Unicode code point in decimal format.
boolean
isHexadecimal()
Indicates whether this numeric character reference specifies the Unicode code point in hexadecimal format.

Methods inherited from class au.id.jericho.lib.html.CharacterReference

decode, decode, decodeCollapseWhiteSpace, encode, encodeWithWhiteSpaceFormatting, getChar, getCharacterReferenceString, getCharacterReferenceString, getCodePoint, getCodePointFromCharacterReferenceString, getDecimalCharacterReferenceString, getDecimalCharacterReferenceString, getHexadecimalCharacterReferenceString, getHexadecimalCharacterReferenceString, getUnicodeText, getUnicodeText, isTerminated, parse, reencode, requiresEncoding

Methods inherited from class au.id.jericho.lib.html.Segment

charAt, compareTo, encloses, encloses, equals, extractText, extractText, findAllCharacterReferences, findAllComments, findAllElements, findAllElements, findAllElements, findAllStartTags, findAllStartTags, findAllStartTags, findAllTags, findAllTags, findFormControls, findFormFields, findWords, getBegin, getChildElements, getDebugInfo, getEnd, getSourceText, getSourceTextNoWhitespace, hashCode, ignoreWhenParsing, isComment, isWhiteSpace, isWhiteSpace, length, parseAttributes, subSequence, toString

Method Details

encode

public static String encode(CharSequence unencodedText)
Encodes the specified text, escaping special characters into numeric character references.

Each character is encoded only if the requiresEncoding(char) method would return true for that character.

This method encodes all character references in decimal format, and is exactly the same as calling encodeDecimal(CharSequence).

To encode text using both character entity references and numeric character references, use the CharacterReference.encode(CharSequence) method instead.

To encode text using hexadecimal character references only, use the encodeHexadecimal(CharSequence) method instead.

Overrides:
encode in class CharacterReference
Parameters:
unencodedText - the text to encode.
Returns:
the encoded string.

encodeDecimal

public static String encodeDecimal(CharSequence unencodedText)
Encodes the specified text, escaping special characters into decimal character references.

Each character is encoded only if the requiresEncoding(char) method would return true for that character.

To encode text using both character entity references and numeric character references, use the CharacterReference.encode(CharSequence) method instead.

To encode text using hexadecimal character references only, use the encodeHexadecimal(CharSequence) method instead.

Parameters:
unencodedText - the text to encode.
Returns:
the encoded string.
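
The per-character behaviour described above can be sketched as follows. This is a minimal stand-alone illustration, not the library's implementation: the needsEncoding predicate is a hypothetical stand-in for requiresEncoding(char), and the set of characters it flags is an assumption made for the example.

```java
// Illustrative sketch only; not the library's implementation.
final class DecimalEncodingSketch {

    // Hypothetical stand-in for requiresEncoding(char). The real rule may
    // differ; here we assume a few markup characters plus anything non-ASCII.
    static boolean needsEncoding(char ch) {
        return ch == '&' || ch == '<' || ch == '>' || ch == '"' || ch > 127;
    }

    // Replaces each flagged char with its decimal character reference and
    // copies every other char unchanged. Works char by char, so supplementary
    // characters (surrogate pairs) are not combined into one reference.
    static String encodeDecimal(CharSequence text) {
        StringBuilder sb = new StringBuilder(text.length() * 2);
        for (int i = 0; i < text.length(); i++) {
            char ch = text.charAt(i);
            if (needsEncoding(ch)) {
                sb.append("&#").append((int) ch).append(';');
            } else {
                sb.append(ch);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(encodeDecimal("a < b & c")); // a &#60; b &#38; c
    }
}
```

Text containing no flagged characters passes through unchanged, which mirrors the "encoded only if" wording above.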

encodeHexadecimal

public static String encodeHexadecimal(CharSequence unencodedText)
Encodes the specified text, escaping special characters into hexadecimal character references.

Each character is encoded only if the requiresEncoding(char) method would return true for that character.

To encode text using both character entity references and numeric character references, use the CharacterReference.encode(CharSequence) method instead.

To encode text using decimal character references only, use the encodeDecimal(CharSequence) method instead.

Parameters:
unencodedText - the text to encode.
Returns:
the encoded string.

getCharacterReferenceString

public String getCharacterReferenceString()
Returns the correct encoded form of this numeric character reference.

The returned string uses the same radix as the original character reference in the source document, i.e. decimal format if isDecimal() is true, and hexadecimal format if isHexadecimal() is true.

Note that the returned string is not necessarily the same as the original source text used to create this object. This library recognises certain invalid forms of character references, as detailed in the decode(CharSequence) method.

To retrieve the original source text, use the toString() method instead.

Example:
CharacterReference.parse("&#62").getCharacterReferenceString() returns "&#62;"
Overrides:
getCharacterReferenceString in class CharacterReference
Returns:
the correct encoded form of this numeric character reference.
See Also:
CharacterReference.getCharacterReferenceString(int codePoint)
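
To make the difference between the source text and the correct encoded form concrete, here is a small stand-alone sketch. It is not the library's parser: the class and method names are hypothetical, and the only invalid form it tolerates is a missing terminating semicolon, which is an assumption based on the example above rather than on the full rules in decode(CharSequence).

```java
// Illustrative sketch only; ReferenceNormalizerSketch is a hypothetical class.
final class ReferenceNormalizerSketch {

    // Returns the well-formed reference in the same radix as the source text,
    // or null if the text does not start with a numeric character reference.
    // Anything after the digits (including a missing ';') is ignored.
    static String normalize(String source) {
        if (!source.startsWith("&#")) return null;
        int pos = 2;
        boolean hex = pos < source.length()
                && (source.charAt(pos) == 'x' || source.charAt(pos) == 'X');
        if (hex) pos++;
        int radix = hex ? 16 : 10;
        int start = pos;
        while (pos < source.length()
                && Character.digit(source.charAt(pos), radix) >= 0) {
            pos++;
        }
        if (pos == start) return null; // no digits after "&#" or "&#x"
        int codePoint;
        try {
            codePoint = Integer.parseInt(source.substring(start, pos), radix);
        } catch (NumberFormatException e) {
            return null; // digit run too long to be a code point
        }
        return hex ? "&#x" + Integer.toHexString(codePoint) + ";"
                   : "&#" + codePoint + ";";
    }

    public static void main(String[] args) {
        System.out.println(normalize("&#62"));   // &#62;
        System.out.println(normalize("&#x3E"));  // &#x3e;
    }
}
```

The radix of the input is preserved in the output, matching the description above; the original, possibly malformed, text is what toString() would give in the real class.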

getCharacterReferenceString

public static String getCharacterReferenceString(int codePoint)
Returns the numeric character reference encoded form of the specified Unicode code point.

This method returns the character reference in decimal format, and is exactly the same as calling getDecimalCharacterReferenceString(int codePoint).

To get either the character entity reference or numeric character reference, use the CharacterReference.getCharacterReferenceString(int codePoint) method instead.

To get the character reference in hexadecimal format, use the getHexadecimalCharacterReferenceString(int codePoint) method instead.

Examples:
NumericCharacterReference.getCharacterReferenceString(62) returns "&#62;"
NumericCharacterReference.getCharacterReferenceString('>') returns "&#62;"
Overrides:
getCharacterReferenceString in class CharacterReference
Returns:
the numeric character reference encoded form of the specified Unicode code point.
See Also:
CharacterReference.getCharacterReferenceString(int codePoint)

getDebugInfo

public String getDebugInfo()
Returns a string representation of this object useful for debugging purposes.
Overrides:
getDebugInfo in class Segment
Returns:
a string representation of this object useful for debugging purposes.

isDecimal

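public boolean isDecimal()
Indicates whether this numeric character reference specifies the Unicode code point in decimal format.
Returns:
true if this numeric character reference specifies the Unicode code point in decimal format, otherwise false.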

isHexadecimal

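public boolean isHexadecimal()
Indicates whether this numeric character reference specifies the Unicode code point in hexadecimal format.
Returns:
true if this numeric character reference specifies the Unicode code point in hexadecimal format, otherwise false.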
See Also:
isDecimal()
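
For a numeric character reference the two checks are complementary: exactly one of them holds. The sketch below shows one way such a check could work on raw reference text. The class is hypothetical, and accepting an upper-case 'X' is an assumption of the sketch, not a statement about what this library accepts.

```java
// Illustrative sketch only; RadixCheckSketch is a hypothetical class.
final class RadixCheckSketch {

    // Hexadecimal if an 'x' (or 'X', assumed here) directly follows "&#".
    static boolean isHexadecimal(String ref) {
        return ref.startsWith("&#") && ref.length() > 2
                && (ref.charAt(2) == 'x' || ref.charAt(2) == 'X');
    }

    // Decimal if it is a numeric reference without the 'x' marker.
    static boolean isDecimal(String ref) {
        return ref.startsWith("&#") && !isHexadecimal(ref);
    }

    public static void main(String[] args) {
        System.out.println(isDecimal("&#62;"));      // true
        System.out.println(isHexadecimal("&#x3e;")); // true
    }
}
```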