au.id.jericho.lib.html

Class Config.CompatibilityMode

Enclosing Class:
Config

public static final class Config.CompatibilityMode
extends java.lang.Object

Represents a set of configuration parameters that relate to user agent compatibility issues.

The predefined compatibility modes IE, MOZILLA, OPERA and XHTML provide an easy means of ensuring the library interprets the markup in a way consistent with some of the most commonly used browsers, at least in relation to the behaviour described by the properties in this class.

The properties of any CompatibilityMode object can be modified individually, including those in the predefined instances as well as newly constructed instances. Note, however, that modifying the properties of the predefined instances has a global effect.

The currently active compatibility mode is stored in the static Config.CurrentCompatibilityMode property.
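
The global effect mentioned above follows from the predefined modes being shared static instances. The following is a minimal sketch of that behaviour, assuming a hypothetical stand-in class CompatibilityModeSketch with a single property and a static field named current that plays the role of Config.CurrentCompatibilityMode. It is not the library's implementation.

```java
// Hypothetical stand-in for Config.CompatibilityMode, reduced to one property.
// It is not the library class; it only models shared static instances.
final class CompatibilityModeSketch {
    // Predefined, shared instance (plays the role of Config.CompatibilityMode.IE).
    static final CompatibilityModeSketch IE = new CompatibilityModeSketch("IE");
    static { IE.setFormFieldNameCaseInsensitive(true); }

    // Plays the role of the static Config.CurrentCompatibilityMode property.
    static CompatibilityModeSketch current = IE;

    private final String name;
    private boolean formFieldNameCaseInsensitive; // false by default, as in XHTML mode

    CompatibilityModeSketch(String name) { this.name = name; }

    String getName() { return name; }
    boolean isFormFieldNameCaseInsensitive() { return formFieldNameCaseInsensitive; }
    void setFormFieldNameCaseInsensitive(boolean value) { formFieldNameCaseInsensitive = value; }

    public static void main(String[] args) {
        // Option 1: mutate the predefined instance. Every holder of IE sees the change.
        IE.setFormFieldNameCaseInsensitive(false);
        System.out.println(current.isFormFieldNameCaseInsensitive()); // false, because current == IE

        // Option 2: restore IE, then activate a separately constructed mode instead.
        IE.setFormFieldNameCaseInsensitive(true);
        CompatibilityModeSketch custom = new CompatibilityModeSketch("Custom");
        custom.setFormFieldNameCaseInsensitive(false);
        current = custom;
        System.out.println(IE.isFormFieldNameCaseInsensitive()); // true: IE is untouched
        System.out.println(current.getName());                   // Custom
    }
}
```

Constructing a new mode and making it the current one keeps the change local to code that opts in, whereas editing a predefined instance changes it for every user of that instance.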

Field Summary

static int
CODE_POINTS_ALL
Indicates the recognition of all unicode code points.
static int
CODE_POINTS_NONE
Indicates the recognition of no unicode code points.
static Config.CompatibilityMode
IE
Microsoft Internet Explorer compatibility mode.
static Config.CompatibilityMode
MOZILLA
Mozilla / Firefox / Netscape compatibility mode.
static Config.CompatibilityMode
OPERA
Opera compatibility mode.
static Config.CompatibilityMode
XHTML
XHTML compatibility mode.

Constructor Summary

CompatibilityMode(String name)
Constructs a new CompatibilityMode with the given name.

Method Summary

String
getDebugInfo()
Returns a string representation of this object useful for debugging purposes.
String
getName()
Returns the name of this compatibility mode.
int
getUnterminatedCharacterEntityReferenceMaxCodePoint(boolean insideAttributeValue)
Returns the maximum unicode code point of an unterminated character entity reference which is to be recognised in the specified context.
int
getUnterminatedDecimalCharacterReferenceMaxCodePoint(boolean insideAttributeValue)
Returns the maximum unicode code point of an unterminated decimal character reference which is to be recognised in the specified context.
int
getUnterminatedHexadecimalCharacterReferenceMaxCodePoint(boolean insideAttributeValue)
Returns the maximum unicode code point of an unterminated hexadecimal character reference which is to be recognised in the specified context.
boolean
isFormFieldNameCaseInsensitive()
Indicates whether form field names are treated as case insensitive.
void
setFormFieldNameCaseInsensitive(boolean value)
Sets whether form field names are treated as case insensitive.
void
setUnterminatedCharacterEntityReferenceMaxCodePoint(boolean insideAttributeValue, int maxCodePoint)
Sets the maximum unicode code point of an unterminated character entity reference which is to be recognised in the specified context.
void
setUnterminatedDecimalCharacterReferenceMaxCodePoint(boolean insideAttributeValue, int maxCodePoint)
Sets the maximum unicode code point of an unterminated decimal character reference which is to be recognised in the specified context.
void
setUnterminatedHexadecimalCharacterReferenceMaxCodePoint(boolean insideAttributeValue, int maxCodePoint)
Sets the maximum unicode code point of an unterminated hexadecimal character reference which is to be recognised in the specified context.
String
toString()
Returns the name of this compatibility mode.

Field Details

CODE_POINTS_ALL

public static final int CODE_POINTS_ALL
Indicates the recognition of all unicode code points.

This value is used in properties which specify a maximum unicode code point to be recognised by the parser.

Field Value:
1114111
See Also:
getUnterminatedCharacterEntityReferenceMaxCodePoint(boolean insideAttributeValue), getUnterminatedDecimalCharacterReferenceMaxCodePoint(boolean insideAttributeValue), getUnterminatedHexadecimalCharacterReferenceMaxCodePoint(boolean insideAttributeValue)

CODE_POINTS_NONE

public static final int CODE_POINTS_NONE
Indicates the recognition of no unicode code points.

This value is used in properties which specify a maximum unicode code point to be recognised by the parser.

Field Value:
-1
See Also:
getUnterminatedCharacterEntityReferenceMaxCodePoint(boolean insideAttributeValue), getUnterminatedDecimalCharacterReferenceMaxCodePoint(boolean insideAttributeValue), getUnterminatedHexadecimalCharacterReferenceMaxCodePoint(boolean insideAttributeValue)
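
The two field values work as sentinels for a "maximum code point" comparison. The sketch below assumes the limit is inclusive, so a code point is recognised when it is less than or equal to the configured maximum. The class CodePointLimitSketch and the method withinLimit are hypothetical names used only for illustration.

```java
// Illustrative model of the two sentinel values; not the library's code.
final class CodePointLimitSketch {
    static final int CODE_POINTS_ALL = 0x10FFFF; // 1114111, the highest Unicode code point
    static final int CODE_POINTS_NONE = -1;      // lower than every valid code point

    // Assumed rule: a code point is recognised if it does not exceed the maximum.
    static boolean withinLimit(int codePoint, int maxCodePoint) {
        return codePoint <= maxCodePoint;
    }

    public static void main(String[] args) {
        System.out.println(withinLimit(0x20AC, CODE_POINTS_ALL));  // true
        System.out.println(withinLimit(0x0000, CODE_POINTS_NONE)); // false
        System.out.println(withinLimit(0x003E, 0xFF));             // true
    }
}
```

Under this reading, CODE_POINTS_ALL admits every code point because it equals Character.MAX_CODE_POINT, and CODE_POINTS_NONE admits none because no code point is negative.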

IE

public static final Config.CompatibilityMode IE
Microsoft Internet Explorer compatibility mode.

Name = IE
FormFieldNameCaseInsensitive = true

Recognition of unterminated character references:
Property                                                Inside attribute   Outside attribute
UnterminatedCharacterEntityReferenceMaxCodePoint        U+00FF             U+00FF
UnterminatedDecimalCharacterReferenceMaxCodePoint       All                All
UnterminatedHexadecimalCharacterReferenceMaxCodePoint   All                None

MOZILLA

public static final Config.CompatibilityMode MOZILLA
Mozilla / Firefox / Netscape compatibility mode.

Name = Mozilla
FormFieldNameCaseInsensitive = false

Recognition of unterminated character references:
Property                                                Inside attribute   Outside attribute
UnterminatedCharacterEntityReferenceMaxCodePoint        U+00FF             All
UnterminatedDecimalCharacterReferenceMaxCodePoint       All                All
UnterminatedHexadecimalCharacterReferenceMaxCodePoint   All                All

OPERA

public static final Config.CompatibilityMode OPERA
Opera compatibility mode.

Name = Opera
FormFieldNameCaseInsensitive = true

Recognition of unterminated character references:
Property                                                Inside attribute   Outside attribute
UnterminatedCharacterEntityReferenceMaxCodePoint        U+003E             All
UnterminatedDecimalCharacterReferenceMaxCodePoint       All                All
UnterminatedHexadecimalCharacterReferenceMaxCodePoint   All                All

XHTML

public static final Config.CompatibilityMode XHTML
XHTML compatibility mode.

Name = XHTML
FormFieldNameCaseInsensitive = false

Recognition of unterminated character references:
Property                                                Inside attribute   Outside attribute
UnterminatedCharacterEntityReferenceMaxCodePoint        None               None
UnterminatedDecimalCharacterReferenceMaxCodePoint       None               None
UnterminatedHexadecimalCharacterReferenceMaxCodePoint   None               None

Constructor Details

CompatibilityMode

public CompatibilityMode(String name)
Constructs a new CompatibilityMode with the given name.

All properties in the new instance are initially assigned their default values, which are the same as the strict rules of the XHTML compatibility mode.

Parameters:
name - the name of the new compatibility mode

Method Details

getDebugInfo

public String getDebugInfo()
Returns a string representation of this object useful for debugging purposes.
Returns:
a string representation of this object useful for debugging purposes.

getName

public String getName()
Returns the name of this compatibility mode.
Returns:
the name of this compatibility mode.

getUnterminatedCharacterEntityReferenceMaxCodePoint

public int getUnterminatedCharacterEntityReferenceMaxCodePoint(boolean insideAttributeValue)
Returns the maximum unicode code point of an unterminated character entity reference which is to be recognised in the specified context.

For example, if getUnterminatedCharacterEntityReferenceMaxCodePoint(true) has the value 0xFF (U+00FF) in the current compatibility mode, then:

  • CharacterReference.decode("&gt",true) returns ">".
    The string is recognised as the character entity reference &gt; (CharacterEntityReference._gt) despite the fact that it is unterminated, because its unicode code point U+003E is below the maximum of U+00FF set by this property.
  • CharacterReference.decode("&euro",true) returns "&euro".
    The string is not recognised as the character entity reference &euro; (CharacterEntityReference._euro) because it is unterminated and its unicode code point U+20AC is above the maximum of U+00FF set by this property.

See the documentation of the Attribute.getValue() method for further discussion.
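
The decision described in the two examples can be modelled in a few lines. This is a sketch of the rule only, assuming a hypothetical helper class UnterminatedEntitySketch that knows just the two entity names used above and treats the maximum as inclusive. It does not reproduce CharacterReference.decode.

```java
// Illustrative model of the max-code-point rule for unterminated entity references.
// Not the library's implementation; only "gt" and "euro" are known to this sketch.
final class UnterminatedEntitySketch {
    // Returns the code point for a known entity name, or -1 if the name is unknown.
    static int codePointOf(String name) {
        switch (name) {
            case "gt":   return 0x003E;
            case "euro": return 0x20AC;
            default:     return -1;
        }
    }

    // Expects text of the form "&name" with no terminating semicolon.
    // Returns the decoded character, or the text unchanged if it is not recognised.
    static String decodeUnterminated(String text, int maxCodePoint) {
        if (text.length() < 2 || text.charAt(0) != '&') return text;
        int codePoint = codePointOf(text.substring(1));
        if (codePoint < 0 || codePoint > maxCodePoint) return text; // unknown or above the limit
        return new String(Character.toChars(codePoint));
    }

    public static void main(String[] args) {
        System.out.println(decodeUnterminated("&gt", 0xFF));   // >
        System.out.println(decodeUnterminated("&euro", 0xFF)); // &euro
    }
}
```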

Parameters:
insideAttributeValue - the context within an HTML document - true if inside an attribute value or false if outside an attribute value.
Returns:
the maximum unicode code point of an unterminated character entity reference which is to be recognised in the specified context.
See Also:
setUnterminatedCharacterEntityReferenceMaxCodePoint(boolean insideAttributeValue, int maxCodePoint)

getUnterminatedDecimalCharacterReferenceMaxCodePoint

public int getUnterminatedDecimalCharacterReferenceMaxCodePoint(boolean insideAttributeValue)
Returns the maximum unicode code point of an unterminated decimal character reference which is to be recognised in the specified context.

For example, if getUnterminatedDecimalCharacterReferenceMaxCodePoint(true) had the hypothetical value 0xFF (U+00FF) in the current compatibility mode, then:

  • CharacterReference.decode("&#62",true) returns ">".
    The string is recognised as the numeric character reference &#62; despite the fact that it is unterminated, because its unicode code point U+003E is below the maximum of U+00FF set by this property.
  • CharacterReference.decode("&#8364",true) returns "&#8364".
    The string is not recognised as the numeric character reference &#8364; because it is unterminated and its unicode code point U+20AC is above the maximum of U+00FF set by this property.
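
For decimal references the code point is written directly in the text, so the check reduces to parsing the digits and comparing against the maximum. The sketch below assumes an inclusive limit and uses a hypothetical class UnterminatedDecimalSketch. It is an illustration of the rule, not the parser's actual code.

```java
// Illustrative model of the max-code-point rule for unterminated decimal references.
final class UnterminatedDecimalSketch {
    // Expects text of the form "&#digits" with no terminating semicolon.
    // Returns the decoded character, or the text unchanged if it is not recognised.
    static String decodeUnterminated(String text, int maxCodePoint) {
        if (!text.matches("&#[0-9]{1,7}")) return text;       // at most 7 digits, so parseInt cannot overflow
        int codePoint = Integer.parseInt(text.substring(2));  // skip the "&#" prefix
        if (codePoint > maxCodePoint || !Character.isValidCodePoint(codePoint)) return text;
        return new String(Character.toChars(codePoint));
    }

    public static void main(String[] args) {
        System.out.println(decodeUnterminated("&#62", 0xFF));   // >
        System.out.println(decodeUnterminated("&#8364", 0xFF)); // &#8364
    }
}
```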
Parameters:
insideAttributeValue - the context within an HTML document - true if inside an attribute value or false if outside an attribute value.
Returns:
the maximum unicode code point of an unterminated decimal character reference which is to be recognised in the specified context.
See Also:
setUnterminatedDecimalCharacterReferenceMaxCodePoint(boolean insideAttributeValue, int maxCodePoint)

getUnterminatedHexadecimalCharacterReferenceMaxCodePoint

public int getUnterminatedHexadecimalCharacterReferenceMaxCodePoint(boolean insideAttributeValue)
Returns the maximum unicode code point of an unterminated hexadecimal character reference which is to be recognised in the specified context.

For example, if getUnterminatedHexadecimalCharacterReferenceMaxCodePoint(true) had the hypothetical value 0xFF (U+00FF) in the current compatibility mode, then:

  • CharacterReference.decode("&#x3e",true) returns ">".
    The string is recognised as the numeric character reference &#x3e; despite the fact that it is unterminated, because its unicode code point U+003E is below the maximum of U+00FF set by this property.
  • CharacterReference.decode("&#x20ac",true) returns "&#x20ac".
    The string is not recognised as the numeric character reference &#x20ac; because it is unterminated and its unicode code point U+20AC is above the maximum of U+00FF set by this property.
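
The insideAttributeValue argument selects between two independent maxima. The sketch below illustrates this with the values documented for the IE mode, where unterminated hexadecimal references are recognised inside attribute values (All) but not outside them (None). The class UnterminatedHexSketch and its methods are hypothetical, and the inclusive comparison is an assumption.

```java
// Illustrative model of context-dependent maxima for unterminated hexadecimal references.
// The two limits mirror the values documented for the IE compatibility mode.
final class UnterminatedHexSketch {
    static final int CODE_POINTS_ALL = 0x10FFFF;
    static final int CODE_POINTS_NONE = -1;

    // IE mode as documented: All inside attribute values, None outside.
    static int maxCodePoint(boolean insideAttributeValue) {
        return insideAttributeValue ? CODE_POINTS_ALL : CODE_POINTS_NONE;
    }

    // Expects text of the form "&#xhex" with no terminating semicolon.
    // Returns the decoded character, or the text unchanged if it is not recognised.
    static String decodeUnterminated(String text, boolean insideAttributeValue) {
        if (!text.matches("&#[xX][0-9a-fA-F]{1,6}")) return text;
        int codePoint = Integer.parseInt(text.substring(3), 16); // skip the "&#x" prefix
        if (codePoint > maxCodePoint(insideAttributeValue)) return text;
        return new String(Character.toChars(codePoint));
    }

    public static void main(String[] args) {
        System.out.println(decodeUnterminated("&#x3e", true));  // >
        System.out.println(decodeUnterminated("&#x3e", false)); // &#x3e
    }
}
```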
Parameters:
insideAttributeValue - the context within an HTML document - true if inside an attribute value or false if outside an attribute value.
Returns:
the maximum unicode code point of an unterminated hexadecimal character reference which is to be recognised in the specified context.
See Also:
setUnterminatedHexadecimalCharacterReferenceMaxCodePoint(boolean insideAttributeValue, int maxCodePoint)

isFormFieldNameCaseInsensitive

public boolean isFormFieldNameCaseInsensitive()
Indicates whether form field names are treated as case insensitive.

Microsoft Internet Explorer treats field names as case insensitive, while Mozilla treats them as case sensitive.

The value of this property in the current compatibility mode affects all instances of the FormFields class. It should be set to the desired configuration before any instances of FormFields are created.
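
The difference between the two settings can be pictured as a choice of key comparison in a name-to-value lookup. The sketch below uses a hypothetical class FieldNameLookupSketch built on java.util.TreeMap. It is only a model of the observable behaviour and makes no claim about how FormFields stores its data.

```java
// Illustrative model of case-sensitive versus case-insensitive field name lookup.
final class FieldNameLookupSketch {
    // The comparison rule is chosen once, when the map is created.
    static java.util.Map<String, String> newFieldMap(boolean caseInsensitive) {
        return caseInsensitive
            ? new java.util.TreeMap<String, String>(String.CASE_INSENSITIVE_ORDER)
            : new java.util.TreeMap<String, String>();
    }

    public static void main(String[] args) {
        java.util.Map<String, String> insensitive = newFieldMap(true);
        insensitive.put("Email", "a@example.com");
        System.out.println(insensitive.get("email")); // a@example.com

        java.util.Map<String, String> sensitive = newFieldMap(false);
        sensitive.put("Email", "a@example.com");
        System.out.println(sensitive.get("email"));   // null
    }
}
```

In this model the comparison rule is fixed at creation time and later changes to the setting do not reach existing maps, which is one way to picture the advice to configure the property before any FormFields instances are created.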

Returns:
true if form field names are treated as case insensitive, otherwise false.

setFormFieldNameCaseInsensitive

public void setFormFieldNameCaseInsensitive(boolean value)
Sets whether form field names are treated as case insensitive.

See isFormFieldNameCaseInsensitive() for the documentation of this property.

Parameters:
value - the new value of the property

setUnterminatedCharacterEntityReferenceMaxCodePoint

public void setUnterminatedCharacterEntityReferenceMaxCodePoint(boolean insideAttributeValue,
                                                                int maxCodePoint)
Sets the maximum unicode code point of an unterminated character entity reference which is to be recognised in the specified context.

See getUnterminatedCharacterEntityReferenceMaxCodePoint(boolean insideAttributeValue) for the documentation of this property.

Parameters:
insideAttributeValue - the context within an HTML document - true if inside an attribute value or false if outside an attribute value.
maxCodePoint - the maximum unicode code point.

setUnterminatedDecimalCharacterReferenceMaxCodePoint

public void setUnterminatedDecimalCharacterReferenceMaxCodePoint(boolean insideAttributeValue,
                                                                 int maxCodePoint)
Sets the maximum unicode code point of an unterminated decimal character reference which is to be recognised in the specified context.

See getUnterminatedDecimalCharacterReferenceMaxCodePoint(boolean insideAttributeValue) for the documentation of this property.

Parameters:
insideAttributeValue - the context within an HTML document - true if inside an attribute value or false if outside an attribute value.
maxCodePoint - the maximum unicode code point.

setUnterminatedHexadecimalCharacterReferenceMaxCodePoint

public void setUnterminatedHexadecimalCharacterReferenceMaxCodePoint(boolean insideAttributeValue,
                                                                     int maxCodePoint)
Sets the maximum unicode code point of an unterminated hexadecimal character reference which is to be recognised in the specified context.

See getUnterminatedHexadecimalCharacterReferenceMaxCodePoint(boolean insideAttributeValue) for the documentation of this property.

Parameters:
insideAttributeValue - the context within an HTML document - true if inside an attribute value or false if outside an attribute value.
maxCodePoint - the maximum unicode code point.

toString

public String toString()
Returns the name of this compatibility mode.
Returns:
the name of this compatibility mode.