Defines the syntax for a tag type that can be recognised by the parser.
This class is the root abstract class common to all tag types, and contains methods to
register
and
deregister tag types as well as various methods to aid in their implementation.
Every tag type is represented by an instance of a class (usually a singleton) that must be a subclass of either StartTagType or EndTagType. These two abstract classes, the only direct descendants of this class,
represent the two major classifications under which every tag type exists.
The term
predefined tag type refers to any of the tag types defined in this library,
including both
standard and
extended tag types.
The term
standard tag type refers to any of the tag types represented by instances
in static fields of the
StartTagType
and
EndTagType
subclasses.
Standard tag types are registered by default, and define the tags most commonly found in HTML documents.
The term
extended tag type refers to any
predefined tag type
that is not a
standard tag type.
The
PHPTagTypes
and
MasonTagTypes
classes contain extended tag types related to their respective server platforms.
The tag types defined within them must be registered by the user before they are recognised by the parser.
The term
custom tag type refers to any user-defined tag type, or any tag type that is
not a
predefined tag type.
The tag recognition process of the parser gives each tag type a
precedence level,
which is primarily determined by the length of its
start delimiter.
A tag type with a more specific start delimiter is chosen in preference to one with a less specific start delimiter,
assuming they both share the same prefix. If two tag types have exactly the same start delimiter, the one which was
registered later has the higher precedence.
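The precedence rule above can be illustrated with a minimal sketch. It does not use the library's classes: PrecedenceSketch, Type and candidateNamesAt are hypothetical names introduced only for this illustration, and the ordering logic is an assumption based on the description above (longer start delimiter first, later registration breaking ties).

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the tag type registry; not the library's own classes. */
final class PrecedenceSketch {
    /** A tag type reduced to a name and its start delimiter. */
    static final class Type {
        final String name;
        final String startDelimiter;

        Type(String name, String startDelimiter) {
            this.name = name;
            this.startDelimiter = startDelimiter;
        }
    }

    private final List<Type> registered = new ArrayList<>(); // kept in registration order

    void register(Type type) {
        registered.add(type);
    }

    /** Names of the types whose start delimiter matches at pos, highest precedence first. */
    List<String> candidateNamesAt(String source, int pos) {
        List<Type> matches = new ArrayList<>();
        for (int i = registered.size() - 1; i >= 0; i--) { // most recently registered first
            Type t = registered.get(i);
            if (source.regionMatches(true, pos, t.startDelimiter, 0, t.startDelimiter.length())) {
                matches.add(t);
            }
        }
        // The sort is stable, so equal-length delimiters keep the "registered later wins" order.
        matches.sort((a, b) -> Integer.compare(b.startDelimiter.length(), a.startDelimiter.length()));
        List<String> names = new ArrayList<>();
        for (Type t : matches) {
            names.add(t.name);
        }
        return names;
    }
}
```

In this sketch, two types sharing the start delimiter "<!--" are both candidates at a comment, and the one registered later is tried first; types with the shorter delimiters "<!" and "<" follow.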
The two special tag types
StartTagType.UNREGISTERED
and
EndTagType.UNREGISTERED
represent
tags that do not match the syntax of any other tag type. They have the lowest
precedence
of all the tag types. The
Tag.isUnregistered()
method provides a detailed explanation of unregistered tags.
See the documentation of the
tag parsing process for more information
on how each tag is identified by the parser.
Note that the standard HTML element names do not represent different tag types. All standard HTML tags have a tag type of StartTagType.NORMAL or EndTagType.NORMAL.
Apart from the
registration related methods, all of the methods in this class and its
subclasses relate to the implementation of
custom tag types and are not relevant to the majority of users
who just use the
predefined tag types.
For performance reasons, this library only allows tag types that start with a '<' character.
The character following this defines the immediate subclass of the tag type.
An EndTagType always has a slash ('/') as the second character, while a StartTagType has any character other than a slash as the second character.
This definition means that tag types which are not intuitively classified as either start tag types or end tag types
(such as an HTML
comment) are mostly classified as start tag types.
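As an illustration of the classification rule above, the following sketch decides the immediate subclass from a start delimiter alone. DelimiterClassSketch, Kind and classify are hypothetical names, not part of this library.

```java
/** Hypothetical helper illustrating how the second character decides the classification. */
final class DelimiterClassSketch {
    enum Kind { START_TAG_TYPE, END_TAG_TYPE }

    static Kind classify(String startDelimiter) {
        if (startDelimiter.isEmpty() || startDelimiter.charAt(0) != '<') {
            throw new IllegalArgumentException("start delimiter must begin with '<'");
        }
        // A slash in second position means an end tag type; anything else is a start tag type.
        boolean slashSecond = startDelimiter.length() > 1 && startDelimiter.charAt(1) == '/';
        return slashSecond ? Kind.END_TAG_TYPE : Kind.START_TAG_TYPE;
    }
}
```

Under this rule a delimiter such as "<!--" is classified as a start tag type, even though an HTML comment is not intuitively a start tag.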
Every method in this and the
StartTagType
and
EndTagType
abstract classes can be categorised
as one of the following:
- Properties:
- Abstract implementation methods:
- Default implementation methods:
- Implementation assistance methods:
- Registration related methods:
- registration
constructTagAt
protected abstract Tag constructTagAt(Source source, int pos)
Constructs a tag of this type at the specified position in the specified source document if it matches all of the required features.
(abstract implementation method)
The implementation of this method must check that the text at the specified position meets all of
the criteria of this tag type, including such checks as the presence of the correct or well formed
closing delimiter,
name,
attributes,
end tag, or any other distinguishing features.
It can be assumed that the specified position starts with the
start delimiter of this tag type,
and that all other tag types with higher
precedence (if any) have already been rejected as candidates.
Tag types with lower precedence will be considered if this method returns null.
This method is only called after a successful check of the tag's position, i.e. isValidPosition(source,pos)==true.
The
StartTagTypeGenericImplementation
and
EndTagTypeGenericImplementation
subclasses provide default
implementations of this method that allow the use of much simpler
properties and
implementation assistance methods to carry out the required functions.
source - the Source document.
pos - the position in the source document.
- a tag of this type at the specified position in the specified source document if it meets all of the required features, or null if it does not meet the criteria.
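The contract described for constructTagAt (return null to let lower-precedence tag types be considered) can be sketched as a simple loop. This is an illustration only: FallThroughSketch, Candidate, delimited and firstTagAt are hypothetical names, tags are represented as plain strings, and the way an invalid position is skipped is an assumption rather than a statement of how the parser is implemented.

```java
import java.util.List;

/** Hypothetical sketch of trying tag types in precedence order until one accepts the text. */
final class FallThroughSketch {
    /** Minimal view of a tag type: a position check and a constructor that may decline. */
    interface Candidate {
        boolean isValidPosition(String source, int pos);
        String constructTagAt(String source, int pos); // null when the text does not qualify
    }

    /** A candidate that accepts text starting with start and containing a later close. */
    static Candidate delimited(final String start, final String close) {
        return new Candidate() {
            public boolean isValidPosition(String source, int pos) {
                return true; // position checks are out of scope for this sketch
            }
            public String constructTagAt(String source, int pos) {
                if (!source.startsWith(start, pos)) return null;
                int end = source.indexOf(close, pos + start.length());
                return end < 0 ? null : source.substring(pos, end + close.length());
            }
        };
    }

    /** Returns the first tag built by the candidates, given highest precedence first. */
    static String firstTagAt(List<Candidate> byPrecedence, String source, int pos) {
        for (Candidate c : byPrecedence) {
            if (!c.isValidPosition(source, pos)) continue;
            String tag = c.constructTagAt(source, pos);
            if (tag != null) return tag; // otherwise fall through to the next candidate
        }
        return null;
    }
}
```

With a comment-like candidate listed before a generic one, text beginning with "<!--" but lacking "-->" is declined by the first candidate and then handled by the second.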
deregister
public final void deregister()
getClosingDelimiter
public final String getClosingDelimiter()
Returns the character sequence that marks the end of the tag.
(property method)
The character sequence must be all in lower case.
In a StartTag of a type that has attributes, characters appearing inside a quoted attribute value are ignored when determining the location of the closing delimiter.
Note that the optional '/' character preceding the closing '>' in an empty-element tag is not considered part of the closing delimiter.
This property must define the closing delimiter common to all instances of the tag type.
- the character sequence that marks the end of the tag.
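The effect of ignoring quoted attribute values when locating the closing delimiter can be sketched as follows. ClosingDelimiterSketch and findClosingDelimiter are hypothetical names; the scan is a simplified assumption of the behaviour described above, not the library's implementation.

```java
/** Hypothetical sketch: locate a closing delimiter while skipping quoted attribute values. */
final class ClosingDelimiterSketch {
    /**
     * Returns the index of the first occurrence of closingDelimiter at or after from
     * that is not inside a single- or double-quoted run, or -1 if there is none.
     */
    static int findClosingDelimiter(String source, int from, String closingDelimiter) {
        char quote = 0; // 0 means "not inside a quoted value"
        for (int i = from; i < source.length(); i++) {
            char ch = source.charAt(i);
            if (quote != 0) {
                if (ch == quote) quote = 0;      // matching quote ends the quoted run
            } else if (source.startsWith(closingDelimiter, i)) {
                return i;
            } else if (ch == '"' || ch == '\'') {
                quote = ch;                      // opening quote starts a quoted run
            }
        }
        return -1;
    }
}
```

For a start tag whose onclick value contains a '>' character, this scan skips the '>' inside the quotes and reports the one that actually ends the tag.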
getDescription
public final String getDescription()
Returns a description of this tag type useful for debugging purposes.
(property method)
- a description of this tag type useful for debugging purposes.
getNamePrefix
protected final String getNamePrefix()
- the name prefix required by this tag type.
getRegisteredTagTypes
public static final List getRegisteredTagTypes()
- a list of all the currently registered tag types in order of lowest to highest precedence.
getStartDelimiter
public final String getStartDelimiter()
Returns the character sequence that marks the start of the tag.
(property method)
The character sequence must be all in lower case.
The first character in this property must be '<'.
This is a deliberate limitation of the system which is necessary to retain reasonable performance.
The second character in this property must be '/' if the implementing class is an EndTagType.
It must not be '/' if the implementing class is a StartTagType.
- the character sequence that marks the start of the tag.
getTagTypesIgnoringEnclosedMarkup
public static final TagType[] getTagTypesIgnoringEnclosedMarkup()
Returns an array of all the tag types inside which the parser ignores all other non-server tags in parse on demand mode.
(implementation assistance method)
The tag types returned by this property (referred to in the following paragraphs as the "listed types") default to StartTagType.COMMENT and StartTagType.CDATA_SECTION.
In
parse on demand mode,
every new non-server tag found by the parser (referred to as a "new tag") undergoes a check to see whether it is enclosed
by a tag of one of the listed types, including new tags of the listed types themselves.
The recursive nature of this check means that
all tags of the listed types occurring before the new tag must be found
by the parser before it can determine whether the new tag should be ignored.
To mitigate any performance issues arising from this process, the listed types are given special treatment in the tag cache.
This dramatically decreases the time taken to search on these tag types, so adding a tag type to this array that
is easily recognised and occurs infrequently only results in a small degradation in overall performance.
Theoretically, non-server tags appearing inside
any other non-server tag should be ignored.
One situation where a tag can legitimately contain a sequence of characters that resembles a tag,
which shouldn't be recognised as a tag by the parser, is within an attribute value.
The HTML 4.01 specification section 5.3.2 specifically allows the presence of '<' and '>' characters within attribute values.
A common occurrence of this is in event attributes such as onclick, which contain scripts that often dynamically load new HTML into the document (see the file samples/data/Test.html for an example).
Performing a
full sequential parse of the source document prevents these attribute values from being
recognised as tags, but can be very expensive if only a few tags in the document need to be parsed.
The penalty of not parsing every tag in the document is that the exactness of this check is compromised, but in practical terms the difference is inconsequential.
The default listed types of comments and CDATA sections yield sensible results
in the vast majority of practical applications with only a minor impact on performance.
In XHTML, '<' and '>' characters must be represented in attribute values as character references (see the XML 1.0 specification section 3.1), so the situation should never arise that a tag is found inside another tag unless one of them is a server tag.
This method is called from the default implementation of the
isValidPosition(Source, int pos)
method.
- an array of all the tag types inside which the parser ignores all other non-server tags.
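The check described above for parse on demand mode can be sketched as a forward scan that finds every tag of a listed type occurring before the position in question. EnclosedMarkupSketch, LISTED and isEnclosedByListedType are hypothetical names; the delimiter pairs are assumptions chosen for illustration, matching is case sensitive for simplicity, and the tag cache mentioned above is not modelled.

```java
/** Hypothetical sketch of testing whether a position lies inside a comment or CDATA section. */
final class EnclosedMarkupSketch {
    // Assumed {start delimiter, closing delimiter} pairs standing in for the default listed types.
    static final String[][] LISTED = { {"<!--", "-->"}, {"<![CDATA[", "]]>"} };

    static boolean isEnclosedByListedType(String source, int pos) {
        int i = 0;
        while (i < pos) {
            String[] match = null;
            for (String[] type : LISTED) {
                if (source.startsWith(type[0], i)) match = type;
            }
            if (match == null) { i++; continue; }
            int close = source.indexOf(match[1], i + match[0].length());
            if (close < 0) { i++; continue; }   // unterminated: not treated as a tag of this type
            int end = close + match[1].length();
            if (pos < end) return true;         // the listed tag starts before pos and ends after it
            i = end;                            // skip the whole listed tag and keep scanning
        }
        return false;
    }
}
```

The scan has to start from the beginning of the document because an earlier comment may itself contain text that looks like the start of another comment, which is the recursive dependency the paragraphs above describe.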
isServerTag
public final boolean isServerTag()
Indicates whether this tag type represents a server tag.
(property method)
Server tags are typically parsed by some process on the web server and substituted with other text or markup before delivery to the
user agent.
This parser therefore handles them differently to non-server tags in that they can occur at any position in the document
without regard for the HTML document structure. As a result they can occur anywhere inside any other tag and vice versa.
To avoid the problem of server tags interfering with the proper parsing of the rest of the document, the
Segment.ignoreWhenParsing()
method can be called on all server tags found in the document before parsing the non-server tags.
The documentation of the
tag parsing process explains in detail
how the value of this property affects the recognition of a tag.
- true if this tag type represents a server tag, otherwise false.
isValidPosition
protected boolean isValidPosition(Source source, int pos)
Indicates whether a tag of this type is valid in the specified position of the specified source document.
(implementation assistance method)
This method is called immediately before
constructTagAt(Source, int pos)
to do a preliminary check on the validity of a tag of this type in the specified position.
This check is not performed as part of the
constructTagAt(Source, int pos)
call because the same
validation is used for all the
standard tag types, and is likely to be sufficient
for all
custom tag types.
Having this check separated into a different method helps to isolate common code from the code that is unique to each tag type.
In theory, a server tag is valid in any position, but a non-server tag is not valid inside another non-server tag.
The common implementation of this method always returns true for server tags, but for non-server tags it behaves slightly differently depending upon whether or not a full sequential parse is being performed.
If so, it implements the exact theoretical check and rejects a non-server tag if it is inside any other non-server tag.
If a full sequential parse is not being performed (i.e. in parse on demand mode), practical constraints do not permit the implementation of the exact theoretical check, and non-server tags are only rejected if they are found inside HTML comments or CDATA sections.
This behaviour is configurable by manipulating the static TagTypesIgnoringEnclosedMarkup array to determine which tag types cannot contain non-server tags.
The documentation of this property contains a more detailed analysis of the subject and explains why only the comment and CDATA section tag types are included by default.
See the documentation of the
tag parsing process for more information about how this method fits into the whole tag parsing process.
This method can be overridden in
custom tag types if the default implementation is unsuitable.
source - the Source document.
pos - the character position in the source document to check.
- true if a tag of this type is valid in the specified position of the specified source document, otherwise false.
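The decision made by the common implementation, as described above, reduces to three cases. The sketch below expresses it over boolean inputs; ValidPositionSketch and its parameter names are hypothetical, and the enclosure tests that would supply the last two arguments are assumed to be computed elsewhere.

```java
/** Hypothetical sketch of the position check described for the common implementation. */
final class ValidPositionSketch {
    static boolean isValidPosition(boolean serverTag,
                                   boolean fullSequentialParse,
                                   boolean insideAnyNonServerTag,
                                   boolean insideListedType) {
        if (serverTag) return true;                              // server tags are valid anywhere
        if (fullSequentialParse) return !insideAnyNonServerTag;  // exact theoretical check
        return !insideListedType;                                // parse on demand: listed types only
    }
}
```

The third case is where the two modes differ: in parse on demand mode a non-server tag inside an attribute value of another tag is accepted, whereas a full sequential parse rejects it.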
register
public final void register()
Registers this tag type for recognition by the parser.
(registration related method)
The order of registration affects the
precedence of the tag type when a potential tag is being parsed.
setTagTypesIgnoringEnclosedMarkup
public static final void setTagTypesIgnoringEnclosedMarkup(TagType[] tagTypes)
tagTypes
- an array of tag types.
tagEncloses
protected final boolean tagEncloses(Source source, int pos)
Indicates whether a tag of this type encloses the specified position of the specified source document.
(implementation assistance method)
This is logically equivalent to source.findEnclosingTag(pos,this)!=null,
but is safe to use within other implementation methods without the risk of causing an infinite recursion.
This method is called by the TagType implementation of isValidPosition(Source, int pos).
source - the Source document.
pos - the character position in the source document to check.
- true if a tag of this type encloses the specified position of the specified source document, otherwise false.
toString
public String toString()
Returns a string representation of this object useful for debugging purposes.
- a string representation of this object useful for debugging purposes.