au.id.jericho.lib.html

Class StartTagType

Known Direct Subclasses:
StartTagTypeGenericImplementation

public abstract class StartTagType
extends TagType

Defines the syntax for a start tag type.

A start tag type is any TagType that starts with the character '<' (as with all tag types), but whose second character is not '/'.

This includes types for many tags which stand alone, without a corresponding end tag, and would not intuitively be categorised as a "start tag". For example, an HTML comment in a document is represented as a single start tag that spans the whole comment, and does not have an end tag at all.
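The definition above can be illustrated with a minimal standalone sketch. The class and method names (TagKindSketch, looksLikeStartTag) are hypothetical and are not part of this library; the real parser additionally consults the registered tag types before deciding what a tag is.

```java
// Hypothetical illustration only; not the library's implementation.
final class TagKindSketch {
    /** Returns true if the text at pos begins with '<' and the next character is not '/'. */
    static boolean looksLikeStartTag(String text, int pos) {
        // Two characters are needed to apply the rule.
        if (pos < 0 || pos + 1 >= text.length()) return false;
        return text.charAt(pos) == '<' && text.charAt(pos + 1) != '/';
    }

    public static void main(String[] args) {
        System.out.println(looksLikeStartTag("<div>", 0));
        System.out.println(looksLikeStartTag("</div>", 0));
        System.out.println(looksLikeStartTag("<!-- a comment -->", 0));
    }
}
```

Under this rule a comment counts as a start tag even though it never has an end tag, which matches the description above.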

Instances of all the standard start tag types are available in this class as static fields.

See Also:
EndTagType

Field Summary

static StartTagType
CDATA_SECTION
The tag type given to a CDATA section (<![CDATA[ ... ]]>).
static StartTagType
COMMENT
The tag type given to an HTML comment (<!-- ... -->).
static StartTagType
DOCTYPE_DECLARATION
The tag type given to a document type declaration (<!DOCTYPE ... >).
static StartTagType
MARKUP_DECLARATION
The tag type given to a markup declaration (<! ... >).
static StartTagType
NORMAL
The tag type given to a normal HTML or XML start tag (<name ... >).
static StartTagType
SERVER_COMMON
The tag type given to a common server tag (<% ... %>).
static StartTagType
UNREGISTERED
The tag type given to an unregistered start tag (< ... >).
static StartTagType
XML_DECLARATION
The tag type given to an XML declaration (<?xml ... ?>).
static StartTagType
XML_PROCESSING_INSTRUCTION
The tag type given to an XML processing instruction (<?PITarget ... ?>).

Constructor Summary

StartTagType(String description, String startDelimiter, String closingDelimiter, EndTagType correspondingEndTagType, boolean isServerTag, boolean hasAttributes, boolean isNameAfterPrefixRequired)
Constructs a new StartTagType object with the specified properties.

Method Summary

boolean
atEndOfAttributes(Source source, int pos, boolean isClosingSlashIgnored)
Indicates whether the specified source document position is at the end of a tag's attributes.
protected StartTag
constructStartTag(Source source, int begin, int end, String name, Attributes attributes)
Internal method for the construction of a StartTag object of this type.
EndTagType
getCorrespondingEndTagType()
Returns the type of end tag required to pair with a start tag of this type to form an element.
boolean
hasAttributes()
Indicates whether a start tag of this type contains attributes.
boolean
isNameAfterPrefixRequired()
Indicates whether a valid XML tag name is required directly after the prefix.
protected Attributes
parseAttributes(Source source, int startTagBegin, String tagName)
Internal method for the parsing of Attributes.

Methods inherited from class au.id.jericho.lib.html.TagType

constructTagAt, deregister, getClosingDelimiter, getDescription, getNamePrefix, getRegisteredTagTypes, getStartDelimiter, getTagTypesIgnoringEnclosedMarkup, isServerTag, isValidPosition, register, setTagTypesIgnoringEnclosedMarkup, tagEncloses, toString

Field Details

CDATA_SECTION

public static final StartTagType CDATA_SECTION
The tag type given to a CDATA section (<![CDATA[ ... ]]>).

A CDATA section is a specific form of a marked section.

This library does not include a predefined generic tag type for marked sections, as the only marked sections found in HTML documents are CDATA sections.

The HTML 4.01 specification section B.3.5 and the XML 1.0 specification section 2.7 contain definitions for a CDATA section.

There is inconsistency between the SGML and HTML/XML specifications in the definition of a marked section. SGML requires the presence of a space between the "<![" prefix and the keyword, and allows a space after the keyword. The XML specification forbids these spaces, and the examples given in the HTML specification do not include them either. This library only recognises CDATA sections that do not include the spaces.

The "![CDATA[" tag name is required to be in upper case in the source document according to the HTML/XML specifications, but all tag properties are stored in lower case because this makes it more efficient for the library to perform case-insensitive parsing of all tags.
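The two points above (no spaces inside the prefix, and matching against lower-case text) can be sketched as follows. This is a simplified standalone model, assuming a lower-cased copy of the source is used for matching; the names CdataSketch and cdataEnd are hypothetical and the library's own parse text handling may differ.

```java
import java.util.Locale;

// Hypothetical illustration only; not the library's implementation.
final class CdataSketch {
    static final String START = "<![cdata["; // stored in lower case, without spaces
    static final String END = "]]>";

    /** Returns the position just after the CDATA section that begins at pos, or -1 if none is recognised. */
    static int cdataEnd(String source, int pos) {
        // Assumes lower-casing preserves character positions, which holds for ASCII markup.
        String parseText = source.toLowerCase(Locale.ROOT);
        if (!parseText.startsWith(START, pos)) return -1;
        int close = parseText.indexOf(END, pos + START.length());
        return close == -1 ? -1 : close + END.length();
    }

    public static void main(String[] args) {
        System.out.println(cdataEnd("<![CDATA[ a<b ]]>", 0));
        System.out.println(cdataEnd("<![ CDATA [ a<b ]]>", 0));
    }
}
```

A section written in the SGML style with spaces around the keyword does not match the stored start delimiter, so the sketch reports it as unrecognised.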

In the default configuration, any non-server tag appearing within a CDATA section is ignored by the parser. See the documentation of the tag parsing process for more information.

Property                    Value
Description                 CDATA section
StartDelimiter              <![cdata[
ClosingDelimiter            ]]>
IsServerTag                 false
NamePrefix                  ![cdata[
CorrespondingEndTagType     null
HasAttributes               false
IsNameAfterPrefixRequired   false

Example:
<script><![CDATA[ function min(a,b) {return a<b ? a : b;} ]]></script>

COMMENT

public static final StartTagType COMMENT
The tag type given to an HTML comment (<!-- ... -->).

An HTML comment is an area of the source document enclosed by the delimiters <!-- on the left and --> on the right.

The HTML 4.01 specification section 3.2.4 states that the end of comment delimiter may contain white space between the "--" and ">" characters, but this library does not recognise end of comment delimiters containing white space.

In the default configuration, any non-server tag appearing within an HTML comment is ignored by the parser. See the documentation of the tag parsing process for more information.

Property                    Value
Description                 comment
StartDelimiter              <!--
ClosingDelimiter            -->
IsServerTag                 false
NamePrefix                  !--
CorrespondingEndTagType     null
HasAttributes               false
IsNameAfterPrefixRequired   false

Example:
<!-- This is a comment -->

DOCTYPE_DECLARATION

public static final StartTagType DOCTYPE_DECLARATION
The tag type given to a document type declaration (<!DOCTYPE ... >).

Information about the document type declaration can be found in the HTML 4.01 specification section 7.2, and the XML 1.0 specification section 2.8.

The "!DOCTYPE" tag name is required to be in upper case in the source document, but all tag properties are stored in lower case because this library performs all parsing in lower case.

Property                    Value
Description                 document type declaration
StartDelimiter              <!doctype
ClosingDelimiter            >
IsServerTag                 false
NamePrefix                  !doctype
CorrespondingEndTagType     null
HasAttributes               false
IsNameAfterPrefixRequired   false

Example:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">

MARKUP_DECLARATION

public static final StartTagType MARKUP_DECLARATION
The tag type given to a markup declaration (<! ... >).

The name of a markup declaration tag must be one of "!element", "!attlist", "!entity" or "!notation". These tag names are required to be in upper case in the source document, but all tag properties are stored in lower case because this library performs all parsing in lower case.

Markup declarations usually appear inside a document type definition (DTD), which is usually a document external to the HTML or XML document. They can also appear directly within the document type declaration, which is why they must be recognised by the parser.

Property                    Value
Description                 markup declaration
StartDelimiter              <!
ClosingDelimiter            >
IsServerTag                 false
NamePrefix                  !
CorrespondingEndTagType     null
HasAttributes               false
IsNameAfterPrefixRequired   true

Example:
<!ELEMENT BODY O O (%flow;)* +(INS|DEL) -- document body -->

NORMAL

public static final StartTagType NORMAL
The tag type given to a normal HTML or XML start tag (<name ... >).

Property                    Value
Description                 normal
StartDelimiter              <
ClosingDelimiter            >
IsServerTag                 false
NamePrefix                  (empty string)
CorrespondingEndTagType     EndTagType.NORMAL
HasAttributes               true
IsNameAfterPrefixRequired   true

Example:
<div class="NormalDivTag">

SERVER_COMMON

public static final StartTagType SERVER_COMMON
The tag type given to a common server tag (<% ... %>).

Common server tags include ASP, JSP, PSP, ASP-style PHP, eRuby, and Mason substitution tags.

This is the only standard tag type that defines a server tag. It is included as a standard tag type because of its widespread use in many platforms, including those listed above.

Property                    Value
Description                 common server tag
StartDelimiter              <%
ClosingDelimiter            %>
IsServerTag                 true
NamePrefix                  %
CorrespondingEndTagType     null
HasAttributes               false
IsNameAfterPrefixRequired   false

Example:
<%@ include file="header.html" %>

UNREGISTERED

public static final StartTagType UNREGISTERED
The tag type given to an unregistered start tag (< ... >).

See the documentation of the Tag.isUnregistered() method for details.

Property                    Value
Description                 unregistered
StartDelimiter              <
ClosingDelimiter            >
IsServerTag                 false
NamePrefix                  (empty string)
CorrespondingEndTagType     null
HasAttributes               false
IsNameAfterPrefixRequired   false

Example:
<"This is not recognised as any of the predefined tag types in this library">

XML_DECLARATION

public static final StartTagType XML_DECLARATION
The tag type given to an XML declaration (<?xml ... ?>).

An XML declaration is often referred to in texts as a special type of processing instruction with the reserved PITarget name of "xml". Technically it is not an XML processing instruction at all, but is still a type of SGML processing instruction.

According to section 2.8 of the XML 1.0 specification, a valid XML declaration can specify only "version", "encoding" and "standalone" attributes in that order. This library parses the attributes of an XML declaration in the same way as those of a normal tag, without checking that they conform to the specification.

Property                    Value
Description                 XML declaration
StartDelimiter              <?xml
ClosingDelimiter            ?>
IsServerTag                 false
NamePrefix                  ?xml
CorrespondingEndTagType     null
HasAttributes               true
IsNameAfterPrefixRequired   false

Example:
<?xml version="1.0" encoding="UTF-8"?>

XML_PROCESSING_INSTRUCTION

public static final StartTagType XML_PROCESSING_INSTRUCTION
The tag type given to an XML processing instruction (<?PITarget ... ?>).

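An XML processing instruction is a specific form of SGML processing instruction with the following two additional constraints: it must be closed with "?>" rather than a single ">" character, and it requires a PITarget (a name directly following the "<?" delimiter).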

This library does not include a predefined generic tag type for SGML processing instructions as the only forms in which they are found in HTML documents are the more specific XML processing instruction and the XML declaration, both of which have their own dedicated predefined tag type.

There is no restriction on the contents of an XML processing instruction. In particular, it cannot be assumed that the processing instruction contains attributes, in contrast to the XML declaration.

Note that registering the PHPTagTypes.PHP_SHORT tag type overrides this tag type. Both have the same start delimiter, so whichever is registered most recently takes precedence. See the documentation of the PHPTagTypes class for more information.
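The precedence rule for two tag types sharing a start delimiter can be modelled with a small standalone registry. This is only a sketch: RegistrySketch, register and typeAt are hypothetical names, and the model deliberately ignores how the library orders types whose start delimiters differ.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not the library's registration mechanism.
final class RegistrySketch {
    // Each entry is {startDelimiter, typeName}; the newest registration is kept at the front.
    private final List<String[]> registered = new ArrayList<>();

    void register(String startDelimiter, String typeName) {
        registered.add(0, new String[] {startDelimiter, typeName});
    }

    /** Returns the name of the most recently registered type whose start delimiter matches at pos, or null. */
    String typeAt(String parseText, int pos) {
        for (String[] entry : registered) {
            if (parseText.startsWith(entry[0], pos)) return entry[1];
        }
        return null;
    }

    public static void main(String[] args) {
        RegistrySketch registry = new RegistrySketch();
        registry.register("<?", "XML_PROCESSING_INSTRUCTION");
        registry.register("<?", "PHP_SHORT");
        System.out.println(registry.typeAt("<? echo 1; ?>", 0));
    }
}
```

Because both entries match "<?", the lookup returns whichever was registered last, mirroring the override described above.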

Property                    Value
Description                 XML processing instruction
StartDelimiter              <?
ClosingDelimiter            ?>
IsServerTag                 false
NamePrefix                  ?
CorrespondingEndTagType     null
HasAttributes               false
IsNameAfterPrefixRequired   true

Example:
<?xml-stylesheet href="standardstyle.css" type="text/css"?>

Constructor Details

StartTagType

protected StartTagType(String description,
                       String startDelimiter,
                       String closingDelimiter,
                       EndTagType correspondingEndTagType,
                       boolean isServerTag,
                       boolean hasAttributes,
                       boolean isNameAfterPrefixRequired)
Constructs a new StartTagType object with the specified properties.
(implementation assistance method)

As StartTagType is an abstract class, this constructor is only called from sub-class constructors.

Parameters:
description - a description of the new start tag type useful for debugging purposes.
startDelimiter - the start delimiter of the new start tag type.
closingDelimiter - the closing delimiter of the new start tag type.
correspondingEndTagType - the corresponding end tag type of the new start tag type.
isServerTag - indicates whether the new start tag type is a server tag.
hasAttributes - indicates whether the new start tag type has attributes.
isNameAfterPrefixRequired - indicates whether a name is required after the prefix.

Method Details

atEndOfAttributes

public boolean atEndOfAttributes(Source source,
                                 int pos,
                                 boolean isClosingSlashIgnored)
Indicates whether the specified source document position is at the end of a tag's attributes.
(default implementation method)

This method is called internally while parsing attributes to detect where they should end.

It can be assumed that the specified position is not inside a quoted attribute value.

The default implementation simply compares the parse text at the specified position with the closing delimiter, and is equivalent to:
source.getParseText().containsAt(getClosingDelimiter(),pos)

The isClosingSlashIgnored parameter is only relevant in the NORMAL start tag type, which makes use of it to cater for the '/' character that can occur before the closing delimiter in empty-element tags. Its value is always false when passed to other start tag types.
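The following standalone sketch models the comparison described above using plain strings in place of Source and the parse text. The atEndNormal variant is an assumption about how a NORMAL-like type might treat "/>", based only on the description of isClosingSlashIgnored; all names here are hypothetical.

```java
// Hypothetical illustration only; plain strings stand in for the library's Source and parse text.
final class EndOfAttributesSketch {
    /** Stand-in for the containsAt comparison: true if parseText has the delimiter at pos. */
    static boolean containsAt(String parseText, String delimiter, int pos) {
        return parseText.startsWith(delimiter, pos);
    }

    /** Default behaviour: compare the text at pos with the closing delimiter only. */
    static boolean atEndDefault(String parseText, String closingDelimiter, int pos) {
        return containsAt(parseText, closingDelimiter, pos);
    }

    /** Assumed NORMAL-like behaviour: also accept "/>" unless the closing slash is to be ignored. */
    static boolean atEndNormal(String parseText, int pos, boolean isClosingSlashIgnored) {
        if (containsAt(parseText, ">", pos)) return true;
        return !isClosingSlashIgnored && containsAt(parseText, "/>", pos);
    }

    public static void main(String[] args) {
        String tag = "<br class=\"x\"/>";
        System.out.println(atEndNormal(tag, 13, false)); // position of '/'
        System.out.println(atEndNormal(tag, 13, true));
    }
}
```

When the slash is ignored, the attributes are only considered to end at the '>' itself, so the '/' is left to the attribute parser.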

Parameters:
source - the Source document.
pos - the character position in the source document.
isClosingSlashIgnored - indicates whether the name of the start tag being tested is incompatible with an empty-element tag.
Returns:
true if the specified source document position is at the end of a tag's attributes, otherwise false.

constructStartTag

protected final StartTag constructStartTag(Source source,
                                           int begin,
                                           int end,
                                           String name,
                                           Attributes attributes)
Internal method for the construction of a StartTag object of this type.
(implementation assistance method)

Intended for use from within the constructTagAt(Source, int pos) method.

Parameters:
source - the Source document.
begin - the character position in the source document where the tag begins.
end - the character position in the source document where the tag ends.
name - the name of the tag.
attributes - the attributes of the tag.
Returns:
the new StartTag object.

getCorrespondingEndTagType

public final EndTagType getCorrespondingEndTagType()
Returns the type of end tag required to pair with a start tag of this type to form an element.
Returns:
the type of end tag required to pair with a start tag of this type to form an Element.

hasAttributes

public final boolean hasAttributes()
Indicates whether a start tag of this type contains attributes.
(property method)

The attributes start at the end of the name and continue until the closing delimiter is encountered. If the character sequence representing the closing delimiter occurs within a quoted attribute value it is not recognised as the end of the tag.

The atEndOfAttributes(Source, int pos, boolean isClosingSlashIgnored) method can be overridden to provide more control over where the attributes end.
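The rule that a closing delimiter inside a quoted attribute value does not end the tag can be sketched as a simple scan. This is a simplified standalone model with hypothetical names (AttributeEndSketch, findEndOfAttributes); the library's attribute parser handles many more cases, such as unquoted values and syntax errors.

```java
// Hypothetical illustration only; not the library's attribute parser.
final class AttributeEndSketch {
    /** Returns the index of the first closing delimiter at or after 'from' that lies outside quotes, or -1. */
    static int findEndOfAttributes(String parseText, int from, String closingDelimiter) {
        char quote = 0; // 0 means "not inside a quoted value"
        for (int i = from; i < parseText.length(); i++) {
            char c = parseText.charAt(i);
            if (quote != 0) {
                if (c == quote) quote = 0;          // matching quote closes the value
            } else if (c == '"' || c == '\'') {
                quote = c;                          // entering a quoted value
            } else if (parseText.startsWith(closingDelimiter, i)) {
                return i;                           // delimiter found outside quotes
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        String tag = "<a title=\"x > y\" href=\"z\">";
        System.out.println(findEndOfAttributes(tag, 2, ">"));
    }
}
```

The '>' inside the title value is skipped because the scan is still inside a quoted value at that point.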

Start Tag Type                                        Has Attributes
UNREGISTERED                                          false
NORMAL                                                true
COMMENT                                               false
XML_DECLARATION                                       true
XML_PROCESSING_INSTRUCTION                            false
DOCTYPE_DECLARATION                                   false
MARKUP_DECLARATION                                    false
CDATA_SECTION                                         false
SERVER_COMMON                                         false

Start Tag Type                                        Has Attributes
PHPTagTypes.PHP_SCRIPT                                true
PHPTagTypes.PHP_SHORT                                 false
PHPTagTypes.PHP_STANDARD                              false
MasonTagTypes.MASON_COMPONENT_CALL                    false
MasonTagTypes.MASON_COMPONENT_CALLED_WITH_CONTENT     false
MasonTagTypes.MASON_NAMED_BLOCK                       false
Returns:
true if a start tag of this type contains attributes, otherwise false.

isNameAfterPrefixRequired

public final boolean isNameAfterPrefixRequired()
Indicates whether a valid XML tag name is required directly after the prefix.
(property method)

If this property is true, the name of the tag consists of the prefix followed by an XML tag name.

If this property is false, the name of the tag consists of only the prefix.
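The two naming cases above can be illustrated with a standalone sketch. The name rule used here is a deliberate simplification of an XML name, and TagNameSketch and tagName are hypothetical names rather than library methods.

```java
// Hypothetical illustration only; uses a simplified notion of an XML tag name.
final class TagNameSketch {
    /**
     * Returns the tag name found at pos (the position just after '<'),
     * or null if the prefix does not match or a required name is missing.
     */
    static String tagName(String parseText, int pos, String namePrefix, boolean isNameAfterPrefixRequired) {
        if (!parseText.startsWith(namePrefix, pos)) return null;
        if (!isNameAfterPrefixRequired) return namePrefix;   // the name is only the prefix
        int i = pos + namePrefix.length();
        if (i >= parseText.length() || !isNameStart(parseText.charAt(i))) return null;
        while (i < parseText.length() && isNameChar(parseText.charAt(i))) i++;
        return parseText.substring(pos, i);                  // prefix followed by the name
    }

    private static boolean isNameStart(char c) {
        return Character.isLetter(c) || c == '_' || c == ':';
    }

    private static boolean isNameChar(char c) {
        return Character.isLetterOrDigit(c) || c == '-' || c == '_' || c == ':' || c == '.';
    }

    public static void main(String[] args) {
        System.out.println(tagName("<?xml-stylesheet href=\"a.css\"?>", 1, "?", true));
        System.out.println(tagName("<!-- a comment -->", 1, "!--", false));
    }
}
```

With the property set to true, text such as "<? echo ?>" yields no name in this sketch, because no name character directly follows the prefix.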

Start Tag Type                                        Name After Prefix Required
UNREGISTERED                                          false
NORMAL                                                true
COMMENT                                               false
XML_DECLARATION                                       false
XML_PROCESSING_INSTRUCTION                            true
DOCTYPE_DECLARATION                                   false
MARKUP_DECLARATION                                    true
CDATA_SECTION                                         false
SERVER_COMMON                                         false

Start Tag Type                                        Name After Prefix Required
PHPTagTypes.PHP_SCRIPT                                false
PHPTagTypes.PHP_SHORT                                 false
PHPTagTypes.PHP_STANDARD                              false
MasonTagTypes.MASON_COMPONENT_CALL                    false
MasonTagTypes.MASON_COMPONENT_CALLED_WITH_CONTENT     false
MasonTagTypes.MASON_NAMED_BLOCK                       true
Returns:
true if a valid XML tag name is required directly after the prefix, otherwise false.

parseAttributes

protected final Attributes parseAttributes(Source source,
                                           int startTagBegin,
                                           String tagName)
Internal method for the parsing of Attributes.
(implementation assistance method)

Intended for use from within the constructTagAt(Source, int pos) method.

The returned Attributes segment begins at startTagBegin+1+tagName.length(), and ends straight after the last attribute found before the tag's closing delimiter.

Only returns null if the segment contains a major syntactical error or more than the default maximum number of minor syntactical errors.
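The start offset given above can be checked with a short worked example. The class and method names are hypothetical; only the arithmetic startTagBegin+1+tagName.length() comes from the description.

```java
// Hypothetical illustration of the offset arithmetic only.
final class AttributesOffsetSketch {
    /** Position at which the attributes segment begins: skip '<' (one character) and the tag name. */
    static int attributesBegin(int startTagBegin, String tagName) {
        return startTagBegin + 1 + tagName.length();
    }

    public static void main(String[] args) {
        String source = "<p><div class=\"a\">";
        int startTagBegin = source.indexOf("<div");
        int begin = attributesBegin(startTagBegin, "div");
        // Prints the text from the computed offset onwards.
        System.out.println(source.substring(begin));
    }
}
```

For the "<div" tag beginning at position 3, the segment starts at 3+1+3 = 7, which is the space separating the name from the first attribute. Because the name includes any prefix, the same arithmetic applies to a name such as "?xml".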

Parameters:
source - the Source document.
startTagBegin - the position in the source document at which the start tag is to begin.
tagName - the name of the start tag to be constructed.
Returns:
the Attributes of the start tag to be constructed, or null if too many errors occur while parsing.