Package au.id.jericho.lib.html

A simple but powerful Java library allowing analysis and manipulation of parts of an HTML document, including some common server-side tags, while reproducing verbatim any unrecognised or invalid HTML.

Interface Summary

CharStreamSource Represents a character stream source.
HTMLElementName Contains static fields representing the names of all elements defined in the HTML 4.01 specification.
OutputSegment Defines the interface for an output segment, which is used in an OutputDocument to replace segments of the source document with other text.

Class Summary

Attribute Represents a single attribute name/value segment within a StartTag.
Attributes Represents the list of Attribute objects present within a particular StartTag.
AttributesOutputSegment Implements an OutputSegment whose content is a list of attribute name/value pairs.
CharacterEntityReference Represents an HTML Character Entity Reference.
CharacterReference Represents an HTML Character Reference, implemented by the subclasses CharacterEntityReference and NumericCharacterReference.
CharStreamSourceUtil Contains static utility methods for manipulating the way data is retrieved from a CharStreamSource object.
Config Encapsulates global configuration properties which determine the behaviour of various functions.
Config.CompatibilityMode Represents a set of configuration parameters that relate to user agent compatibility issues.
Element Represents an element in a specific source document, which encompasses a start tag, an optional end tag and all content in between.
EndTag Represents the end tag of an element in a specific source document.
EndTagType Defines the syntax for an end tag type.
EndTagTypeGenericImplementation Provides a generic implementation of the abstract EndTagType class based on the most common end tag behaviour.
FormControl Represents an HTML form control.
FormControlOutputStyle An enumerated type representing the three major output styles of a form control's output element.
FormControlOutputStyle.ConfigDisplayValue Contains static properties that configure the FormControlOutputStyle.DISPLAY_VALUE form control output style.
FormControlType Represents the control type of a FormControl.
FormField Represents a field in an HTML form, a field being defined as the group of all form controls having the same name.
FormFields Represents a collection of FormField objects.
HTMLElements Contains static methods which group HTML element names by the characteristics of their associated elements.
MasonTagTypes Contains tag types related to the Mason server platform.
NumericCharacterReference Represents an HTML Numeric Character Reference.
OutputDocument Represents a modified version of an original Source document.
OverlappingOutputSegmentsException Signals that overlapping output segments have been detected in the OutputDocument.
ParseText Represents the text from the source document that is to be parsed.
PHPTagTypes Contains tag types related to the PHP server platform.
RowColumnVector Represents the row and column number of a character position in the source document.
Segment Represents a segment of a Source document.
Source Represents a source HTML document.
StartTag Represents the start tag of an element in a specific source document.
StartTagType Defines the syntax for a start tag type.
StartTagTypeGenericImplementation Provides a generic implementation of the abstract StartTagType class based on the most common start tag behaviour.
StringOutputSegment Implements an OutputSegment whose content is a CharSequence.
Tag Represents either a StartTag or EndTag in a specific source document.
TagType Defines the syntax for a tag type that can be recognised by the parser.
Util Contains miscellaneous utility methods not directly associated with the HTML Parser library.

A simple but powerful Java library allowing analysis and manipulation of parts of an HTML document, including some common server-side tags, while reproducing verbatim any unrecognised or invalid HTML. Also provides high-level HTML form manipulation functions.
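
As a rough illustration of the idea behind OutputDocument and OutputSegment (replacing selected segments of the source while copying everything else verbatim), here is a minimal stand-alone sketch. It does not use the library at all: the class OutputSketch and its methods replace and render are hypothetical names invented for this example, and the overlap check only loosely mirrors the condition that OverlappingOutputSegmentsException is described as signalling.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in for the concept of an output document.
// This is NOT the Jericho API; all names here are assumptions.
final class OutputSketch {

    /** Replacement text for the half-open character range [begin, end). */
    private static final class Replacement {
        final int begin;
        final int end;
        final String text;

        Replacement(int begin, int end, String text) {
            this.begin = begin;
            this.end = end;
            this.text = text;
        }
    }

    private final String source;
    private final List<Replacement> replacements = new ArrayList<>();

    OutputSketch(String source) {
        this.source = source;
    }

    /** Registers a replacement for source characters begin (inclusive) to end (exclusive). */
    OutputSketch replace(int begin, int end, String text) {
        if (begin < 0 || end < begin || end > source.length()) {
            throw new IllegalArgumentException("bad range " + begin + ".." + end);
        }
        replacements.add(new Replacement(begin, end, text));
        return this;
    }

    /** Builds the output: untouched source text is copied as-is, replaced ranges are swapped. */
    String render() {
        List<Replacement> sorted = new ArrayList<>(replacements);
        sorted.sort(Comparator.comparingInt((Replacement r) -> r.begin));

        StringBuilder out = new StringBuilder();
        int pos = 0; // index of the next source character not yet emitted
        for (Replacement r : sorted) {
            if (r.begin < pos) {
                throw new IllegalStateException("overlapping replacements at index " + r.begin);
            }
            out.append(source, pos, r.begin); // verbatim copy, valid HTML or not
            out.append(r.text);
            pos = r.end;
        }
        out.append(source, pos, source.length());
        return out.toString();
    }

    public static void main(String[] args) {
        // The <b> element is never closed; the sketch does not care and copies it verbatim.
        String html = "<p class=x>Hello <b>world</p>";
        int start = html.indexOf("world");
        String result = new OutputSketch(html)
                .replace(start, start + "world".length(), "there")
                .render();
        System.out.println(result); // prints: <p class=x>Hello <b>there</p>
    }
}
```

Working from character ranges rather than a rebuilt document tree is what lets invalid or unrecognised markup pass through unchanged in this sketch; whether the library itself is implemented this way is not stated here.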

For an introduction to the API, the documentation of the Source class is the best place to start.

For a summary of features and sample applications, visit the homepage at http://jerichohtml.sourceforge.net

For downloads, support and updates, visit the SourceForge.net project page at http://sourceforge.net/projects/jerichohtml/

The Jericho HTML Parser is an open source library released under the GNU Lesser General Public License (LGPL). You are therefore free to use it in commercial applications subject to the terms detailed in the licence document.