used to point to Web Accessibility Guidelines.
Adds an attribute to the node.
Adds a byte to lexer buffer.
Store char c as UTF-8 encoded byte stream.
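As an illustration of what storing a character as a UTF-8 byte stream involves, here is a minimal sketch. The class `Utf8EncodeSketch` and its `encode` helper are hypothetical names, not part of the indexed API, and the sketch assumes the input is a valid Unicode code point (it does no surrogate or range checking).

```java
import java.io.ByteArrayOutputStream;

public class Utf8EncodeSketch {

    // Hypothetical helper: returns the UTF-8 bytes for one Unicode code point.
    // Assumes 0 <= c <= 0x10FFFF and that c is not a lone surrogate.
    public static byte[] encode(int c) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream(4);
        if (c < 0x80) {
            buf.write(c);                          // 1 byte:  0xxxxxxx
        } else if (c < 0x800) {
            buf.write(0xC0 | (c >> 6));            // 2 bytes: 110xxxxx 10xxxxxx
            buf.write(0x80 | (c & 0x3F));
        } else if (c < 0x10000) {
            buf.write(0xE0 | (c >> 12));           // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
            buf.write(0x80 | ((c >> 6) & 0x3F));
            buf.write(0x80 | (c & 0x3F));
        } else {
            buf.write(0xF0 | (c >> 18));           // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
            buf.write(0x80 | ((c >> 12) & 0x3F));
            buf.write(0x80 | ((c >> 6) & 0x3F));
            buf.write(0x80 | (c & 0x3F));
        }
        return buf.toByteArray();
    }

    public static void main(String[] args) {
        // The euro sign U+20AC falls in the three-byte range.
        for (byte b : encode(0x20AC)) {
            System.out.printf("%02X ", b & 0xFF);
        }
        System.out.println();
    }
}
```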
Add a css class to the node.
Setter method for any property using the ant type Parameter.
Adds a fileset to be processed.
Add meta element for Tidy.
adds configuration Properties.
calls addCharToLexer for any char in the string.
Adds a string to lexer buffer.
Add meta element for page transition effect, this works on IE but not NS.
Ensure that config is self consistent.
checker for "align" attribute.
default text for alt attribute.
attribute: anchor not unique.
invalid entity: apos undefined in current definition.
Return the html version used in document.
character encoding = ASCII.
convert quotes and dashes to nearest ASCII char.
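A minimal sketch of this kind of conversion follows. The class name `AsciiCharsSketch`, the method `toAscii`, and the exact set of characters handled (curly quotes, en dash, em dash) are assumptions made for illustration; the index does not list which characters are actually mapped.

```java
public class AsciiCharsSketch {

    // Hypothetical helper: replaces typographic quotes and dashes
    // with their nearest ASCII equivalents and leaves everything else alone.
    public static String toAscii(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '\u2018': // left single quotation mark
                case '\u2019': // right single quotation mark
                    out.append('\'');
                    break;
                case '\u201C': // left double quotation mark
                case '\u201D': // right double quotation mark
                    out.append('"');
                    break;
                case '\u2013': // en dash
                case '\u2014': // em dash
                    out.append('-');
                    break;
                default:
                    out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(toAscii("\u201CHi\u201D \u2013 it\u2019s"));
    }
}
```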
attribute: attribute value not lower case.
Check attribute values implementations.
Prints error messages for attributes.
Instantiates a new Attribute.
Attribute/Value linked list.
HTML attribute hash table.
Attribute/Value linked list node.
Instantiates a new empty AttVal.
Instantiates a new AttVal.
Instantiates a new AttVal.
wrapped org.w3c.tidy.AttVal.
attribute: backslash in URI.
attribute: bad attribute value.
attribute: bad attribute value replaced.
for accessibility errors.
Constant used for reporting of bad access summary.
Prints a "bad argument" error message.
set if html or PUBLIC is missing.
for mismatched/mispositioned form tags.
Constant used for reporting of bad form summary.
Prints the "bad tree" message.
character encoding = BIG5.
parser for block elements.
output BODY content only.
checker for boolean attributes.
Replace implicit blockquote by div with an indent taking care to reduce nested blockquotes to a single div with
the indent set to match the nesting depth.
Output a newline before br or not?
create slides on each h2 element.
Can the given element be removed?
Substitute the last char in buffer.
checker for attributes that can contain a single character.
checker for "charset" attribute.
Checks attributes in given Node.
Check the value of an attribute.
AttrCheck implementation for checking the "align" attribute.
Checker implementation for anchors.
Checker implementation for area.
Check attribute name/value and report errors.
Default method for checking an element's attributes.
AttrCheck implementation for checking boolean attributes.
Checker implementation for table caption.
AttrCheck implementation for checking the "clear" attribute.
AttrCheck implementation for checking colors.
Check system keywords (keywords should be uppercase).
Checker implementation for forms.
AttrCheck implementation for checking the "submit" attribute.
Checker implementation for hr.
Checker implementation for html tag.
AttrCheck implementation for checking ids.
Checker implementation for image tags.
AttrCheck implementation for checking lang and xml:lang.
AttrCheck implementation for checking the "length" attribute.
add missing type attribute when appropriate.
Checker implementation for image maps.
Checker implementation for meta tags.
AttrCheck implementation for checking the "name" attribute.
Checks for node integrity.
AttrCheck implementation for checking numbers.
AttrCheck implementation for checking Scope.
AttrCheck implementation for checking scripts.
Checker implementation for script tags.
AttrCheck implementation for checking scroll.
AttrCheck implementation for checking the "shape" attribute.
Checker implementation for style tags.
Checker implementation for table.
Checker implementation for table cells.
AttrCheck implementation for checking the "target" attribute.
AttrCheck implementation for checking dir.
AttrCheck implementation for checking URLs.
AttrCheck implementation for checking the "valign" attribute.
AttrCheck implementation for checking valuetype.
Clean up misuse of presentation markup.
Instantiates a new Clean.
This is a major clean up to strip out all the extra stuff you get when you save as web page from Word 2000.
checker for "clear" attribute.
Used to clone heading nodes when split by an hr.
Clones an attribute value and adds any ASP or PHP node to the node list.
Clones a node and adds it to the node list.
true if closed by explicit end tag.
Content model: definition list.
Content model: no indent.
checker for "color" attribute.
checker for "cols" attribute.
at start of current token.
Read configuration file and manage configuration properties.
Instantiates a new Configuration.
Convert a char encoding from the deprecated tidy constant to a standard java encoding name.
checker for "coords" attribute.
Split parse tree by h2 elements and output to separate files.
Creates an empty DOM Document.
CSS class naming for -clean option.
checker for attributes containing dates.
Declare a new literal attribute.
Function to convert from MacRoman to Unicode.
Function for conversion from Windows-1252 to Unicode.
Defer duplicates when entering a table or other element where the inlines shouldn't be duplicated.
track what types of tags user has defined to eliminate unnecessary searches.
parser for definition lists.
Instantiates a new Tag definition.
Discard the doctype node.
Remove node from markup tree and discard it.
discarding unexpected element.
version as given by doctype (if any).
treatment of doctype: auto.
Constant used for reporting of given doctype.
treatment of doctype: loose.
treatment of doctype: omit.
treatment of doctype: strict.
treatment of doctype: user.
Tidy implementation of org.w3c.dom.DOMAttrImpl.
instantiates a new DOMAttrImpl which wraps the given AttVal.
Tidy implementation of org.w3c.dom.NamedNodeMap.
instantiates a new DOMAttrMapImpl for the given AttVal.
Tidy implementation of org.w3c.dom.CDATASection.
Instantiates a new DOMCDATASectionImpl which wraps the given Node.
Tidy implementation of org.w3c.dom.CharacterData.
Instantiates a new DOMCharacterDataImpl which wraps the given Node.
Tidy implementation of org.w3c.dom.Comment.
Instantiates a new DOMCommentImpl which wraps the given Node.
Instantiates a new Dom document with a default tag table.
Instantiates a new DOM document type.
Instantiates a new DOM element.
Instantiates a new DOM node.
DOMNodeListByTagNameImpl.
Instantiates a new DOMNodeListByTagName.
Instantiates a new DOM node list.
DOMProcessingInstructionImpl.
Instantiates a new DOM processing instruction.
Instantiates a new DOM text node.
discard empty p elements.
discard presentation tags.
discard proprietary attributes.
Drop if/endif sections inserted by word2000.
Keep first or last duplicate attribute.
name (null for text nodes).
if true format error output for GNU Emacs.
Replace i by em and b by strong.
parser for empty elements.
if yes text in blocks is wrapped in p's.
if yes text at body is wrapped in p's.
character encoding: encoding mismatch.
Prints encoding error messages.
Maps between Java and IANA character encoding names.
end - field in class org.w3c.tidy.
Node end of span onto text array.
Has end of input stream been reached?
instantiates a new entity.
Returns the entity code for the given entity name.
Prints entity error messages.
Returns the entity name for the given entity code.
file name to write errors to.
replace CDATA sections with escaped text.
attribute: escaped illegal URI.
A single file has been specified.
true if moved out of table.
Does the node expect contents?
checker for "frameborder" attribute.
public method for finding attribute definition by name.
Return true if substring s is in p and isn't all in upper case.
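One possible reading of this check, sketched under assumptions: the search for `s` in `p` is case-insensitive, and only the first match is inspected. The class and method names are hypothetical and the indexed implementation may differ.

```java
import java.util.Locale;

public class BadSubStringSketch {

    // Hypothetical helper: true if s occurs in p (ignoring case) and the
    // first occurrence found is not written entirely in upper case.
    public static boolean foundNotUpperCase(String s, String p) {
        int n = s.length();
        for (int i = 0; i + n <= p.length(); i++) {
            if (p.regionMatches(true, i, s, 0, n)) {
                String found = p.substring(i, i + n);
                return !found.equals(found.toUpperCase(Locale.ROOT));
            }
        }
        return false; // s does not occur in p at all
    }

    public static void main(String[] args) {
        System.out.println(foundNotUpperCase("SYSTEM", "html system \"x.dtd\""));
        System.out.println(foundNotUpperCase("PUBLIC", "HTML PUBLIC \"x\""));
    }
}
```

A check like this fits the neighbouring entry about system keywords, which should be uppercase.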
Find the doctype element.
Examine DOCTYPE to identify version.
Finds a parser for the given node.
fix URLs by replacing \ with /.
fix comments with adjacent hyphens.
Fixup doctype if missing.
attribute: fixed backslash.
If a table row is empty then insert an empty cell. This practice is consistent with browser behavior and avoids
potential problems with row spanning cells.
duplicate name attribute as an id and check if id and name match.
Ensure XML document starts with <?XML version="1.0"?>.
output document even if errors were found.
character encoding error: found utf16.
Returns the Level instance corresponding to the given int value.
States for ISO 2022. A document in an ISO-2022 based encoding uses ESC sequences called "designators" to switch
character sets.
checker for "submit" attribute.
Prints tidy general info.
Return the org.w3c.dom.Attr adapter.
Returns a DOM Node which wrap the current tidy Node.
alt-text
- default text for alt attribute.
ascii-chars
- convert quotes and dashes to nearest ASCII char.
Returns an attribute with the given name in the current node.
Returns the checker for this attribute.
break-before-br - output newline before <br>.
split
- create slides on each h2 element.
Should always be able to convert to/from UTF-8, so encoding exceptions are converted to an Error to avoid adding
throws declarations in lots of methods.
Create a text node for the contents of a CDATA element like style or script which ends with </foo> for some
foo.
Returns the int value for this level.
Returns the actual configuration.
Returns the default attribute table instance.
Returns the default entity table instance.
doctype
- user specified doctype.
drop-empty-paras
- discard empty p elements.
drop-font-tags
- discard presentation tags.
drop-proprietary-attributes
- discard proprietary attributes.
gnu-emacs
- if true format error output for GNU Emacs.
enclose-block-text
- if true text in blocks is wrapped in <p>'s.
enclose-text
- if true text at body is wrapped in <p>'s.
Errfile - file name to write errors to.
Errout - the error output stream.
escape-cdata
- replace CDATA sections with escaped text.
fix-backslash
- fix URLs by replacing \ with /.
fix-bad-comments
- fix comments with adjacent hyphens.
fix-uri
- fix uri references applying URI encoding if necessary.
force-output
- output document even if errors were found.
Returns the "friendly name" for the passed value.
hide-comments
- hides all (real) comments in output.
hide-endtags - suppress optional end tags.
Getter for inCharEncodingName.
indent-attributes
- newline+indent before each attribute.
indent-cdata
- indent CDATA sections.
indent - indent content of appropriate tags.
input-encoding
- the character encoding used for input.
join-classes
- join multiple class attributes.
join-styles
- join multiple style attributes.
keep-time
- if true last modified time is preserved.
literal-attributes
- if true attributes may use newlines.
logical-emphasis
- replace i by em and b by strong.
lower-literals
- folds known attribute values to lower case.
make-bare - remove Microsoft cruft.
make-clean - remove presentational clutter.
Generates a complete message for the warning/error.
Returns the attribute name.
Not supported, returns DOMException.NOT_SUPPORTED_ERR.
numeric-entities
- output entities other than the built-in HTML entities in the numeric rather
than the named entity form.
only-errors - if true normal output is suppressed.
Returns the valid values.
Returns the appropriate Out implementation.
Returns the appropriate Out implementation.
Getter for outCharEncodingName.
output-encoding
- the character encoding used for output.
ParseErrors - the number of errors that occurred in the most recent parse operation.
ParseWarnings - the number of warnings that occurred in the most recent parse operation.
print-body-only
- output BODY content only.
quiet - no 'Parsing X', guessed DTD or summary.
quote-ampersand
- output naked ampersand as &amp;.
quote-marks
- output " marks as &quot;.
quote-nbsp
- output non-breaking space as entity.
output-raw
- avoid mapping values > 127 to entities.
repeated-attributes
- keep first or last duplicate attribute.
replace-color
- replace hex color attribute values with names.
show-errors
- number of errors to put out.
show-warnings - show warnings? (errors are always shown).
SmartIndent - does text/block level content affect indentation.
indent-spaces
- default indentation.
Returns the appropriate StreamIn implementation.
Returns the appropriate StreamIn implementation.
Should always be able to convert to/from UTF-8, so encoding exceptions are converted to an Error to avoid adding
throws declarations in lots of methods.
tab-size
- tab size in chars.
tidy-mark
- add meta element indicating tidied doc.
trim-empty-elements
- trim empty elements.
uppercase-attributes - output attributes in upper case.
uppercase-tags - output tags in upper case.
return one less than the number of bytes used by the UTF-8 byte sequence.
Returns the html versions in which this attribute is supported.
word-2000
- draconian cleaning for Word2000.
wrap-asp
- wrap within ASP pseudo elements.
wrap-attributes
- wrap within attribute values.
wrap-jste
- wrap within JSTE pseudo elements.
wrap
- default wrap margin.
wrap-php
- wrap within PHP pseudo elements.
wrap-script-literals
- wrap within JavaScript string literals.
wrap-sections
- wrap within <![ ...
writeback - if true then output tidied markup.
output-xhtml - output extensible HTML.
output-xml - create output as XML.
add-xml-pi
- add <?xml?> for XML docs.
assume-xml-procins
This option specifies if Tidy should change the parsing of processing
instructions to require ?> as the terminator rather than >.
add-xml-space
- if set to yes adds xml:space attr as needed.
input-xml - treat input as XML.
attribute: id and name mismatch.
checker for attributes referencing an id.
state: ignore whitespace.
attribute: illegal URI reference.
in - field in class org.w3c.tidy.
Lexer file stream.
newline+indent before each attribute.
indent content of appropriate tags.
Generates and inserts a new node.
This has the effect of inserting "missing" inline elements around the contents of blocklevel elements such as P,
TD, TH, DIV, PRE etc.
Inline stack for compatibility with Mosaic.
for inferring inline tags.
The doctype has been found after other tags, and needs moving to before the html element.
Insert a node at the end.
Insert node into markup tree after element.
Insert node into markup tree in place of element which is moved to become the child of the node.
Insert node into markup tree.
Insert a node into markup tree.
Insert node into markup tree before element.
when space is moved after end tag.
installs a new Attribute.
Installs a new tag in the tag table, or modifies an existing one.
attribute: invalid attribute.
character encoding: invalid NCR.
character encoding: invalid sgml chars.
character encoding: invalid URI.
character encoding: invalid utf16.
character encoding: invalid utf8.
attribute: invalid xml id.
Is the node content empty or blank? Assumes node is a text node.
Is this a boolean attribute.
Is the given character encoding supported?
In CSS1, selectors can contain only the characters A-Z, 0-9, and Unicode characters 161-255, plus dash (-); they
cannot start with a dash or a digit; they can also contain escaped characters and any Unicode character as a
numeric code (see next item).
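The character rules above can be sketched as a simple validity check. This is an illustration only: `Css1SelectorSketch` and `isCss1Selector` are hypothetical names, lowercase a-z is assumed to be accepted alongside A-Z, and escaped characters and numeric codes are deliberately not handled.

```java
public class Css1SelectorSketch {

    // Hypothetical helper: true if s uses only letters, digits,
    // characters 161-255 and dashes, and does not start with a dash or digit.
    // Escapes and numeric character codes are NOT supported in this sketch.
    public static boolean isCss1Selector(String s) {
        if (s == null || s.isEmpty()) {
            return false;
        }
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            boolean letter = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
            boolean digit = c >= '0' && c <= '9';
            boolean highChar = c >= 161 && c <= 255;
            boolean dash = c == '-';
            if (!(letter || digit || highChar || dash)) {
                return false; // character outside the allowed set
            }
            if (i == 0 && (digit || dash)) {
                return false; // must not start with a dash or a digit
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isCss1Selector("main-nav"));
        System.out.println(isCss1Selector("1col"));
    }
}
```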
Is this node contained in a given tag?
Is the given char a digit?
Has end of stream been reached?
Used to check script node for script language.
Is the given String a valid configuration flag?
Is the given char a letter?
Is this a literal (unmodifiable) attribute?
Does the given attribute contain a literal attribute?
Determines if the specified character is a lowercase character.
Is the given char valid in name? (letter, digit or "-", ".", ":", "_")
Is this a new (user defined) node? Used to determine how attributes without values should be printed.
Don't wrap this attribute?
character encoding = ISO2022.
Is the node in the stack?
Does the given attribute contain a script?
Determines if the specified character is an uppercase character.
Does the given attribute contain a URL?
Check if attr is a valid name.
true if xmlns attribute on html element.
Determines if the specified character is whitespace.
Check if the current document is a converted Word document.
character encoding = MACROMAN.
Command line interface to parser and pretty printer.
Main method, but returns the return code as an int instead of calling System.exit(code).
Make bare HTML: remove Microsoft cruft.
remove presentational clutter.
Max UTF-8 valid char value.
checker for "media" attribute.
Called by tidy when a warning or error occurs.
attribute: missing attribute value.
attribute: missing attribute.
accessibility flaw: missing image map.
accessibility flaw: missing image map.
attribute: missing image map.
accessibility flaw: missing link alt.
attribute: missing quotemark.
invalid entity: missing semicolon.
invalid entity: missing semicolon.
accessibility flaw: missing summary.
Prints the "missing body" message.
Unexpected content in table row is moved to just before the table in accordance with Netscape and IE.
Move node to the head, where element is used as starting point in hunt for head.
checker for "name" attribute.
allow numeric character references.
Prints the "needs author intervention" message.
bytes for the newline marker.
attribute: newline in URI.
Creates a new node and adds it to the node list.
Creates a new node and adds it to the node list.
Creates a new node and adds it to the node list.
Next element in the stack.
Next linked style element.
Next linked style property.
Used for elements and text nodes. The element name is null for text nodes; start and end are offsets into lexbuf, which
contains the textual content of all elements in the parse tree.
Instantiates a new text node.
character encoding error: non ascii.
Do nothing: text nodes in html documents are important and jtidy already removes useless text during parsing.
checker for "number" attribute.
Reads from the given input and returns the root Node.
Reads from the given input and returns the root Node.
Reads from the given input and returns the root Node.
Reads from the given input and returns the root Node.
Parse a configuration option.
Parser for ASP within start tags. Some people use ASP to customize attributes. Tidy isn't really well suited to
dealing with ASP. This is a workaround for attributes, but won't deal with the case where the ASP is used to
tailor the attribute value.
consumes the '>' terminating start tags.
Parser for block elements.
HTML is the top level element.
Parses InputStream in and returns a DOM Document node.
Parser for empty elements.
PHP is like ASP but is based upon XML processing instructions, e.g.
Interface for configuration property parser.
Property parser instances.
HTML Parser implementation.
Invoked when < is seen in place of an attribute value. Terminates on whitespace if not ASP, PHP or Tango. This
routine recognizes ' and " quoted strings.
Parse an attribute value.
Pop a copy of an inline node from the stack.
Pretty-prints a DOM Document.
Pretty-prints a DOM Node.
Instantiates a new PPrint.
Is content acceptable for pre elements?
Called from printTree to print the content of a slide from the node slidecontent.
attribute: proprietary attribute value.
attribute: proprietary attribute.
node is <![if ...]>; prune up to <![endif]>.
Remove word2000 attributes from node.
true after token has been pushed back.
Push a copy of an inline node onto the stack, but don't push if implicit or OBJECT or APPLET (implicit tags are ones
generated from the istack). One issue arises with pushing inlines when the tag is already pushed.
store char c as UTF-8 encoded byte stream.
checker for "scope" attribute.
checker for "scroll" attribute.
already seen end body tag?
already seen end html tag?
alt-text
- default text for alt attribute.
ascii-chars
- convert quotes and dashes to nearest ASCII char.
break-before-br - output newline before <br>.
split
- create slides on each h2 element.
Setter for the current configuration instance.
Sets the configuration from a configuration file.
Sets the configuration from a properties object.
doctype
- user specified doctype.
drop-empty-paras
- discard empty p elements.
drop-font-tags
- discard presentation tags.
drop-proprietary-attributes
- discard proprietary attributes.
gnu-emacs
- if true format error output for GNU Emacs.
enclose-block-text
- if true text in blocks is wrapped in <p>'s.
enclose-text
- if true text at body is wrapped in <p>'s.
Errfile - file name to write errors to.
escape-cdata
- replace CDATA sections with escaped text.
Sets the current file name.
fix-backslash
- fix URLs by replacing \ with /.
fix-bad-comments
- fix comments with adjacent hyphens.
fix-uri
- fix uri references applying URI encoding if necessary.
force-output
- output document even if errors were found.
hide-comments
- hides all (real) comments in output.
hide-endtags - suppress optional end tags.
Setter for inCharEncoding.
Setter for inCharEncodingName.
indent-attributes
- newline+indent before each attribute.
indent-cdata
- indent CDATA sections.
indent - indent content of appropriate tags.
Setter for inOutCharEncodingName.
input-encoding
- the character encoding used for input.
InputStreamName - the name of the input stream (printed in the header information).
join-classes
- join multiple class attributes.
join-styles
- join multiple style attributes.
keep-time
- if true last modified time is preserved.
Setter for lexer instance (needed for error reporting).
Is this a literal (unmodifiable) attribute?
literal-attributes
- if true attributes may use newlines.
logical-emphasis
- replace i by em and b by strong.
lower-literals
- folds known attribute values to lower case.
make-bare - remove Microsoft cruft.
make-clean - remove presentational clutter.
Attach a TidyMessageListener which will be notified for messages and errors.
Not supported, returns DOMException.NOT_SUPPORTED_ERR.
Don't wrap this attribute?
numeric-entities
- output entities other than the built-in HTML entities in the numeric rather
than the named entity form.
only-errors - if true normal output is suppressed.
Setter for outCharEncoding.
Setter for outCharEncodingName.
output-encoding
- the character encoding used for output.
print-body-only
- output BODY content only.
quiet - no 'Parsing X', guessed DTD or summary.
quote-ampersand
- output naked ampersand as &amp;.
quote-marks
- output " marks as &quot;.
quote-nbsp
- output non-breaking space as entity.
output-raw
- avoid mapping values > 127 to entities.
repeated-attributes
- keep first or last duplicate attribute.
replace-color
- replace hex color attribute values with names.
show-errors
- set the number of errors to put out.
show-warnings - show warnings? (errors are always shown).
SmartIndent - does text/block level content affect indentation.
indent-spaces
- default indentation.
tab-size
- tab size in chars.
tidy-mark
- add meta element indicating tidied doc.
trim-empty-elements
- trim empty elements.
uppercase-attributes - output attributes in upper case.
uppercase-tags - output tags in upper case.
word-2000
- draconian cleaning for Word2000.
wrap-asp
- wrap within ASP pseudo elements.
wrap-attributes
- wrap within attribute values.
wrap-jste
- wrap within JSTE pseudo elements.
wrap
- default wrap margin.
wrap-php
- wrap within PHP pseudo elements.
wrap-script-literals
- wrap within JavaScript string literals.
wrap-sections
- wrap within <![ ...
writeback - if true then output tidied markup.
output-xhtml - output extensible HTML.
Adds a new xhtml doctype to the document.
output-xml - create output as XML.
add-xml-pi
- add <?xml?> for XML docs.
assume-xml-procins
This option specifies if Tidy should change the parsing of processing
instructions to require ?> as the terminator rather than >.
add-xml-space
- if set to yes adds xml:space attr as needed.
input-xml - treat input as XML.
checker for "shape" attribute.
character encoding = SHIFTJIS.
number of errors to put out.
print version information.
however errors are always shown.
does text/block level content affect indentation.
space preceding xml declaration.
start of span onto text array.
state of lexer's finite state machine.
StreamIn implementation using Java readers.
Instantiates a new StreamInJavaImpl.
Instantiates a new StreamInJavaImpl.
Word2000 uses span excessively, so we strip span out.
Linked list of class names and styles.
Instantiates a new style.
Linked list of style properties.
Instantiates a new style property.
used for cleaning up presentation markup.
tag's dictionary definition.
tag - field in class org.w3c.tidy.
Node tag's dictionary definition.
a proprietary tag added by Tidy, along with tag_nobr, tag_wbr.
Check HTML attributes implementation.
Tag dictionary node hash table.
Instantiates a new tag table with known tags.
types of tags that the user can define: block tag.
types of tags that the user can define: empty tag.
types of tags that the user can define: inline tag.
types of tags that the user can define: pre tag.
checker for "target" attribute.
checker for text attributes.
checker for "dir" attribute.
checker for table "frame" attribute.
HTML parser and pretty printer.
Instantiates a new Tidy instance.
add meta element indicating tidied doc.
Message sent to listeners for validation errors/warnings and info.
Instantiates a new message.
Listener interface for validation errors/warnings and info.
Utility class with handy methods, mainly for String handling or for reproducing C behaviours.
Convert a Java character encoding name to its IANA equivalent.
Converts an encoding name to the standard java name.
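The index does not show how these name mappings are stored. As a hedged illustration of the same idea using only the JDK, `java.nio.charset.Charset` resolves many historical Java aliases to a canonical name, which for the standard charsets is the IANA-registered one. The class `EncodingNameSketch` and its method are hypothetical and are not the indexed utility.

```java
import java.nio.charset.Charset;

public class EncodingNameSketch {

    // Hypothetical helper: returns the JDK's canonical name for an encoding
    // alias, or null when the name is null, illegal, or unsupported.
    public static String canonicalName(String name) {
        try {
            return Charset.forName(name).name();
        } catch (IllegalArgumentException e) {
            // Covers IllegalCharsetNameException and UnsupportedCharsetException,
            // both of which extend IllegalArgumentException.
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(canonicalName("UTF8"));
        System.out.println(canonicalName("ISO8859_1"));
    }
}
```

Relying on the JDK's alias table keeps the sketch short, but it only knows the charsets installed in the running JVM.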
Maps the given character to its lowercase equivalent.
Maps the given character to its uppercase equivalent.
This maps <p>hello<em> world</em> to <p>hello <em>world</em>.
Move initial and trailing space out.
This maps hello world to hello world .
checker for table "rules" attribute.
TagTable associated with this Configuration.
checker for "type" attribute.
TextNode, StartTag, EndTag etc.
invalid entity: unescaped ampersand.
attribute: unexpected end of file.
attribute: expected equalsign.
attribute: unexpected gt.
attribute: unexpected quotemark.
the default (big-endian) UNICODE BOM.
the big-endian (default) UNICODE BOM.
the little-endian UNICODE BOM.
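To show how the two byte order marks are typically used, here is a minimal sketch that inspects the first two bytes of a stream. The values 0xFEFF (big-endian) and 0xFFFE (little-endian) are the standard UTF-16 marks; the class `BomSketch` and the method `detectUtf16` are hypothetical names.

```java
public class BomSketch {

    static final int BOM_BE = 0xFEFF; // bytes FE FF: big-endian
    static final int BOM_LE = 0xFFFE; // bytes FF FE: little-endian

    // Hypothetical helper: returns "UTF-16BE" or "UTF-16LE" when the data
    // starts with the corresponding byte order mark, otherwise null.
    public static String detectUtf16(byte[] head) {
        if (head == null || head.length < 2) {
            return null;
        }
        int mark = ((head[0] & 0xFF) << 8) | (head[1] & 0xFF);
        if (mark == BOM_BE) {
            return "UTF-16BE";
        }
        if (mark == BOM_LE) {
            return "UTF-16LE";
        }
        return null;
    }

    public static void main(String[] args) {
        byte[] sample = {(byte) 0xFF, (byte) 0xFE, 0x41, 0x00};
        System.out.println(detectUtf16(sample));
    }
}
```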
attribute: unknown attribute.
invalid entity: unknown entity.
Prints the "unknown file" message.
Prints the "unknown option" message.
Prints an "unknown option" error message.
Update oldtextarray
in the current nodes.
output attributes in upper not lower case.
output tags in upper not lower case.
checker for attributes which contain a list of urls.
presentation flaw: using body.
presentation flaw: using font.
accessibility flaw: using frames.
presentation flaw: using layer.
presentation flaw: using nobr.
accessibility flaw: using noframes.
presentation flaw: using spacer.
character encoding = UTF16.
UTF-16 surrogate pair areas: high surrogates begin.
UTF-16 surrogate pair areas: high surrogates end.
UTF-16 surrogate pair areas: low surrogates begin.
UTF-16 surrogate pair areas: low surrogates end.
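The four boundaries above delimit the ranges used to decode a surrogate pair into a single code point. A minimal sketch, using the standard UTF-16 range values and hypothetical class and constant names:

```java
public class SurrogateSketch {

    // Standard UTF-16 surrogate ranges (constant names are illustrative).
    static final int HIGH_BEGIN = 0xD800;
    static final int HIGH_END = 0xDBFF;
    static final int LOW_BEGIN = 0xDC00;
    static final int LOW_END = 0xDFFF;

    public static boolean isHigh(int c) {
        return c >= HIGH_BEGIN && c <= HIGH_END;
    }

    public static boolean isLow(int c) {
        return c >= LOW_BEGIN && c <= LOW_END;
    }

    // Combines a high and a low surrogate into the code point they encode.
    public static int combine(int high, int low) {
        if (!isHigh(high) || !isLow(low)) {
            throw new IllegalArgumentException("not a surrogate pair");
        }
        return 0x10000 + ((high - HIGH_BEGIN) << 10) + (low - LOW_BEGIN);
    }

    public static void main(String[] args) {
        System.out.printf("U+%X%n", combine(0xD83D, 0xDE00)); // prints U+1F600
    }
}
```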
character encoding = UTF16BE.
character encoding = UTF16LE.
character encoding = UTF8.
Validates task parameters.
Instantiates a new ValidUTF8Sequence.
checker for "valign" attribute.
character encoding: vendor specific chars.
tags/attrs in any version.
Version: html 4.0 frameset.
tags/attrs in all versions from HTML 3.2 onwards.
tags/attrs in HTML4 but not in earlier version.
Version: html 4.0 transitional.
Version: html 4.0 strict.
tags/attrs in HTML 4 loose and frameset.
tags/attrs which are in all versions of HTML except strict.
all tags and attributes are ok in proprietary version of HTML.
Version in which this tag is defined.
bit vector of HTML versions.
checker for "vtype" attribute.