org.w3c.tidy

Class Node

Implemented Interfaces:
Cloneable

public class Node
extends java.lang.Object
implements Cloneable

Used for elements and text nodes. The element name is null for text nodes. start and end are offsets into lexbuf, which contains the textual content of all elements in the parse tree. parent and content allow traversal of the parse tree in any direction. Attributes are represented as a linked list of AttVal nodes, which hold the strings for attribute/value pairs.
Version:
$Revision: 748 $ ($Author: fgiust $)
Authors:
Dave Raggett dsr@w3.org
Andy Quick ac.quick@sympatico.ca (translation to Java)
Fabrizio Giustina
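
The linked structure described above (parent and content links for traversal, plus a linked list of attribute/value pairs) can be sketched roughly as follows. This is a simplified standalone model for illustration only, not the actual class: MiniNode, MiniAttr, append and childCount are hypothetical names, and the real Node carries more state.

```java
/** Hypothetical stand-in for an AttVal entry: one link in an attribute list. */
final class MiniAttr {
    final String name;
    final String value;
    MiniAttr next; // next attribute/value pair, or null at the end of the list

    MiniAttr(String name, String value) {
        this.name = name;
        this.value = value;
    }
}

/** Hypothetical, simplified model of a parse-tree node (not the real class). */
final class MiniNode {
    final String element; // tag name; null for text nodes
    MiniNode parent;      // enclosing node
    MiniNode content;     // first child
    MiniNode last;        // last child
    MiniNode prev;        // previous sibling
    MiniNode next;        // next sibling
    MiniAttr attributes;  // head of the attribute list

    MiniNode(String element) {
        this.element = element;
    }

    /** Appends a child, keeping parent, sibling and first/last links consistent. */
    void append(MiniNode child) {
        child.parent = this;
        child.prev = last;
        child.next = null;
        if (last != null) {
            last.next = child;
        } else {
            content = child;
        }
        last = child;
    }

    /** Walks the children front to back and counts them. */
    int childCount() {
        int n = 0;
        for (MiniNode c = content; c != null; c = c.next) {
            n++;
        }
        return n;
    }
}
```

Because every node keeps links in both directions, a walk can start at any node and move up through parent, down through content, or sideways through prev and next.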

Field Summary

static short
ASP_TAG
node type: asp tag.
static short
CDATA_TAG
node type: CDATA.
static short
COMMENT_TAG
node type: comment.
static short
DOCTYPE_TAG
node type: doctype.
static short
END_TAG
End tag.
static short
JSTE_TAG
node type: jste tag.
static short
PHP_TAG
node type: php tag.
static short
PROC_INS_TAG
node type: processing instruction.
static short
ROOT_NODE
node type: root.
static short
SECTION_TAG
node type: section tag.
static short
START_END_TAG
Start-end tag (an empty-element tag such as <br/>).
static short
START_TAG
Start tag.
static short
TEXT_NODE
node type: text.
static short
XML_DECL
node type: XML declaration.
protected org.w3c.dom.Node
adapter
DOM adapter.
protected AttVal
attributes
Attribute/Value linked list.
protected boolean
closed
true if closed by explicit end tag.
protected Node
content
Contained node.
protected String
element
Tag name.
protected int
end
end of span onto text array.
protected boolean
implicit
true if inferred.
protected Node
last
last node.
protected boolean
linebreak
true if followed by a line break.
protected Node
next
next node.
protected Node
parent
parent node.
protected Node
prev
previous node.
protected int
start
start of span onto text array.
protected Dict
tag
tag's dictionary definition.
protected byte[]
textarray
the text array.
protected short
type
Node type: one of the node type constants (TEXT_NODE, START_TAG, END_TAG, etc.).
protected Dict
was
old tag when it was changed.

Constructor Summary

Node()
Instantiates a new text node.
Node(short type, byte[] textarray, int start, int end)
Instantiates a new node.
Node(short type, byte[] textarray, int start, int end, String element, TagTable tt)
Instantiates a new node.

Method Summary

void
addAttribute(String name, String value)
Adds an attribute to the node.
void
addClass(String classname)
Add a css class to the node.
void
checkAttributes(Lexer lexer)
Default method for checking an element's attributes.
boolean
checkNodeIntegrity()
Checks for node integrity.
protected Object
clone()
Used to clone heading nodes when split by an hr.
protected Node
cloneNode(boolean deep)
Clone this node.
static void
coerceNode(Lexer lexer, Node node, Dict tag)
Coerce a node.
void
discardDocType()
Discard the doctype node.
static Node
discardElement(Node element)
Remove node from markup tree and discard it.
protected static Node
escapeTag(Lexer lexer, Node element)
Escapes the given tag.
boolean
expectsContent()
Does the node expect contents?
Node
findBody(TagTable tt)
Find the body node.
Node
findDocType()
Find the doctype element.
Node
findHEAD(TagTable tt)
Find the head tag.
Node
findHTML(TagTable tt)
Find the "html" element.
static void
fixEmptyRow(Lexer lexer, Node row)
If a table row is empty then insert an empty cell. This practice is consistent with browser behavior and avoids potential problems with row spanning cells.
protected org.w3c.dom.Node
getAdapter()
Returns a DOM Node that wraps the current tidy Node.
AttVal
getAttrByName(String name)
Returns an attribute with the given name in the current node.
boolean
hasOneChild()
Does the node have one (and only one) child?
static void
insertDocType(Lexer lexer, Node element, Node doctype)
The doctype has been found after other tags, and needs moving to before the html element.
static boolean
insertMisc(Node element, Node node)
Insert a node at the end.
void
insertNodeAfterElement(Node node)
Insert node into markup tree after element.
static void
insertNodeAsParent(Node element, Node node)
Insert node into markup tree in place of element, which is moved to become the child of the node.
void
insertNodeAtEnd(Node node)
Insert node into markup tree as the last child of this node.
void
insertNodeAtStart(Node node)
Insert node into markup tree as the first child of this node.
static void
insertNodeBeforeElement(Node element, Node node)
Insert node into markup tree before element.
boolean
isBlank(Lexer lexer)
Is the node content empty or blank? Assumes node is a text node.
boolean
isDescendantOf(Dict tag)
Is this node contained in a given tag?
boolean
isElement()
Is the node an element?
boolean
isJavaScript()
Used to check script node for script language.
boolean
isNewNode()
Is this a new (user defined) node? Used to determine how attributes without values should be printed.
static void
moveBeforeTable(Node row, Node node, TagTable tt)
Unexpected content in table row is moved to just before the table in accordance with Netscape and IE.
void
removeAttribute(AttVal attr)
Remove an attribute from node and then free it.
void
removeNode()
Extract this node and its children from a markup tree.
void
repairDuplicateAttributes(Lexer lexer)
The same attribute name can't be used more than once in each element.
protected void
setType(short newType)
Setter for node type.
String
toString()
static void
trimEmptyElement(Lexer lexer, Node element)
Trim an empty element.
static void
trimInitialSpace(Lexer lexer, Node element, Node text)
This maps <p>hello<em> world</em> to <p>hello <em>world</em>.
static void
trimSpaces(Lexer lexer, Node element)
Move initial and trailing space out.
static void
trimTrailingSpace(Lexer lexer, Node element, Node last)
This maps <em>hello </em><strong>world</strong> to <em>hello</em> <strong>world</strong>.

Field Details

ASP_TAG

public static final short ASP_TAG
node type: asp tag.
Field Value:
10

CDATA_TAG

public static final short CDATA_TAG
node type: CDATA.
Field Value:
8

COMMENT_TAG

public static final short COMMENT_TAG
node type: comment.
Field Value:
2

DOCTYPE_TAG

public static final short DOCTYPE_TAG
node type: doctype.
Field Value:
1

END_TAG

public static final short END_TAG
End tag.
Field Value:
6

JSTE_TAG

public static final short JSTE_TAG
node type: jste tag.
Field Value:
11

PHP_TAG

public static final short PHP_TAG
node type: php tag.
Field Value:
12

PROC_INS_TAG

public static final short PROC_INS_TAG
node type: processing instruction.
Field Value:
3

ROOT_NODE

public static final short ROOT_NODE
node type: root.
Field Value:
0

SECTION_TAG

public static final short SECTION_TAG
node type: section tag.
Field Value:
9

START_END_TAG

public static final short START_END_TAG
Start-end tag (an empty-element tag such as <br/>).
Field Value:
7

START_TAG

public static final short START_TAG
Start tag.
Field Value:
5

TEXT_NODE

public static final short TEXT_NODE
node type: text.
Field Value:
4

XML_DECL

public static final short XML_DECL
node type: XML declaration.
Field Value:
13

adapter

protected org.w3c.dom.Node adapter
DOM adapter.

attributes

protected AttVal attributes
Attribute/Value linked list.

closed

protected boolean closed
true if closed by explicit end tag.

content

protected Node content
Contained node.

element

protected String element
Tag name.

end

protected int end
end of span onto text array.

implicit

protected boolean implicit
true if inferred.

last

protected Node last
last node.

linebreak

protected boolean linebreak
true if followed by a line break.

next

protected Node next
next node.

parent

protected Node parent
parent node.

prev

protected Node prev
previous node.

start

protected int start
start of span onto text array.

tag

protected Dict tag
tag's dictionary definition.

textarray

protected byte[] textarray
the text array.

type

protected short type
Node type: one of the node type constants (TEXT_NODE, START_TAG, END_TAG, etc.).

was

protected Dict was
old tag when it was changed.

Constructor Details

Node

public Node()
Instantiates a new text node.

Node

public Node(short type,
            byte[] textarray,
            int start,
            int end)
Instantiates a new node.
Parameters:
type - node type: Node.ROOT_NODE | Node.DOCTYPE_TAG | Node.COMMENT_TAG | Node.PROC_INS_TAG | Node.TEXT_NODE | Node.START_TAG | Node.END_TAG | Node.START_END_TAG | Node.CDATA_TAG | Node.SECTION_TAG | Node.ASP_TAG | Node.JSTE_TAG | Node.PHP_TAG | Node.XML_DECL
textarray - array of bytes contained in the Node
start - start position
end - end position

Node

public Node(short type,
            byte[] textarray,
            int start,
            int end,
            String element,
            TagTable tt)
Instantiates a new node.
Parameters:
type - node type: Node.ROOT_NODE | Node.DOCTYPE_TAG | Node.COMMENT_TAG | Node.PROC_INS_TAG | Node.TEXT_NODE | Node.START_TAG | Node.END_TAG | Node.START_END_TAG | Node.CDATA_TAG | Node.SECTION_TAG | Node.ASP_TAG | Node.JSTE_TAG | Node.PHP_TAG | Node.XML_DECL
textarray - array of bytes contained in the Node
start - start position
end - end position
element - tag name
tt - tag table instance
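
To illustrate how start and end delimit a node's text inside textarray, here is a minimal sketch. It assumes that end is exclusive and that the bytes are UTF-8 encoded; neither is stated on this page. SpanSketch and spanText are hypothetical names.

```java
import java.nio.charset.StandardCharsets;

final class SpanSketch {
    /**
     * Decodes the bytes in [start, end) as UTF-8.
     * Both the exclusive end and the encoding are assumptions of this sketch.
     */
    static String spanText(byte[] textarray, int start, int end) {
        if (textarray == null || end <= start) {
            return "";
        }
        return new String(textarray, start, end - start, StandardCharsets.UTF_8);
    }
}
```

With a buffer holding the bytes of "hello world", spanText(buf, 6, 11) would yield the text of a node covering the second word.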

Method Details

addAttribute

public void addAttribute(String name,
                         String value)
Adds an attribute to the node.
Parameters:
name - attribute name
value - attribute value

addClass

public void addClass(String classname)
Add a css class to the node. If a class attribute already exists adds the value to the existing attribute.
Parameters:
classname - css class name
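
The append-to-existing behaviour can be sketched over a plain map of attributes. This is a hypothetical model rather than the library's code: ClassAttrSketch is an invented name, and joining with a single space follows the usual HTML convention for class lists.

```java
import java.util.Map;

final class ClassAttrSketch {
    /** Adds classname to the "class" entry, creating the entry if it is missing. */
    static void addClass(Map<String, String> attrs, String classname) {
        String existing = attrs.get("class");
        if (existing == null || existing.isEmpty()) {
            attrs.put("class", classname);
        } else {
            // An existing value is kept and the new class is appended to it.
            attrs.put("class", existing + " " + classname);
        }
    }
}
```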

checkAttributes

public void checkAttributes(Lexer lexer)
Default method for checking an element's attributes.
Parameters:
lexer - Lexer

checkNodeIntegrity

public boolean checkNodeIntegrity()
Checks for node integrity.
Returns:
false if node is not consistent

clone

protected Object clone()
Used to clone heading nodes when split by an hr.
See Also:
java.lang.Object.clone()

cloneNode

protected Node cloneNode(boolean deep)
Clone this node.
Parameters:
deep - if true deep clone the node (also clones all the contained nodes)
Returns:
cloned node

coerceNode

public static void coerceNode(Lexer lexer,
                              Node node,
                              Dict tag)
Coerce a node.
Parameters:
lexer - Lexer
node - Node
tag - tag dictionary reference

discardDocType

public void discardDocType()
Discard the doctype node.

discardElement

public static Node discardElement(Node element)
Remove node from markup tree and discard it.
Parameters:
element - discarded node
Returns:
next node

escapeTag

protected static Node escapeTag(Lexer lexer,
                                Node element)
Escapes the given tag.
Parameters:
lexer - Lexer
element - node to be escaped
Returns:
escaped node

expectsContent

public boolean expectsContent()
Does the node expect contents?
Returns:
false if this node should be empty

findBody

public Node findBody(TagTable tt)
Find the body node.
Parameters:
tt - tag table
Returns:
body node

findDocType

public Node findDocType()
Find the doctype element.
Returns:
doctype node or null if not found

findHEAD

public Node findHEAD(TagTable tt)
Find the head tag.
Parameters:
tt - tag table
Returns:
head node

findHTML

public Node findHTML(TagTable tt)
Find the "html" element.
Parameters:
tt - tag table
Returns:
html node

fixEmptyRow

public static void fixEmptyRow(Lexer lexer,
                               Node row)
If a table row is empty then insert an empty cell. This practice is consistent with browser behavior and avoids potential problems with row spanning cells.
Parameters:
lexer - Lexer
row - row node

getAdapter

protected org.w3c.dom.Node getAdapter()
Returns a DOM Node that wraps the current tidy Node.
Returns:
org.w3c.dom.Node instance

getAttrByName

public AttVal getAttrByName(String name)
Returns an attribute with the given name in the current node.
Parameters:
name - attribute name.
Returns:
AttVal instance or null if no attribute with the given name is found

hasOneChild

public boolean hasOneChild()
Does the node have one (and only one) child?
Returns:
true if the node has one child

insertDocType

public static void insertDocType(Lexer lexer,
                                 Node element,
                                 Node doctype)
The doctype has been found after other tags, and needs moving to before the html element.
Parameters:
lexer - Lexer
element - document
doctype - doctype node to insert at the beginning of element

insertMisc

public static boolean insertMisc(Node element,
                                 Node node)
Insert a node at the end.
Parameters:
element - parent node
node - will be inserted at the end of element
Returns:
true if the node has been inserted

insertNodeAfterElement

public void insertNodeAfterElement(Node node)
Insert node into markup tree after element.
Parameters:
node - new node to insert

insertNodeAsParent

public static void insertNodeAsParent(Node element,
                                      Node node)
Insert node into markup tree in place of element, which is moved to become the child of the node.
Parameters:
element - child node. Will be inserted as a child of node
node - parent node
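
The relinking this implies can be sketched as below: node takes over element's position among its siblings, and element becomes node's only child. WrapSketch, N and insertAsParent are hypothetical names, and the sketch is an illustration of the idea rather than the library's implementation.

```java
final class WrapSketch {
    /** Minimal hypothetical tree node with the same link fields as described above. */
    static final class N {
        final String name;
        N parent, content, last, prev, next;

        N(String name) {
            this.name = name;
        }
    }

    /** Puts node where element was and makes element the only child of node. */
    static void insertAsParent(N element, N node) {
        // node inherits element's position among its siblings
        node.parent = element.parent;
        node.prev = element.prev;
        node.next = element.next;
        if (node.prev != null) {
            node.prev.next = node;
        } else if (node.parent != null) {
            node.parent.content = node;
        }
        if (node.next != null) {
            node.next.prev = node;
        } else if (node.parent != null) {
            node.parent.last = node;
        }
        // element becomes the sole child of node
        node.content = element;
        node.last = element;
        element.parent = node;
        element.prev = null;
        element.next = null;
    }
}
```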

insertNodeAtEnd

public void insertNodeAtEnd(Node node)
Insert node into markup tree as the last child of this node.
Parameters:
node - Node to insert

insertNodeAtStart

public void insertNodeAtStart(Node node)
Insert node into markup tree as the first child of this node.
Parameters:
node - node to insert

insertNodeBeforeElement

public static void insertNodeBeforeElement(Node element,
                                           Node node)
Insert node into markup tree before element.
Parameters:
element - existing node; node is inserted before it
node - node to insert

isBlank

public boolean isBlank(Lexer lexer)
Is the node content empty or blank? Assumes node is a text node.
Parameters:
lexer - Lexer
Returns:
true if the node content is empty or blank
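
A minimal sketch of such a check on a text span follows. Treating "blank" as exactly one space character, and end as exclusive, are assumptions of this sketch; BlankSketch is a hypothetical name.

```java
final class BlankSketch {
    /** True when [start, end) is empty or holds exactly one space byte. */
    static boolean isBlank(byte[] textarray, int start, int end) {
        if (end == start) {
            return true; // empty span
        }
        return end == start + 1 && textarray[start] == (byte) ' ';
    }
}
```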

isDescendantOf

public boolean isDescendantOf(Dict tag)
Is this node contained in a given tag?
Parameters:
tag - tag of the candidate ancestor
Returns:
true if node is contained in tag
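
One way to answer this question is to walk the parent chain, as in the sketch below. It compares tag names as strings and does not count the node itself as its own ancestor; both choices are assumptions, and AncestorSketch and N are hypothetical names.

```java
final class AncestorSketch {
    /** Minimal hypothetical node: a tag name and a parent link. */
    static final class N {
        final String tag;
        final N parent;

        N(String tag, N parent) {
            this.tag = tag;
            this.parent = parent;
        }
    }

    /** True if any ancestor of node (not node itself) carries the given tag. */
    static boolean isDescendantOf(N node, String tag) {
        for (N p = node.parent; p != null; p = p.parent) {
            if (tag.equals(p.tag)) {
                return true;
            }
        }
        return false;
    }
}
```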

isElement

public boolean isElement()
Is the node an element?
Returns:
true if type is START_TAG or START_END_TAG
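
Expressed against the node type constants, the documented return value amounts to the check below. The constant values are copied from the Field Details section of this page; NodeTypeSketch is a hypothetical standalone class, not the real Node.

```java
final class NodeTypeSketch {
    // Values copied from the Field Details section of this page.
    static final short TEXT_NODE = 4;
    static final short START_TAG = 5;
    static final short END_TAG = 6;
    static final short START_END_TAG = 7;

    /** Mirrors the documented rule: only start and start-end tags are elements. */
    static boolean isElement(short type) {
        return type == START_TAG || type == START_END_TAG;
    }
}
```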

isJavaScript

public boolean isJavaScript()
Used to check script node for script language.
Returns:
true if the script node contains javascript

isNewNode

public boolean isNewNode()
Is this a new (user defined) node? Used to determine how attributes without values should be printed. This was introduced to deal with user defined tags e.g. Cold Fusion.
Returns:
true if this node represents a user-defined tag.

moveBeforeTable

public static void moveBeforeTable(Node row,
                                   Node node,
                                   TagTable tt)
Unexpected content in table row is moved to just before the table in accordance with Netscape and IE. This code assumes that node hasn't been inserted into the row.
Parameters:
row - Row node
node - Node which should be moved before the table
tt - tag table

removeAttribute

public void removeAttribute(AttVal attr)
Remove an attribute from node and then free it.
Parameters:
attr - attribute to remove

removeNode

public void removeNode()
Extract this node and its children from a markup tree.
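
Extracting a node from a doubly linked tree typically means patching the neighbours and the parent's first/last links, as in this sketch. UnlinkSketch, N and remove are hypothetical names; the node's own content link is left alone so its children travel with it.

```java
final class UnlinkSketch {
    /** Minimal hypothetical tree node. */
    static final class N {
        N parent, content, last, prev, next;
    }

    /** Detaches node (with its subtree) from its parent and siblings. */
    static void remove(N node) {
        if (node.prev != null) {
            node.prev.next = node.next;
        }
        if (node.next != null) {
            node.next.prev = node.prev;
        }
        if (node.parent != null) {
            if (node.parent.content == node) {
                node.parent.content = node.next;
            }
            if (node.parent.last == node) {
                node.parent.last = node.prev;
            }
        }
        // The node keeps its children but no longer points into the old tree.
        node.parent = null;
        node.prev = null;
        node.next = null;
    }
}
```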

repairDuplicateAttributes

public void repairDuplicateAttributes(Lexer lexer)
The same attribute name can't be used more than once in each element. Discard or join attributes according to configuration.
Parameters:
lexer - Lexer
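
The discard-or-join choice can be sketched over a list of name/value pairs. Everything here is illustrative: DuplicateAttrSketch and the join flag are hypothetical, keeping the first value when discarding is an arbitrary choice for the sketch, and the actual configuration options are not described on this page.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class DuplicateAttrSketch {
    /**
     * Collapses repeated attribute names. With join == true the values are
     * joined with a space; otherwise only the first value is kept.
     */
    static Map<String, String> repair(List<String[]> pairs, boolean join) {
        Map<String, String> out = new LinkedHashMap<>();
        for (String[] pair : pairs) {
            String name = pair[0];
            String value = pair[1];
            if (!out.containsKey(name)) {
                out.put(name, value);
            } else if (join) {
                out.put(name, out.get(name) + " " + value);
            }
            // otherwise the duplicate is discarded
        }
        return out;
    }
}
```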

setType

protected void setType(short newType)
Setter for node type.
Parameters:
newType - a valid node type constant

toString

public String toString()
See Also:
java.lang.Object.toString()

trimEmptyElement

public static void trimEmptyElement(Lexer lexer,
                                    Node element)
Trim an empty element.
Parameters:
lexer - Lexer
element - empty node to be removed

trimInitialSpace

public static void trimInitialSpace(Lexer lexer,
                                    Node element,
                                    Node text)
This maps <p>hello<em> world</em> to <p>hello <em>world</em>. Trims initial space, by moving it before the start tag, or if this element is the first in parent's content, then by discarding the space.
Parameters:
lexer - Lexer
element - parent node
text - text node
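
At the level of plain strings, the rule described above can be sketched as follows. This models only the text on either side of the start tag, not the node manipulation; InitialSpaceSketch is a hypothetical name, and a null value for before stands for "the element is first in its parent's content".

```java
final class InitialSpaceSketch {
    /**
     * before: text preceding the element, or null if the element comes first.
     * inner: the element's leading text.
     * Returns {before, inner} after moving one leading space out of inner.
     */
    static String[] trimInitialSpace(String before, String inner) {
        if (inner.isEmpty() || inner.charAt(0) != ' ') {
            return new String[] {before, inner}; // nothing to trim
        }
        String trimmed = inner.substring(1);
        if (before == null) {
            return new String[] {null, trimmed}; // first in parent: discard the space
        }
        if (!before.endsWith(" ")) {
            before = before + " "; // move the space in front of the start tag
        }
        return new String[] {before, trimmed};
    }
}
```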

trimSpaces

public static void trimSpaces(Lexer lexer,
                              Node element)
Move initial and trailing space out. This routine maps hello<em> world</em> to hello <em>world</em> and <em>hello </em><strong>world</strong> to <em>hello</em> <strong>world</strong>.
Parameters:
lexer - Lexer
element - Node

trimTrailingSpace

public static void trimTrailingSpace(Lexer lexer,
                                     Node element,
                                     Node last)
This maps <em>hello </em><strong>world</strong> to <em>hello</em> <strong>world</strong>. If the last child of element is a text node, then trim the trailing white space character, moving it to after element's end tag.
Parameters:
lexer - Lexer
element - node
last - last child of element