org.dom4j.io

Class HTMLWriter

Implemented Interfaces:
LexicalHandler

public class HTMLWriter
extends XMLWriter

HTMLWriter takes a DOM4J tree and formats it to a stream as HTML. This formatter is similar to XMLWriter, with four differences: it outputs the text of CDATA and Entity sections rather than the serialised format used in XML; it has an XHTML mode; it retains whitespace in certain elements such as <PRE>; and it supports certain elements which have no corresponding close tag, such as <BR> and <P>.

The OutputFormat passed in to the constructor is checked for isXHTML() and isExpandEmptyElements(). See OutputFormat for details. Here are the rules for this class based on an OutputFormat, "format", passed in to the constructor:

  • If format.isXHTML() is true, every element must either have a close tag or be written as a closed single tag.
  • If format.isExpandEmptyElements() is true, all empty elements are expanded, except as above.
  • Examples

    If isXHTML == true, CDATA sections look like this:

     
     <myelement><![CDATA[My data]]></myelement> 
     
     
    Otherwise, they look like this:
     
     <myelement>My data</myelement> 
     
     

    Basically, OutputFormat.isXHTML() == true will produce valid XML, while format.isExpandEmptyElements() determines whether empty elements are expanded if isXHTML is true, excepting the special HTML single tags.
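The CDATA rule above can be tried out with a small sketch. It assumes dom4j is on the classpath; the class CdataModes and its render method are hypothetical names used only for illustration, not part of the library.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;

import org.dom4j.Document;
import org.dom4j.DocumentHelper;
import org.dom4j.io.HTMLWriter;
import org.dom4j.io.OutputFormat;

// Hypothetical helper class, for illustration only.
final class CdataModes {

    /** Writes <myelement> holding one CDATA section, in XHTML or plain HTML mode. */
    static String render(boolean xhtml) {
        Document document = DocumentHelper.createDocument();
        document.addElement("myelement").addCDATA("My data");

        OutputFormat format = new OutputFormat();
        format.setXHTML(xhtml);

        StringWriter sw = new StringWriter();
        try {
            HTMLWriter writer = new HTMLWriter(sw, format);
            writer.write(document);
            writer.flush();
        } catch (IOException e) {
            // Not expected when writing to an in-memory StringWriter.
            throw new UncheckedIOException(e);
        }
        return sw.toString();
    }
}
```

Under these assumptions, render(true) should keep the CDATA markers, while render(false) should emit only the bare text.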

    Also, HTMLWriter handles tags whose contents should be preformatted, that is, whitespace-preserved. By default, this set includes the tags <PRE>, <SCRIPT>, <STYLE>, and <TEXTAREA>, case-insensitively. It does not include <IFRAME>. Other tags, such as <CODE>, <KBD>, <TT>, and <VAR>, are usually rendered in a different font by browsers but don't preserve whitespace, so they don't appear in the default list either. HTML comments are always whitespace-preserved. However, the parser you use may store comments with linefeed-only text nodes (\n) even if your platform uses another line.separator character, and HTMLWriter outputs Comment nodes exactly as the DOM is set up by the parser. See the examples and discussion under setPreformattedTags.

    Examples

    Pretty Printing

    This example shows how to pretty print a string containing a valid HTML document to a string. You can also just call the static methods of this class:
    prettyPrintHTML(String), or
    prettyPrintHTML(String,boolean,boolean,boolean,boolean), or
    prettyPrintXHTML(String) for XHTML (note the X)

     String testPrettyPrint(String html) throws IOException, DocumentException {
         StringWriter sw = new StringWriter();
         OutputFormat format = OutputFormat.createPrettyPrint();
         // These are the default values for createPrettyPrint,
         // so you needn't set them:
         // format.setNewlines(true);
         // format.setTrimText(true);
         format.setXHTML(true);
         HTMLWriter writer = new HTMLWriter(sw, format);
         Document document = DocumentHelper.parseText(html);
         writer.write(document);
         writer.flush();
         return sw.toString();
     }
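
As a rough sketch of the static shortcuts mentioned above, the following wraps prettyPrintHTML(String) and prettyPrintXHTML(String). It assumes dom4j is on the classpath and that the input parses as well-formed markup; StaticPrettyPrint and its method names are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;

import org.dom4j.DocumentException;
import org.dom4j.io.HTMLWriter;

// Hypothetical helper class, for illustration only.
final class StaticPrettyPrint {

    /** Pretty prints as HTML; special single tags such as <br> get no close tag. */
    static String asHtml(String source) {
        try {
            return HTMLWriter.prettyPrintHTML(source);
        } catch (DocumentException e) {
            throw new IllegalArgumentException("Input is not well-formed", e);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Pretty prints as XHTML (note the X); empty single tags are self-closed. */
    static String asXhtml(String source) {
        try {
            return HTMLWriter.prettyPrintXHTML(source);
        } catch (DocumentException e) {
            throw new IllegalArgumentException("Input is not well-formed", e);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

For an input such as `<html><body><p>Hello</p><br/></body></html>`, the HTML variant is expected to write the empty BR as a bare open tag, while the XHTML variant should write it as a closed single tag.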
     

    This example shows how to create a "squeezed" document, but one that will work in browsers even if the browser line length is limited. No newlines are included and no extra whitespace at all, except where it is required by setPreformattedTags.

     String testCrunch(String html) throws IOException, DocumentException {
         StringWriter sw = new StringWriter();
         OutputFormat format = OutputFormat.createPrettyPrint();
         format.setNewlines(false);
         format.setTrimText(true);
         format.setIndent("");
         format.setXHTML(true);
         format.setExpandEmptyElements(false);
         format.setNewLineAfterNTags(20);
         org.dom4j.io.HTMLWriter writer = new HTMLWriter(sw, format);
         org.dom4j.Document document = DocumentHelper.parseText(html);
         writer.write(document);
         writer.flush();
         return sw.toString();
     }
     
    Version:
    $Revision: 1.21 $
    Authors:
    James Strachan
    Laramie Crocker

    Field Summary

    protected static OutputFormat
    DEFAULT_HTML_FORMAT
    protected static HashSet
    DEFAULT_PREFORMATTED_TAGS

    Fields inherited from class org.dom4j.io.XMLWriter

    DEFAULT_FORMAT, LEXICAL_HANDLER_NAMES, lastOutputNodeType, preserve, writer

    Constructor Summary

    HTMLWriter()
    HTMLWriter(OutputStream out)
    HTMLWriter(OutputStream out, OutputFormat format)
    HTMLWriter(Writer writer)
    HTMLWriter(Writer writer, OutputFormat format)
    HTMLWriter(OutputFormat format)

    Method Summary

    void
    endCDATA()
    Set
    getOmitElementCloseSet()
    A clone of the Set of elements that can have their close-tags omitted.
    Set
    getPreformattedTags()
    boolean
    isPreformattedTag(String qualifiedName)
    Checks whether the given tag name matches, case-insensitively, a tag in the preformatted tags set.
    protected void
    loadOmitElementCloseSet(Set set)
    protected boolean
    omitElementClose(String qualifiedName)
    static String
    prettyPrintHTML(String html)
    Convenience method to just get a String result.
    static String
    prettyPrintHTML(String html, boolean newlines, boolean trim, boolean isXHTML, boolean expandEmpty)
    Pretty prints the source string, letting you specify various formatter options.
    static String
    prettyPrintXHTML(String html)
    Convenience method to just get a String result, but as XHTML.
    void
    setOmitElementCloseSet(Set newSet)
    To use the empty set, pass an empty Set, or null:
     
     
           setOmitElementCloseSet(new HashSet());
         or
           setOmitElementCloseSet(null);
     
      
     
    void
    setPreformattedTags(Set newSet)
    Override the default set, which includes PRE, SCRIPT, STYLE, and TEXTAREA, case insensitively.
    void
    startCDATA()
    protected void
    writeCDATA(String text)
    protected void
    writeClose(String qualifiedName)
    Overridden method that does not close certain element names, to avoid weird behaviour from browsers for versions up to 5.x.
    protected void
    writeDeclaration()
    protected void
    writeElement(Element element)
    This override handles any elements that should not remove whitespace, such as <PRE>, <SCRIPT>, <STYLE>, and <TEXTAREA>.
    protected void
    writeEmptyElementClose(String qualifiedName)
    protected void
    writeEntity(Entity entity)
    protected void
    writeString(String text)

    Methods inherited from class org.dom4j.io.XMLWriter

    characters, close, comment, createWriter, defaultMaximumAllowedCharacter, endCDATA, endDTD, endDocument, endElement, endEntity, endPrefixMapping, escapeAttributeEntities, escapeElementEntities, flush, getLexicalHandler, getMaximumAllowedCharacter, getOutputFormat, getProperty, handleException, ignorableWhitespace, indent, installLexicalHandler, isElementSpacePreserved, isEscapeText, isExpandEmptyElements, isNamespaceDeclaration, notationDecl, parse, println, processingInstruction, resolveEntityRefs, setDocumentLocator, setEscapeText, setIndentLevel, setLexicalHandler, setMaximumAllowedCharacter, setOutputStream, setProperty, setResolveEntityRefs, setWriter, shouldEncodeChar, startCDATA, startDTD, startDocument, startElement, startEntity, startPrefixMapping, unparsedEntityDecl, write, write, write, write, write, write, write, write, write, write, write, write, write, writeAttribute, writeAttribute, writeAttributes, writeAttributes, writeCDATA, writeClose, writeClose, writeComment, writeDeclaration, writeDocType, writeDocType, writeElement, writeElementContent, writeEmptyElementClose, writeEntity, writeEntityRef, writeEscapeAttributeEntities, writeNamespace, writeNamespace, writeNamespaces, writeNode, writeNodeText, writeOpen, writePrintln, writeProcessingInstruction, writeString

    Field Details

    DEFAULT_HTML_FORMAT

    protected static final OutputFormat DEFAULT_HTML_FORMAT

    DEFAULT_PREFORMATTED_TAGS

    protected static final HashSet DEFAULT_PREFORMATTED_TAGS

    Constructor Details

    HTMLWriter

    public HTMLWriter()
                throws UnsupportedEncodingException

    HTMLWriter

    public HTMLWriter(OutputStream out)
                throws UnsupportedEncodingException

    HTMLWriter

    public HTMLWriter(OutputStream out,
                      OutputFormat format)
                throws UnsupportedEncodingException

    HTMLWriter

    public HTMLWriter(Writer writer)

    HTMLWriter

    public HTMLWriter(Writer writer,
                      OutputFormat format)

    HTMLWriter

    public HTMLWriter(OutputFormat format)
                throws UnsupportedEncodingException

    Method Details

    endCDATA

    public void endCDATA()
                throws SAXException
    Overrides:
    endCDATA in class XMLWriter

    getOmitElementCloseSet

    public Set getOmitElementCloseSet()
    A clone of the Set of elements that can have their close-tags omitted. By default it should be "AREA", "BASE", "BR", "COL", "HR", "IMG", "INPUT", "LINK", "META", "P", "PARAM"
    Returns:
    A clone of the Set.
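
Because the returned Set is a clone, changing it should not affect the writer until it is passed back through setOmitElementCloseSet. A minimal sketch of that behaviour, assuming dom4j is on the classpath (OmitSetClone and its methods are hypothetical names):

```java
import java.io.StringWriter;
import java.util.Set;

import org.dom4j.io.HTMLWriter;

// Hypothetical helper class, for illustration only.
final class OmitSetClone {

    /** Returns true if removing "BR" from the returned copy leaves the writer's own set intact. */
    static boolean writerKeepsBrAfterEditingCopy() {
        HTMLWriter writer = new HTMLWriter(new StringWriter());
        Set<?> copy = writer.getOmitElementCloseSet();
        copy.remove("BR"); // edits only the clone
        return writer.getOmitElementCloseSet().contains("BR");
    }

    /** Returns true if the default set contains the given upper-case tag name. */
    static boolean defaultSetContains(String upperCaseTag) {
        return new HTMLWriter(new StringWriter())
                .getOmitElementCloseSet()
                .contains(upperCaseTag);
    }
}
```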

    getPreformattedTags

    public Set getPreformattedTags()
    See Also:
    setPreformattedTags

    isPreformattedTag

    public boolean isPreformattedTag(String qualifiedName)
    Checks whether the given tag name is in the set of preformatted tags.
    Parameters:
    qualifiedName - the tag name to look up; matching is case-insensitive.
    Returns:
    true if the qualifiedName passed in matched (case-insensitively) a tag in the preformattedTags set, or false if not found or if the set is empty or null.
    See Also:
    setPreformattedTags
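
The case-insensitive lookup described above can be sketched as follows, assuming dom4j is on the classpath. PreformattedCheck and its methods are hypothetical names, not part of the library.

```java
import java.io.StringWriter;

import org.dom4j.io.HTMLWriter;

// Hypothetical helper class, for illustration only.
final class PreformattedCheck {

    /** Looks the tag up in a fresh writer that still has the default preformatted set. */
    static boolean isPreformattedByDefault(String tagName) {
        return new HTMLWriter(new StringWriter()).isPreformattedTag(tagName);
    }

    /** Looks the tag up after the set has been emptied with setPreformattedTags(null). */
    static boolean isPreformattedAfterClearing(String tagName) {
        HTMLWriter writer = new HTMLWriter(new StringWriter());
        writer.setPreformattedTags(null);
        return writer.isPreformattedTag(tagName);
    }
}
```

With the default set, "PRE" and "textarea" should both match, while "code" and "iframe" should not; after clearing, nothing matches.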

    loadOmitElementCloseSet

    protected void loadOmitElementCloseSet(Set set)

    omitElementClose

    protected boolean omitElementClose(String qualifiedName)

    prettyPrintHTML

    public static String prettyPrintHTML(String html)
                throws IOException,
                       UnsupportedEncodingException,
                       DocumentException
    Convenience method to just get a String result.
    Parameters:
    html - the source HTML string to pretty print.
    Returns:
    a pretty printed String from the source string, preserving whitespace in the defaultPreformattedTags set, and leaving the close tags off of the default omitElementCloseSet set. Use one of the write methods if you want stream output.

    prettyPrintHTML

    public static String prettyPrintHTML(String html,
                                         boolean newlines,
                                         boolean trim,
                                         boolean isXHTML,
                                         boolean expandEmpty)
                throws IOException,
                       UnsupportedEncodingException,
                       DocumentException
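    Pretty prints the source string, letting you specify various formatter options.
    Parameters:
    html - the source HTML string to pretty print.
    newlines - whether newlines are added to the output.
    trim - whether text is trimmed.
    isXHTML - whether the output is written as XHTML.
    expandEmpty - whether empty elements are expanded.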
    Returns:
    a pretty printed String from the source string, preserving whitespace in the defaultPreformattedTags set, and leaving the close tags off of the default omitElementCloseSet set. This override allows you to specify various formatter options. Use one of the write methods if you want stream output.
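
As an illustration of the formatter options, the sketch below fixes XHTML mode and varies only expandEmpty on an empty element that is not one of the special single tags. It assumes dom4j is on the classpath; ExpandEmptyDemo and its method are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;

import org.dom4j.DocumentException;
import org.dom4j.io.HTMLWriter;

// Hypothetical helper class, for illustration only.
final class ExpandEmptyDemo {

    /** Prints the source as XHTML without newlines, with trimming, toggling expandEmpty. */
    static String xhtml(String source, boolean expandEmpty) {
        try {
            // Arguments: html, newlines, trim, isXHTML, expandEmpty
            return HTMLWriter.prettyPrintHTML(source, false, true, true, expandEmpty);
        } catch (DocumentException e) {
            throw new IllegalArgumentException("Input is not well-formed", e);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

For the input `<body><div/></body>`, expandEmpty = true is expected to yield an open and close DIV pair, and expandEmpty = false a single closed DIV tag.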

    prettyPrintXHTML

    public static String prettyPrintXHTML(String html)
                throws IOException,
                       UnsupportedEncodingException,
                       DocumentException
    Convenience method to just get a String result, but as XHTML.
    Parameters:
    html - the source HTML string to pretty print.
    Returns:
    a pretty printed String from the source string, preserving whitespace in the defaultPreformattedTags set, but conforming to XHTML: no close tags are omitted (though if empty, they will be converted to XHTML empty tags such as <HR/>). Use one of the write methods if you want stream output.

    setOmitElementCloseSet

    public void setOmitElementCloseSet(Set newSet)
    To use the empty set, pass an empty Set, or null:
     
     
           setOmitElementCloseSet(new HashSet());
         or
           setOmitElementCloseSet(null);
     
      
     
    Parameters:
    newSet - the Set of element names whose close tags may be omitted; an empty Set or null means no close tags are omitted.
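
A minimal sketch of the effect on an empty BR element in plain HTML mode, assuming dom4j is on the classpath and a default OutputFormat (not XHTML, empty elements not expanded). OmitCloseDemo and renderBreak are hypothetical names.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;

import org.dom4j.Document;
import org.dom4j.DocumentHelper;
import org.dom4j.io.HTMLWriter;
import org.dom4j.io.OutputFormat;

// Hypothetical helper class, for illustration only.
final class OmitCloseDemo {

    /** Writes <body> containing an empty <br>, optionally clearing the omit set first. */
    static String renderBreak(boolean clearOmitSet) {
        Document document = DocumentHelper.createDocument();
        document.addElement("body").addElement("br");

        StringWriter sw = new StringWriter();
        try {
            HTMLWriter writer = new HTMLWriter(sw, new OutputFormat());
            if (clearOmitSet) {
                writer.setOmitElementCloseSet(null); // same effect as an empty Set
            }
            writer.write(document);
            writer.flush();
        } catch (IOException e) {
            // Not expected when writing to an in-memory StringWriter.
            throw new UncheckedIOException(e);
        }
        return sw.toString();
    }
}
```

With the default set the BR should come out as a bare open tag; once the set is cleared, BR is treated like any other empty element.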

    setPreformattedTags

    public void setPreformattedTags(Set newSet)
    Override the default set, which includes PRE, SCRIPT, STYLE, and TEXTAREA, case insensitively.

    Setting Preformatted Tags

    Pass in a Set of Strings, one for each tag name that should be treated like a PRE tag. You may pass in null or an empty Set to assign the empty set, in which case no tags will be treated as preformatted, except that HTML Comments will continue to be preformatted. If a tag is included in the set of preformatted tags, all whitespace within the tag will be preserved, including whitespace on the same line preceding the close tag. This will generally make the close tag not line up with the start tag, but it preserves the intention of the whitespace within the tag.

    The browser considers leading whitespace before the close tag to be significant, but leading whitespace before the open tag to be insignificant. For example, if the HTML author doesn't put the close TEXTAREA tag flush to the left margin, then the TEXTAREA control in the browser will have spaces on the last line inside the control. This may be the HTML author's intent. Similarly, in a PRE, the browser treats a flushed left close PRE tag as different from a close tag with leading whitespace. Again, this must be left up to the HTML author.
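
To see the whitespace handling in practice, the sketch below writes a PRE element under a pretty-print format, which trims text, once with the default preformatted set and once with the set cleared. It assumes dom4j is on the classpath; PreWhitespaceDemo and render are hypothetical names.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;

import org.dom4j.Document;
import org.dom4j.DocumentHelper;
import org.dom4j.io.HTMLWriter;
import org.dom4j.io.OutputFormat;

// Hypothetical helper class, for illustration only.
final class PreWhitespaceDemo {

    /** Writes <body><pre>a   b</pre></body>, with or without the default preformatted tags. */
    static String render(boolean keepDefaultPreformattedTags) {
        Document document = DocumentHelper.createDocument();
        document.addElement("body").addElement("pre").addText("a   b");

        StringWriter sw = new StringWriter();
        try {
            HTMLWriter writer = new HTMLWriter(sw, OutputFormat.createPrettyPrint());
            if (!keepDefaultPreformattedTags) {
                writer.setPreformattedTags(null); // no tag is preformatted any more
            }
            writer.write(document);
            writer.flush();
        } catch (IOException e) {
            // Not expected when writing to an in-memory StringWriter.
            throw new UncheckedIOException(e);
        }
        return sw.toString();
    }
}
```

With the default set, the three spaces inside PRE should survive; with the set cleared, the text is trimmed like any other element content.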

    Examples

    Here is an example of how you can set the PreformattedTags list using setPreformattedTags to include IFRAME, as well as the default set, if you have an instance of this class named myHTMLWriter:

     Set current = myHTMLWriter.getPreformattedTags();
     current.add("IFRAME");
     myHTMLWriter.setPreformattedTags(current);
     
     // The set is now {PRE, SCRIPT, STYLE, TEXTAREA, IFRAME}
     
     
     
    Similarly, you can simply replace it with your own:
     
     
           HashSet newset = new HashSet();
           newset.add("PRE");
           newset.add("TEXTAREA");
           myHTMLWriter.setPreformattedTags(newset);
     
           // The set is now {PRE, TEXTAREA}
     
      
     
    You can remove all tags from the preformatted tags list, with an empty set, like this:
     
     
           myHTMLWriter.setPreformattedTags(new HashSet());
     
           // The set is now {}
     
      
     
    or with null, like this:
     
     
           myHTMLWriter.setPreformattedTags(null);
     
           // The set is now {}
     
      
     
    Parameters:
    newSet - the Set of tag-name Strings to treat as preformatted; null or an empty Set means no tags are preformatted.

    startCDATA

    public void startCDATA()
                throws SAXException
    Overrides:
    startCDATA in class XMLWriter

    writeCDATA

    protected void writeCDATA(String text)
                throws IOException
    Overrides:
    writeCDATA in class XMLWriter

    writeClose

    protected void writeClose(String qualifiedName)
                throws IOException
    Overridden method that does not close certain element names, to avoid weird behaviour from browsers for versions up to 5.x.
    Overrides:
    writeClose in class XMLWriter
    Parameters:
    qualifiedName - the qualified name of the element being closed.

    writeDeclaration

    protected void writeDeclaration()
                throws IOException
    Overrides:
    writeDeclaration in class XMLWriter

    writeElement

    protected void writeElement(Element element)
                throws IOException
    This override handles any elements that should not remove whitespace, such as <PRE>, <SCRIPT>, <STYLE>, and <TEXTAREA>. Note: the close tags won't line up with the open tag, but we can't alter that. See javadoc note at setPreformattedTags.
    Overrides:
    writeElement in class XMLWriter
    Parameters:
    element - the element to write.
    See Also:
    setPreformattedTags

    writeEmptyElementClose

    protected void writeEmptyElementClose(String qualifiedName)
                throws IOException
    Overrides:
    writeEmptyElementClose in class XMLWriter

    writeEntity

    protected void writeEntity(Entity entity)
                throws IOException
    Overrides:
    writeEntity in class XMLWriter

    writeString

    protected void writeString(String text)
                throws IOException
    Overrides:
    writeString in class XMLWriter

    Copyright © 2005 MetaStuff Ltd. All Rights Reserved.