org.ccil.cowan.tagsoup

Class Parser

Implemented Interfaces:
LexicalHandler, ScanHandler, XMLReader

public class Parser
extends DefaultHandler
implements ScanHandler, XMLReader, LexicalHandler

The SAX parser class.
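
Because Parser implements XMLReader, it can be driven through the standard SAX interfaces. The following is a minimal sketch written against XMLReader only; with TagSoup on the classpath one would presumably pass new org.ccil.cowan.tagsoup.Parser() as the reader. The class name ElementLister, its listElements method, and the use of the JDK's built-in reader in main are illustrative assumptions, chosen so that the example compiles without TagSoup.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: collects the element names reported by any SAX XMLReader.
final class ElementLister {
    static List<String> listElements(XMLReader reader, String markup)
            throws IOException, SAXException {
        final List<String> names = new ArrayList<>();
        reader.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                names.add(qName); // record each start tag in document order
            }
        });
        reader.parse(new InputSource(new StringReader(markup)));
        return names;
    }

    public static void main(String[] args) throws Exception {
        // Stand-in reader from the JDK (assumption); a TagSoup Parser
        // instance would be passed to listElements in the same way.
        XMLReader reader =
            SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        System.out.println(
            listElements(reader, "<html><body><p>Hi</p></body></html>"));
    }
}
```

The JDK reader used here requires well-formed input; the point of the sketch is only the XMLReader calling pattern, not TagSoup's handling of malformed HTML.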

Field Summary

static String
CDATAElementsFeature
A value of "true" indicates that the parser will treat CDATA elements specially.
static String
XML11Feature
Returns "true" if the parser supports both XML 1.1 and XML 1.0.
static String
autoDetectorProperty
Specifies the AutoDetector (for encoding detection) this Parser uses.
static String
bogonsEmptyFeature
A value of "true" indicates that the parser will give unknown elements a content model of EMPTY; a value of "false", a content model of ANY.
static String
defaultAttributesFeature
A value of "true" indicates that the parser will return default attribute values for missing attributes that have default values.
static String
externalGeneralEntitiesFeature
Reports whether this parser processes external general entities (it doesn't).
static String
externalParameterEntitiesFeature
Reports whether this parser processes external parameter entities (it doesn't).
static String
ignorableWhitespaceFeature
A value of "true" indicates that the parser will transmit whitespace in element-only content via the SAX ignorableWhitespace callback.
static String
ignoreBogonsFeature
A value of "true" indicates that the parser will ignore unknown elements.
static String
isStandaloneFeature
May be examined only during a parse, after the startDocument() callback has been completed; read-only.
static String
lexicalHandlerParameterEntitiesFeature
A value of "true" indicates that the LexicalHandler will report the beginning and end of parameter entities (it won't).
static String
lexicalHandlerProperty
Used to see some syntax events that are essential in some applications: comments, CDATA delimiters, selected general entity inclusions, and the start and end of the DTD (and declaration of document element name).
static String
namespacePrefixesFeature
A value of "true" indicates that XML qualified names (with prefixes) and attributes (including xmlns* attributes) will be available.
static String
namespacesFeature
A value of "true" indicates namespace URIs and unprefixed local names for element and attribute names will be available.
static String
resolveDTDURIsFeature
A value of "true" indicates that system IDs in declarations will be absolutized (relative to their base URIs) before reporting.
static String
restartElementsFeature
A value of "true" indicates that the parser will attempt to restart the restartable elements.
static String
scannerProperty
Specifies the Scanner object this Parser uses.
static String
schemaProperty
Specifies the Schema object this Parser uses.
static String
stringInterningFeature
Has a value of "true" if all XML names (for elements, prefixes, attributes, entities, notations, and local names), as well as Namespace URIs, will have been interned using java.lang.String.intern.
static String
translateColonsFeature
A value of "true" indicates that the parser will translate colons into underscores in names.
static String
unicodeNormalizationCheckingFeature
Controls whether the parser reports Unicode normalization errors as described in section 2.13 and Appendix B of the XML 1.1 Recommendation.
static String
useAttributes2Feature
Returns "true" if the Attributes objects passed by this parser in ContentHandler.startElement() implement the org.xml.sax.ext.Attributes2 interface.
static String
useEntityResolver2Feature
Returns "true" if, when setEntityResolver is given an object implementing the org.xml.sax.ext.EntityResolver2 interface, those new methods will be used.
static String
useLocator2Feature
Returns "true" if the Locator objects passed by this parser in ContentHandler.setDocumentLocator() implement the org.xml.sax.ext.Locator2 interface.
static String
validationFeature
Controls whether the parser reports all validity errors. (We don't report any validity errors.)
static String
xmlnsURIsFeature
Controls whether, when the namespace-prefixes feature is set, the parser treats namespace declaration attributes as being in the http://www.w3.org/2000/xmlns/ namespace.

Method Summary

void
adup(char[] buff, int offset, int length)
void
aname(char[] buff, int offset, int length)
void
aval(char[] buff, int offset, int length)
void
cdsect(char[] buff, int offset, int length)
void
cmnt(char[] buff, int offset, int length)
void
comment(char[] ch, int start, int length)
void
decl(char[] buff, int offset, int length)
Parsing the complete XML Document Type Definition is way too complex, but for many simple cases we can extract something useful from it.
void
endCDATA()
void
endDTD()
void
endEntity(String name)
void
entity(char[] buff, int offset, int length)
void
eof(char[] buff, int offset, int length)
void
etag(char[] buff, int offset, int length)
void
etag_basic(char[] buff, int offset, int length)
boolean
etag_cdata(char[] buff, int offset, int length)
ContentHandler
getContentHandler()
DTDHandler
getDTDHandler()
char
getEntity()
EntityResolver
getEntityResolver()
ErrorHandler
getErrorHandler()
boolean
getFeature(String name)
Object
getProperty(String name)
void
gi(char[] buff, int offset, int length)
void
parse(InputSource input)
void
parse(String systemid)
void
pcdata(char[] buff, int offset, int length)
void
pi(char[] buff, int offset, int length)
void
pitarget(char[] buff, int offset, int length)
void
setContentHandler(ContentHandler handler)
void
setDTDHandler(DTDHandler handler)
void
setEntityResolver(EntityResolver resolver)
void
setErrorHandler(ErrorHandler handler)
void
setFeature(String name, boolean value)
void
setProperty(String name, Object value)
void
stagc(char[] buff, int offset, int length)
void
stage(char[] buff, int offset, int length)
void
startCDATA()
void
startDTD(String name, String publicid, String systemid)
void
startEntity(String name)

Field Details

CDATAElementsFeature

public static final String CDATAElementsFeature
A value of "true" indicates that the parser will treat CDATA elements specially. Normally true, since the input is by default HTML.

XML11Feature

public static final String XML11Feature
Returns "true" if the parser supports both XML 1.1 and XML 1.0. (Always false.)

autoDetectorProperty

public static final String autoDetectorProperty
Specifies the AutoDetector (for encoding detection) this Parser uses.

bogonsEmptyFeature

public static final String bogonsEmptyFeature
A value of "true" indicates that the parser will give unknown elements a content model of EMPTY; a value of "false", a content model of ANY.

defaultAttributesFeature

public static final String defaultAttributesFeature
A value of "true" indicates that the parser will return default attribute values for missing attributes that have default values.

externalGeneralEntitiesFeature

public static final String externalGeneralEntitiesFeature
Reports whether this parser processes external general entities (it doesn't).

externalParameterEntitiesFeature

public static final String externalParameterEntitiesFeature
Reports whether this parser processes external parameter entities (it doesn't).

ignorableWhitespaceFeature

public static final String ignorableWhitespaceFeature
A value of "true" indicates that the parser will transmit whitespace in element-only content via the SAX ignorableWhitespace callback. Normally this is not done, because HTML is an SGML application and SGML suppresses such whitespace.

ignoreBogonsFeature

public static final String ignoreBogonsFeature
A value of "true" indicates that the parser will ignore unknown elements.

isStandaloneFeature

public static final String isStandaloneFeature
May be examined only during a parse, after the startDocument() callback has been completed; read-only. The value is true if the document specified standalone="yes" in its XML declaration, and otherwise is false. (It's always false.)

lexicalHandlerParameterEntitiesFeature

public static final String lexicalHandlerParameterEntitiesFeature
A value of "true" indicates that the LexicalHandler will report the beginning and end of parameter entities (it won't).

lexicalHandlerProperty

public static final String lexicalHandlerProperty
Used to see some syntax events that are essential in some applications: comments, CDATA delimiters, selected general entity inclusions, and the start and end of the DTD (and declaration of document element name). The Object must implement org.xml.sax.ext.LexicalHandler.
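
As an illustration of the lexical handler mechanism, the sketch below registers a handler through setProperty and collects comment text. It assumes that lexicalHandlerProperty holds the standard SAX property id http://xml.org/sax/properties/lexical-handler; the class CommentCollector and the JDK stand-in reader in main are hypothetical and only serve to make the example compile without TagSoup.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DefaultHandler2;

// Hypothetical handler: DefaultHandler2 already implements LexicalHandler.
final class CommentCollector extends DefaultHandler2 {
    // Standard SAX property id; assumed to be the value of lexicalHandlerProperty.
    static final String LEXICAL_HANDLER =
        "http://xml.org/sax/properties/lexical-handler";

    final List<String> comments = new ArrayList<>();

    @Override
    public void comment(char[] ch, int start, int length) {
        // Copy only the reported slice of the buffer.
        comments.add(new String(ch, start, length));
    }

    static List<String> collect(XMLReader reader, String markup) throws Exception {
        CommentCollector handler = new CommentCollector();
        reader.setContentHandler(handler);
        reader.setProperty(LEXICAL_HANDLER, handler); // value must be a LexicalHandler
        reader.parse(new InputSource(new StringReader(markup)));
        return handler.comments;
    }

    public static void main(String[] args) throws Exception {
        // JDK reader as a stand-in (assumption); a TagSoup Parser could be used instead.
        XMLReader reader =
            SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        System.out.println(collect(reader, "<a><!-- note --></a>"));
    }
}
```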

namespacePrefixesFeature

public static final String namespacePrefixesFeature
A value of "true" indicates that XML qualified names (with prefixes) and attributes (including xmlns* attributes) will be available. We don't support this value.

namespacesFeature

public static final String namespacesFeature
A value of "true" indicates namespace URIs and unprefixed local names for element and attribute names will be available.

resolveDTDURIsFeature

public static final String resolveDTDURIsFeature
A value of "true" indicates that system IDs in declarations will be absolutized (relative to their base URIs) before reporting. (This returns true but doesn't actually do anything.)

restartElementsFeature

public static final String restartElementsFeature
A value of "true" indicates that the parser will attempt to restart the restartable elements.

scannerProperty

public static final String scannerProperty
Specifies the Scanner object this Parser uses.

schemaProperty

public static final String schemaProperty
Specifies the Schema object this Parser uses.

stringInterningFeature

public static final String stringInterningFeature
Has a value of "true" if all XML names (for elements, prefixes, attributes, entities, notations, and local names), as well as Namespace URIs, will have been interned using java.lang.String.intern. This supports fast testing of equality/inequality against string constants, rather than forcing slower calls to String.equals(). (We always intern.)
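
The benefit of interning can be sketched in plain Java, independent of any parser. The class InternedNames and its isBody method are hypothetical names used only for this illustration.

```java
// Hypothetical illustration of why interned names allow reference comparison.
final class InternedNames {
    // Safe only when the argument is known to be interned,
    // as this feature promises for names reported by the parser.
    static boolean isBody(String internedLocalName) {
        return internedLocalName == "body"; // string literals are always interned
    }

    public static void main(String[] args) {
        // A string built at run time is a distinct object until it is interned.
        String built = new String(new char[] {'b', 'o', 'd', 'y'});
        System.out.println(isBody(built));          // distinct object
        System.out.println(isBody(built.intern())); // canonical instance
    }
}
```

Code that cannot rely on the feature being true should keep using String.equals().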

translateColonsFeature

public static final String translateColonsFeature
A value of "true" indicates that the parser will translate colons into underscores in names.

unicodeNormalizationCheckingFeature

public static final String unicodeNormalizationCheckingFeature
Controls whether the parser reports Unicode normalization errors as described in section 2.13 and Appendix B of the XML 1.1 Recommendation. (We don't normalize.)

useAttributes2Feature

public static final String useAttributes2Feature
Returns "true" if the Attributes objects passed by this parser in ContentHandler.startElement() implement the org.xml.sax.ext.Attributes2 interface. (They don't.)

useEntityResolver2Feature

public static final String useEntityResolver2Feature
Returns "true" if, when setEntityResolver is given an object implementing the org.xml.sax.ext.EntityResolver2 interface, those new methods will be used. (They won't be.)

useLocator2Feature

public static final String useLocator2Feature
Returns "true" if the Locator objects passed by this parser in ContentHandler.setDocumentLocator() implement the org.xml.sax.ext.Locator2 interface. (They don't.)

validationFeature

public static final String validationFeature
Controls whether the parser reports all validity errors. (We don't report any validity errors.)

xmlnsURIsFeature

public static final String xmlnsURIsFeature
Controls whether, when the namespace-prefixes feature is set, the parser treats namespace declaration attributes as being in the http://www.w3.org/2000/xmlns/ namespace. (It doesn't.)

Method Details

adup

public void adup(char[] buff,
                 int offset,
                 int length)
            throws SAXException
Specified by:
adup in interface ScanHandler

aname

public void aname(char[] buff,
                  int offset,
                  int length)
            throws SAXException
Specified by:
aname in interface ScanHandler

aval

public void aval(char[] buff,
                 int offset,
                 int length)
            throws SAXException
Specified by:
aval in interface ScanHandler

cdsect

public void cdsect(char[] buff,
                   int offset,
                   int length)
            throws SAXException
Specified by:
cdsect in interface ScanHandler

cmnt

public void cmnt(char[] buff,
                 int offset,
                 int length)
            throws SAXException
Specified by:
cmnt in interface ScanHandler

comment

public void comment(char[] ch,
                    int start,
                    int length)
            throws SAXException

decl

public void decl(char[] buff,
                 int offset,
                 int length)
            throws SAXException
Parsing the complete XML Document Type Definition is way too complex, but for many simple cases we can extract something useful from it.
doctypedecl ::= '<!DOCTYPE' S Name (S ExternalID)? S? ('[' intSubset ']' S?)? '>'
DeclSep ::= PEReference | S
intSubset ::= (markupdecl | DeclSep)*
markupdecl ::= elementdecl | AttlistDecl | EntityDecl | NotationDecl | PI | Comment
ExternalID ::= 'SYSTEM' S SystemLiteral | 'PUBLIC' S PubidLiteral S SystemLiteral
Specified by:
decl in interface ScanHandler
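
To make the "simple cases" concrete, here is a minimal sketch that pulls the name, public identifier, and system identifier out of a DOCTYPE declaration with a regular expression. It is not TagSoup's implementation: the class DoctypeSketch, its parse method, and the case-insensitive matching are assumptions made for illustration, and the internal subset is ignored entirely.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch; handles only the Name and optional ExternalID parts.
final class DoctypeSketch {
    // Groups: 1 = Name, 3 = PubidLiteral, 5 = SystemLiteral after PUBLIC,
    // 7 = SystemLiteral after SYSTEM. Groups 2, 4, 6 capture the quote character.
    private static final Pattern DOCTYPE = Pattern.compile(
        "<!DOCTYPE\\s+([^\\s\\[>]+)"
        + "(?:\\s+(?:PUBLIC\\s+([\"'])(.*?)\\2\\s+([\"'])(.*?)\\4"
        + "|SYSTEM\\s+([\"'])(.*?)\\6))?",
        Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    /** Returns {name, publicId, systemId}, with null for absent ids, or null if there is no match. */
    static String[] parse(String decl) {
        Matcher m = DOCTYPE.matcher(decl);
        if (!m.lookingAt()) {
            return null;
        }
        String publicId = m.group(3);
        String systemId = (publicId != null) ? m.group(5) : m.group(7);
        return new String[] { m.group(1), publicId, systemId };
    }

    public static void main(String[] args) {
        String[] parts = parse("<!DOCTYPE greeting SYSTEM 'hello.dtd'>");
        System.out.println(parts[0] + " / " + parts[1] + " / " + parts[2]);
    }
}
```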

endCDATA

public void endCDATA()
            throws SAXException

endDTD

public void endDTD()
            throws SAXException

endEntity

public void endEntity(String name)
            throws SAXException

entity

public void entity(char[] buff,
                   int offset,
                   int length)
            throws SAXException
Specified by:
entity in interface ScanHandler

eof

public void eof(char[] buff,
                int offset,
                int length)
            throws SAXException
Specified by:
eof in interface ScanHandler

etag

public void etag(char[] buff,
                 int offset,
                 int length)
            throws SAXException
Specified by:
etag in interface ScanHandler

etag_basic

public void etag_basic(char[] buff,
                       int offset,
                       int length)
            throws SAXException

etag_cdata

public boolean etag_cdata(char[] buff,
                          int offset,
                          int length)
            throws SAXException

getContentHandler

public ContentHandler getContentHandler()

getDTDHandler

public DTDHandler getDTDHandler()

getEntity

public char getEntity()
Specified by:
getEntity in interface ScanHandler

getEntityResolver

public EntityResolver getEntityResolver()

getErrorHandler

public ErrorHandler getErrorHandler()

getFeature

public boolean getFeature(String name)
            throws SAXNotRecognizedException,
                   SAXNotSupportedException
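
Since getFeature can throw for unknown or unreportable features, callers that probe several features may want to fold both exceptions into a single "no answer" result. The sketch below shows one way to do that against any XMLReader; the class FeatureProbe, its probe method, and the JDK stand-in reader in main are hypothetical and not part of TagSoup.

```java
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXNotRecognizedException;
import org.xml.sax.SAXNotSupportedException;
import org.xml.sax.XMLReader;

// Hypothetical helper for querying SAX features without exception handling at each call site.
final class FeatureProbe {
    /** Returns the feature's value, or null if the reader does not recognize or cannot report it. */
    static Boolean probe(XMLReader reader, String featureUri) {
        try {
            return reader.getFeature(featureUri);
        } catch (SAXNotRecognizedException | SAXNotSupportedException e) {
            return null;
        }
    }

    public static void main(String[] args) throws Exception {
        // JDK reader as a stand-in (assumption); with TagSoup one would pass a Parser
        // and a constant such as Parser.namespacesFeature.
        XMLReader reader =
            SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        System.out.println(probe(reader, "http://xml.org/sax/features/namespaces"));
        System.out.println(probe(reader, "http://example.invalid/features/no-such-feature"));
    }
}
```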

getProperty

public Object getProperty(String name)
            throws SAXNotRecognizedException,
                   SAXNotSupportedException

gi

public void gi(char[] buff,
               int offset,
               int length)
            throws SAXException
Specified by:
gi in interface ScanHandler

parse

public void parse(InputSource input)
            throws IOException,
                   SAXException

parse

public void parse(String systemid)
            throws IOException,
                   SAXException

pcdata

public void pcdata(char[] buff,
                   int offset,
                   int length)
            throws SAXException
Specified by:
pcdata in interface ScanHandler
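
The ScanHandler callbacks all receive a (buff, offset, length) triple rather than a String. A minimal sketch of consuming such a region follows, assuming (as is conventional for SAX-style callbacks, though not stated here) that the buffer may be reused after the call returns. The class TextAccumulator is hypothetical and does not implement ScanHandler.

```java
// Hypothetical consumer using the same (buff, offset, length) convention.
final class TextAccumulator {
    private final StringBuilder text = new StringBuilder();

    // Copies only the designated slice, so later changes to buff do not affect the result.
    void pcdata(char[] buff, int offset, int length) {
        text.append(buff, offset, length);
    }

    String text() {
        return text.toString();
    }

    public static void main(String[] args) {
        char[] buffer = "xxHelloyy".toCharArray();
        TextAccumulator acc = new TextAccumulator();
        acc.pcdata(buffer, 2, 5);               // slice covering "Hello"
        java.util.Arrays.fill(buffer, '?');     // simulate the buffer being reused
        System.out.println(acc.text());
    }
}
```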

pi

public void pi(char[] buff,
               int offset,
               int length)
            throws SAXException
Specified by:
pi in interface ScanHandler

pitarget

public void pitarget(char[] buff,
                     int offset,
                     int length)
            throws SAXException
Specified by:
pitarget in interface ScanHandler

setContentHandler

public void setContentHandler(ContentHandler handler)

setDTDHandler

public void setDTDHandler(DTDHandler handler)

setEntityResolver

public void setEntityResolver(EntityResolver resolver)

setErrorHandler

public void setErrorHandler(ErrorHandler handler)

setFeature

public void setFeature(String name,
                       boolean value)
            throws SAXNotRecognizedException,
                   SAXNotSupportedException

setProperty

public void setProperty(String name,
                        Object value)
            throws SAXNotRecognizedException,
                   SAXNotSupportedException

stagc

public void stagc(char[] buff,
                  int offset,
                  int length)
            throws SAXException
Specified by:
stagc in interface ScanHandler

stage

public void stage(char[] buff,
                  int offset,
                  int length)
            throws SAXException
Specified by:
stage in interface ScanHandler

startCDATA

public void startCDATA()
            throws SAXException

startDTD

public void startDTD(String name,
                     String publicid,
                     String systemid)
            throws SAXException

startEntity

public void startEntity(String name)
            throws SAXException

Licence: Academic Free License 3.0 and/or GPL 2.0