Description
Compound arrows for reading an XML/HTML document or an XML/HTML string.

Documentation

readDocument is the main document input filter. It can be configured by a list of configuration options, each a value of type Text.XML.HXT.XmlState.TypeDefs.SysConfig. For all available options, see module Text.XML.HXT.XmlState.SystemConfig.

Examples:

    readDocument [] "test.xml"

reads and validates the document "test.xml". No namespace propagation is done; only canonicalization is performed.

    ...
    import Text.XML.HXT.Curl
    ...

    readDocument [ withValidate        no
                 , withInputEncoding   isoLatin1
                 , withParseByMimeType yes
                 , withCurl            []
                 ] "http://localhost/test.php"

reads the document "test.php" and parses it as HTML or XML, depending on the mime type sent by the server, but without validation. The default encoding is isoLatin1. HTTP access is done via libCurl.

    readDocument [ withParseHTML     yes
                 , withInputEncoding isoLatin1
                 ] ""

reads an HTML document from standard input. No validation is done when parsing HTML; the default encoding is isoLatin1.

    readDocument [ withInputEncoding isoLatin1
                 , withValidate      no
                 , withMimeTypeFile  "/etc/mime.types"
                 , withStrictInput   yes
                 ] "test.svg"

reads an SVG document from "test.svg" and sets the mime type by looking it up in the system mime type config file. The default encoding is isoLatin1.

    ...
    import Text.XML.HXT.Curl
    import Text.XML.HXT.TagSoup
    ...

    readDocument [ withParseHTML yes
                 , withTagSoup
                 , withProxy     "www-cache:3128"
                 , withCurl      []
                 , withWarnings  no
                 ] "http://www.haskell.org/"

reads the Haskell homepage with the HTML parser, ignoring any warnings (at the time of writing, there were some HTML errors), with HTTP access via the libCurl interface and proxy "www-cache" at port 3128. Parsing is done with the tagsoup HTML parser. This requires the packages "hxt-curl" and "hxt-tagsoup" to be installed.

    readDocument [ withValidate        yes
                 , withCheckNamespaces yes
                 , withRemoveWS        yes
                 , withTrace           2
                 , withHTTP            []
                 ] "http://www.w3c.org/"

reads the W3C home page (XHTML), validates it and checks namespaces, removes whitespace between tags, and traces activities with level 2. HTTP access is done with the Haskell HTTP package.

For minimal complete examples, see Text.XML.HXT.Arrow.WriteDocument.writeDocument and runX, the main starting point for running an XML arrow.
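A minimal complete program in the spirit of the examples above might look like the following sketch. It assumes hxt 9.x with everything imported from Text.XML.HXT.Core; the helper name copyDocument and the output file name "out.xml" are hypothetical. It reads a document without validation, writes it back indented with writeDocument, and uses getErrStatus to find out whether any error was reported.

```haskell
-- Sketch assuming hxt 9.x (Text.XML.HXT.Core).
-- copyDocument and "out.xml" are hypothetical names for illustration.
module Main (main) where

import System.Exit (exitFailure)
import Text.XML.HXT.Core

-- | Read a document without validation, write it out indented,
--   and return the highest HXT error status seen (c_ok if none).
copyDocument :: String -> String -> IO Int
copyDocument src dst = do
  statuses <- runX ( readDocument  [ withValidate no ] src
                     >>>
                     writeDocument [ withIndent yes ] dst
                     >>>
                     getErrStatus
                   )
  return (maximum (c_ok : statuses))

main :: IO ()
main = do
  rc <- copyDocument "test.xml" "out.xml"
  -- c_err and above mean the read or write failed
  if rc >= c_err then exitFailure else putStrLn "document copied"
```

Using the error status rather than pattern matching on the result list keeps the program from crashing when the input cannot be read.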

readFromDocument is the arrow version of readDocument: the arrow input is the source URI.
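Because the URI arrives as arrow input, several documents can be read by one arrow. The sketch below assumes hxt 9.x; the helper rootElementNames and the file names are hypothetical. It emits each URI with constL, reads it with readFromDocument, and returns the name of each document's top-level element.

```haskell
-- Sketch assuming hxt 9.x; rootElementNames is a hypothetical helper.
module Main (main) where

import Text.XML.HXT.Core

-- | For each source URI, read the document and return the name(s)
--   of its top-level element(s).
rootElementNames :: [String] -> IO [String]
rootElementNames uris =
  runX ( constL uris                          -- each URI becomes arrow input
         >>> readFromDocument [ withValidate no ]
         >>> getChildren >>> isElem           -- children of the "/" root, elements only
         >>> getName
       )

main :: IO ()
main = rootElementNames ["test.xml"] >>= mapM_ putStrLn
```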

readString reads a document that is stored in a normal Haskell String. It is the same function as readDocument, but the String parameter is the document content itself instead of a source URI. All options available for readDocument are applicable to readString. Default encoding: no input decoding is performed; the String argument is taken as a Unicode string.
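As a sketch of readString in use (assuming hxt 9.x; the helper bTexts is hypothetical), the following parses XML held in a String and collects the text of every b element. Validation is switched off because the input has no DTD.

```haskell
-- Sketch assuming hxt 9.x; bTexts is a hypothetical helper.
module Main (main) where

import Text.XML.HXT.Core

-- | Parse XML held in a String and return the text of every <b> element.
bTexts :: String -> IO [String]
bTexts xml =
  runX ( readString [ withValidate no ] xml
         >>> deep (isElem >>> hasName "b")    -- locate the <b> elements
         >>> getChildren >>> getText          -- take their text children
       )

main :: IO ()
main = bTexts "<a><b>hi</b><b>there</b></a>" >>= print
```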

readFromString is the arrow version of readString: the arrow input is the string to be parsed, not a source URI.
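With readFromString the document text flows in as arrow input, so a list of strings can be parsed in one run. This sketch assumes hxt 9.x; the helper elementCounts is hypothetical. It counts the elements of each parsed document, skipping the "/" document root node.

```haskell
-- Sketch assuming hxt 9.x; elementCounts is a hypothetical helper.
module Main (main) where

import Text.XML.HXT.Core

-- | Parse each string (supplied as arrow input) and count its elements.
elementCounts :: [String] -> IO [Int]
elementCounts docs =
  runX ( constL docs                               -- each string becomes input
         >>> readFromString [ withValidate no ]
         >>> listA (getChildren >>> multi isElem)  -- every element below "/"
         >>> arr length
       )

main :: IO ()
main = elementCounts ["<x/>", "<a><b/><c/></a>"] >>= print
```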

hread parses a string as HTML content, substitutes all HTML entity references and canonicalizes the tree (substituting character references, ...). Errors are ignored. It is a simpler version of readFromString with less functionality, and it does not run in the IO monad.
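Since hread does not need IO, it can be run as a pure list arrow with runLA. The sketch below assumes hxt 9.x; the helper paragraphTexts is hypothetical. It collects the text of each p element in an HTML fragment, concatenating the text nodes found inside one paragraph.

```haskell
-- Sketch assuming hxt 9.x; paragraphTexts is a hypothetical helper.
module Main (main) where

import Text.XML.HXT.Core

-- | Collect the text of each <p> element in an HTML fragment.
--   runLA evaluates the arrow purely, so no IO is involved.
paragraphTexts :: String -> [String]
paragraphTexts =
  runLA ( hread
          >>> deep (isElem >>> hasName "p")
          >>> listA (deep isText >>> getText)  -- all text nodes inside one <p>
          >>> arr concat
        )

main :: IO ()
main = mapM_ putStrLn (paragraphTexts "<p>caf&eacute;</p><p>one <b>two</b></p>")
```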

xread parses a string as XML content, substitutes all predefined XML entity references and canonicalizes the tree (substituting character references, ...).
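xread can likewise be used without IO. This sketch assumes hxt 9.x; the helper fragmentText is hypothetical. It parses an XML fragment with runLA and concatenates all of its text content.

```haskell
-- Sketch assuming hxt 9.x; fragmentText is a hypothetical helper.
module Main (main) where

import Text.XML.HXT.Core

-- | Parse an XML fragment without IO and return all of its text content.
fragmentText :: String -> String
fragmentText = concat . runLA (xread >>> deep isText >>> getText)

main :: IO ()
main = putStrLn (fragmentText "<a>x &amp; y</a><b>z</b>")
```

Concatenating the results makes the output independent of how the text is split into separate text nodes.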

Produced by Haddock version 2.6.1