Description
This module is for working with HTML/XML. It deals with both well-formed XML and
malformed HTML from the web. It features:
- A lazy parser, based on the HTML 5 specification - see parseTags.
- A renderer that can write out HTML/XML - see renderTags.
- Utilities for extracting information from a document - see ~==, sections and partitions.
The standard practice is to parse a String to [Tag String] using parseTags,
then operate upon it to extract the necessary information.
Data structures and parsing
A single HTML element. A whole document is represented by a list of Tag.
There is no requirement for TagOpen and TagClose to match.
Constructors:
- TagOpen str [Attribute str]
  An open tag, with its Attributes in their original order
- TagClose str
  A closing tag
- TagText str
  A text node, guaranteed not to be the empty string
- TagComment str
  A comment
- TagWarning str
  Meta: a syntax error in the input file
- TagPosition !Row !Column
  Meta: the position of a parsed element
The row/line of a position, starting at 1
The column of a position, starting at 1
type Attribute str = (str, str)
An HTML attribute id="name" generates ("id","name")
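The types above can be modelled in a few lines of Haskell. The following is a minimal sketch, not the library source: it redeclares Tag, Row, Column and Attribute locally instead of importing tagsoup, assumes Row and Column are both Int, and uses hypothetical names (`example`, `attrValue`) purely for illustration.

```haskell
-- Local stand-ins for the documented types (assumption: Row and Column are Int).
type Row = Int
type Column = Int
type Attribute str = (str, str)

data Tag str
  = TagOpen str [Attribute str]
  | TagClose str
  | TagText str
  | TagComment str
  | TagWarning str
  | TagPosition !Row !Column
  deriving (Eq, Show)

-- Hypothetical example: <p id="intro">Hi</p> as a flat list of tags.
example :: [Tag String]
example = [TagOpen "p" [("id", "intro")], TagText "Hi", TagClose "p"]

-- Hypothetical helper: attributes are an association list in source order,
-- so a plain lookup finds the first attribute with the given name.
attrValue :: String -> Tag String -> Maybe String
attrValue name (TagOpen _ attrs) = lookup name attrs
attrValue _ _ = Nothing
```

Because a document is just a flat list, code that consumes it pattern-matches on the constructors one tag at a time; nothing pairs a TagOpen with its TagClose.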
Parse a string to a list of tags, using an HTML 5 compliant parser.
parseTags "<hello>my&amp;</world>" == [TagOpen "hello" [],TagText "my&",TagClose "world"]
Parse a string to a list of tags, using settings supplied by the ParseOptions parameter,
e.g. to output position information:
parseTagsOptions parseOptions{optTagPosition = True} "<hello>my&amp;</world>" ==
[TagPosition 1 1,TagOpen "hello" [],TagPosition 1 8,TagText "my&",TagPosition 1 15,TagClose "world"]
These options control how parseTags works.
Fields of the ParseOptions constructor:
- optTagPosition :: Bool
  Should TagPosition values be given before some items? (default = False, fast = False)
- optTagWarning :: Bool
  Should TagWarning values be given? (default = False, fast = False)
- optEntityData :: (str, Bool) -> [Tag str]
  How to look up an entity in text (the Bool records whether the entity had a terminating ';')
- optEntityAttrib :: (str, Bool) -> (str, [Tag str])
  How to look up an entity in an attribute (the Bool records whether the entity had a terminating ';')
- optTagTextMerge :: Bool
  Should adjacent TagText values be merged, so that no two are adjacent? (default = True, fast = False)
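Of these fields, optTagTextMerge is the easiest to misread. The sketch below shows the invariant it describes, that no two TagText values end up next to each other, using a cut-down local Tag type with String payloads. `mergeText` is a hypothetical helper written for illustration; it is not taken from the library.

```haskell
-- Cut-down local stand-in for the documented Tag type (String payloads only).
data Tag = TagOpen String [(String, String)] | TagClose String | TagText String
  deriving (Eq, Show)

-- Hypothetical helper: collapse each run of adjacent TagText values into one,
-- which is the property optTagTextMerge = True asks for in the parser output.
mergeText :: [Tag] -> [Tag]
mergeText (TagText a : TagText b : rest) = mergeText (TagText (a ++ b) : rest)
mergeText (t : rest) = t : mergeText rest
mergeText [] = []
```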
The default parse options value, described in ParseOptions.
A ParseOptions structure optimised for speed, using the "fast" value listed for each field of ParseOptions.
Show a list of tags, as they might have been parsed, using the default settings given in
RenderOptions.
renderTags [TagOpen "hello" [],TagText "my&",TagClose "world"] == "<hello>my&amp;</world>"
Show a list of tags using settings supplied by the RenderOptions parameter,
e.g. to avoid escaping any characters:
renderTagsOptions renderOptions{optEscape = id} [TagText "my&"] == "my&"
Replace the four characters &"<> with their HTML entities.
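A minimal sketch of this escaping as a plain String function. The entity names (&amp;, &quot;, &lt;, &gt;) are the conventional ones and an assumption here, as is calling the function `escapeHTML`; it is a local definition, not an import from the library.

```haskell
-- Sketch: replace each of the four characters & " < > with an HTML entity,
-- leaving every other character unchanged.
escapeHTML :: String -> String
escapeHTML = concatMap escapeChar
  where
    escapeChar '&' = "&amp;"
    escapeChar '"' = "&quot;"
    escapeChar '<' = "&lt;"
    escapeChar '>' = "&gt;"
    escapeChar c   = [c]
```

Each character is handled independently, so the whole operation is a single concatMap over the string.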
These options control how renderTags works.
The default of minimising only <br> tags is a workaround for Internet Explorer, which treats
<br></br> as <br><br>.
Fields of the RenderOptions constructor:
- optEscape :: str -> str
  Escape a piece of text (default: escape the four characters &"<>)
- optMinimize :: str -> Bool
  Should a tag with this name be minimised, writing <b></b> as <b/>? (default: minimise only <br> tags)
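How the two fields feed into rendering can be sketched as below. This is a simplified local model, not the library's renderer: it handles only TagOpen, TagClose and TagText over String, reuses the names RenderOptions, renderOptions, renderTagsOptions and renderTags from this page, and assumes the minimised form is written `<br/>` and that attribute values are also passed through optEscape. `escapeFour` is a hypothetical helper.

```haskell
-- Cut-down local stand-in for the documented Tag type.
data Tag = TagOpen String [(String, String)] | TagClose String | TagText String
  deriving (Eq, Show)

-- Local model of RenderOptions; field names follow the documentation.
data RenderOptions = RenderOptions
  { optEscape   :: String -> String -- escape a piece of text
  , optMinimize :: String -> Bool   -- may <name></name> be written as <name/>?
  }

-- Hypothetical helper: escape the four characters &"<>.
escapeFour :: String -> String
escapeFour = concatMap esc
  where
    esc '&' = "&amp;"
    esc '"' = "&quot;"
    esc '<' = "&lt;"
    esc '>' = "&gt;"
    esc c   = [c]

-- Documented defaults: escape &"<> and minimise only <br>.
renderOptions :: RenderOptions
renderOptions = RenderOptions { optEscape = escapeFour, optMinimize = (== "br") }

renderTagsOptions :: RenderOptions -> [Tag] -> String
renderTagsOptions opts = go
  where
    -- An open tag directly followed by its own close tag is minimised
    -- when optMinimize allows it for that name.
    go (TagOpen n kvs : TagClose m : rest)
      | n == m && optMinimize opts n = "<" ++ n ++ attrs kvs ++ "/>" ++ go rest
    go (TagOpen n kvs : rest) = "<" ++ n ++ attrs kvs ++ ">" ++ go rest
    go (TagClose n : rest) = "</" ++ n ++ ">" ++ go rest
    go (TagText s : rest) = optEscape opts s ++ go rest
    go [] = ""
    -- In this sketch attribute values go through the same escape function as text.
    attrs = concatMap (\(k, v) -> " " ++ k ++ "=\"" ++ optEscape opts v ++ "\"")

renderTags :: [Tag] -> String
renderTags = renderTagsOptions renderOptions
```

With these definitions, `renderTags [TagOpen "br" [], TagClose "br"]` gives `"<br/>"`, `renderTags [TagOpen "b" [], TagClose "b"]` gives `"<b></b>"`, and `renderTagsOptions renderOptions{optEscape = id} [TagText "my&"]` gives `"my&"`.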
The default render options value, described in RenderOptions.
Turns all tag names and attribute names to lower case (attribute values are left unchanged) and
converts DOCTYPE to upper case.
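A sketch of this normalisation over a simplified local Tag type. It assumes a DOCTYPE declaration is represented as a TagOpen whose name starts with '!' (for example `TagOpen "!doctype" [("html","")]`); that representation, and the name `canonicalizeTags`, are assumptions rather than something this page states.

```haskell
import Data.Char (toLower, toUpper)

-- Cut-down local stand-in for the documented Tag type.
data Tag = TagOpen String [(String, String)] | TagClose String | TagText String
  deriving (Eq, Show)

canonicalizeTags :: [Tag] -> [Tag]
canonicalizeTags = map canon
  where
    -- Declarations such as <!doctype html> are assumed to arrive as TagOpen "!doctype".
    canon (TagOpen ('!' : name) kvs) = TagOpen ('!' : map toUpper name) kvs
    -- Ordinary tags: lower-case the tag name and attribute names, keep values.
    canon (TagOpen name kvs) = TagOpen (map toLower name) [(map toLower k, v) | (k, v) <- kvs]
    canon (TagClose name) = TagClose (map toLower name)
    -- Text (and anything else) passes through untouched.
    canon t = t
```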
Tag identification
Test if a Tag is a TagOpen
Test if a Tag is a TagClose
Test if a Tag is a TagText
Test if a Tag is a TagWarning
Test if a Tag is a TagPosition
Returns True if the Tag is TagOpen and matches the given name
Returns True if the Tag is TagClose and matches the given name
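Each of these tests is a single pattern match. A sketch of five of them over a simplified local Tag type follows; the names isTagOpen, isTagClose, isTagText, isTagOpenName and isTagCloseName are assumed, since this page does not show the function names, and the definitions are written for illustration rather than copied from the library.

```haskell
-- Cut-down local stand-in for the documented Tag type.
data Tag = TagOpen String [(String, String)] | TagClose String | TagText String
  deriving (Eq, Show)

-- Constructor tests.
isTagOpen, isTagClose, isTagText :: Tag -> Bool
isTagOpen (TagOpen _ _) = True
isTagOpen _ = False
isTagClose (TagClose _) = True
isTagClose _ = False
isTagText (TagText _) = True
isTagText _ = False

-- Constructor test plus a comparison against the given name.
isTagOpenName, isTagCloseName :: String -> Tag -> Bool
isTagOpenName name (TagOpen n _) = n == name
isTagOpenName _ _ = False
isTagCloseName name (TagClose n) = n == name
isTagCloseName _ _ = False
```

These predicates combine naturally with list functions, e.g. `filter (isTagOpenName "a") tags` keeps every `<a>` open tag.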
Extraction
Extract the string from within a TagText; crashes if not a TagText.
Extract the value of an attribute; crashes if the tag is not a TagOpen.
Returns "" if the attribute is not present.
Extract the string from within TagText, otherwise Nothing
Extract the string from within TagWarning, otherwise Nothing
Extract all text content from tags (similar to Verbatim found in HaXml)
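A sketch of three of these extractors over a simplified local Tag type. The names maybeTagText, fromAttrib and innerText are assumptions (this page omits the names), and innerText is written here as the concatenation of every TagText in document order.

```haskell
import Data.Maybe (fromMaybe, mapMaybe)

-- Cut-down local stand-in for the documented Tag type.
data Tag = TagOpen String [(String, String)] | TagClose String | TagText String
  deriving (Eq, Show)

-- Total extractor: Just the text of a TagText, Nothing otherwise.
maybeTagText :: Tag -> Maybe String
maybeTagText (TagText s) = Just s
maybeTagText _ = Nothing

-- Partial extractor: "" for a missing attribute, a crash for a non-TagOpen.
fromAttrib :: String -> Tag -> String
fromAttrib name (TagOpen _ kvs) = fromMaybe "" (lookup name kvs)
fromAttrib _ t = error ("fromAttrib: not a TagOpen: " ++ show t)

-- All text content, concatenated in document order.
innerText :: [Tag] -> String
innerText = concat . mapMaybe maybeTagText
```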
Utility
This function takes a list and returns all suffixes whose
first item matches the predicate.
This function is similar to sections, but splits the list
so no element appears in any two partitions.
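Both utilities work on any list, not only on tags. The sketch below implements them over plain lists; it is a local illustration rather than the library source, and it assumes partitions discards any elements that come before the first match.

```haskell
import Data.List (tails)

-- All suffixes of the list whose first element satisfies p.
sections :: (a -> Bool) -> [a] -> [[a]]
sections p = filter startsWithMatch . tails
  where
    startsWithMatch (x : _) = p x
    startsWithMatch [] = False

-- Start a new chunk at every element satisfying p. Elements before the first
-- match are dropped (an assumption), and each kept element is in one chunk only.
partitions :: (a -> Bool) -> [a] -> [[a]]
partitions p = go . dropWhile (not . p)
  where
    go (x : xs) = let (chunk, rest) = break p xs in (x : chunk) : go rest
    go [] = []
```

The difference shows on `[1,2,3,4,5]` with `even`: sections returns the overlapping suffixes `[[2,3,4,5],[4,5]]`, whereas partitions returns the disjoint chunks `[[2,3],[4,5]]`. On a tag list, a predicate such as `(~== TagOpen "tr" [])` should split a table into one chunk per row.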
Combinators
A class that allows Strings or Tag strs to be used as match patterns.
Performs an inexact match: the first item should be the thing being matched, and the second the pattern.
A blank string in the second item is considered to match anything.
For example:
(TagText "test" ~== TagText "" ) == True
(TagText "test" ~== TagText "test") == True
(TagText "test" ~== TagText "soup") == False
For TagOpen, attributes that the right-hand pattern does not mention are ignored, so the pattern need only list the attributes it cares about.
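The matching rules can be sketched directly for the TagText, TagClose and TagOpen cases. This local version accepts only a Tag as the pattern (no TagRep class, so no String patterns), assumes that a blank attribute name or value in the pattern acts as a wildcard for that half of the attribute, and is not the library's definition.

```haskell
-- Cut-down local stand-in for the documented Tag type.
data Tag = TagOpen String [(String, String)] | TagClose String | TagText String
  deriving (Eq, Show)

-- Inexact match: the left argument is the tag being tested, the right is the
-- pattern. A blank string in the pattern acts as a wildcard.
(~==) :: Tag -> Tag -> Bool
TagText x ~== TagText pat = null pat || x == pat
TagClose x ~== TagClose pat = null pat || x == pat
TagOpen x kvs ~== TagOpen pat pkvs = (null pat || x == pat) && all present pkvs
  where
    -- Each pattern attribute must match some attribute of the tested tag;
    -- extra attributes on the tested tag are ignored.
    present (pk, pv) = any (\(k, v) -> (null pk || pk == k) && (null pv || pv == v)) kvs
_ ~== _ = False

-- Negation of (~==).
(~/=) :: Tag -> Tag -> Bool
a ~/= b = not (a ~== b)
```

The three documented TagText examples hold for this sketch, and `TagOpen "a" [("href","/x"),("class","c")] ~== TagOpen "a" [("href","")]` is True because the pattern only asks that some href be present.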
Negation of ~==
Produced by Haddock version 2.6.0