Extracted from Pike v7.4 release 341 at 2005-11-30.
pike.ida.liu.se

Method Parser.HTML()->ignore_unknown()



int case_insensitive_tag(void|int value)
int ignore_tags(void|int value)
int ignore_unknown(void|int value)
int lazy_argument_end(void|int value)
int lazy_entity_end(void|int value)
int match_tag(void|int value)
int max_stack_depth(void|int value)
int mixed_mode(void|int value)
int reparse_strings(void|int value)
int ws_before_tag_name(void|int value)
int xml_tag_syntax(void|int value)

Description

Functions to query or set flags. Each function sets the associated flag to value, if one is given, and returns the old value.
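A minimal sketch of the query/set convention, using ignore_tags as the example flag and assuming a Pike installation where Parser.HTML is available. Called with no argument, the function only reports the current value; called with an argument, it installs the new value and hands back the previous one. The match_tag query relies only on the default stated in the flag list.

```pike
int main()
{
  Parser.HTML p = Parser.HTML();

  // No argument: only queries the current value of the flag.
  int before = p->ignore_tags();

  // With an argument: sets the flag and returns the value it had before.
  int old = p->ignore_tags(1);

  // Query again to see the value now in effect.
  int now = p->ignore_tags();

  write("before=%d old=%d now=%d\n", before, old, now);

  // match_tag is documented as on by default, so this query is nonzero.
  write("match_tag=%d\n", p->match_tag());
  return 0;
}
```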

The flags are:

  • case_insensitive_tag: All tags and containers are matched case insensitively, and argument names are converted to lowercase. Tags added with add_quote_tag() are not affected, though. Switching to case insensitive mode and back won't preserve the case of registered tags and containers.

  • ignore_tags: Do not look for tags at all. Normally tags are matched even when there are no callbacks for them. When this is set, the tag delimiters '<' and '>' are treated as normal characters.

  • ignore_unknown: Treat unknown tags and entities as text data, continuing parsing for tags and entities inside them.

  • lazy_argument_end: A '>' in a tag argument closes both the argument and the tag, even if the argument is quoted.

  • lazy_entity_end: Normally, the parser searches indefinitely for the entity end character (i.e. ';'). When this flag is set, the characters '&', '<', '>', '"', the single quote ('), and any whitespace break the search for the entity end, and the entity text is then ignored, i.e. treated as data.

  • match_tag: Unquoted nested tag starters and enders will be balanced when parsing tags. This is the default.

  • max_stack_depth: Maximum recursion depth during parsing. Recursion occurs when a tag/container/entity/quote tag callback function returns a string to be reparsed. The default value is 10.

  • mixed_mode: Allow callbacks to return arbitrary data in the arrays, which will be concatenated in the output.

  • reparse_strings: When a plain string is used as a tag/container/entity/quote tag callback, it's not reparsed if this flag is unset. Setting it causes all such strings to be reparsed.

  • ws_before_tag_name: Allow whitespace between the tag start character and the tag name.

  • xml_tag_syntax: Whether or not to use XML syntax to tell empty tags and container tags apart:
    0: Use HTML syntax only. If there's a '/' last in a tag, it's just treated as any other argument.
    1: Use HTML syntax, but ignore a '/' if it comes last in a tag. This is the default.
    2: Use XML syntax, but when a tag that does not end with '/>' is found and that tag has only a non-container callback, treat it as a non-container (i.e. don't start to seek for the container end).
    3: Use XML syntax only. If a tag has both container and non-container callbacks, the non-container callback is called when the empty element form (i.e. the one ending with '/>') is used, and the container callback otherwise. If only a container callback exists, it gets the empty string as content when there's none to be parsed. If only a non-container callback exists, it is called (without the content argument) for both kinds of tags.
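The following sketch illustrates case_insensitive_tag and ignore_tags on a parser with a single container callback. The helper render() and its parameters are hypothetical names chosen for this example; add_container(), finish() and read() are the Parser.HTML methods used to register the container and collect the output. The expectation is that uppercase <B> matches the lowercase registration only when case_insensitive_tag is set, and that with ignore_tags set the markup passes through untouched even though a callback exists.

```pike
// Hypothetical helper: a parser whose only callback turns <b>...</b>
// into <strong>...</strong>, with two flags applied before registration.
string render(string html, int case_insensitive, int no_tags)
{
  Parser.HTML p = Parser.HTML();

  // Set the mode before registering callbacks, since switching
  // case_insensitive_tag back and forth does not preserve name case.
  p->case_insensitive_tag(case_insensitive);
  p->ignore_tags(no_tags);

  p->add_container("b",
    lambda(Parser.HTML parser, mapping args, string content) {
      // An array result is written to the output without being reparsed.
      return ({ "<strong>" + content + "</strong>" });
    });

  return p->finish(html)->read();
}

int main()
{
  write("%s\n", render("<b>hi</b>", 0, 0));  // lowercase tag, default flags
  write("%s\n", render("<B>hi</B>", 0, 0));  // case-sensitive: no match
  write("%s\n", render("<B>hi</B>", 1, 0));  // case_insensitive_tag set
  write("%s\n", render("<b>hi</b>", 0, 1));  // ignore_tags set
  return 0;
}
```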

Note

When functions are specified with _set_tag_callback() or _set_entity_callback(), all tags or entities, respectively, are considered known. However, if one of those functions returns 1 and ignore_unknown is set, the tag or entity is treated as text data instead of being passed to the same function again.
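A sketch of the interaction described in this note, assuming a catch-all callback installed with _set_tag_callback() that calls tag_name() to see which tag it was invoked for. The helper rewrite_bold() is a hypothetical name, and the code assumes that an end tag such as </b> is reported under the name "/b". The callback handles only <b> and </b> and returns 1 for every other tag; because ignore_unknown is set, those declined tags are expected to reach the output as text rather than triggering another call.

```pike
// Hypothetical helper: rewrites <b>/</b> and declines every other tag.
string rewrite_bold(string html)
{
  Parser.HTML p = Parser.HTML();
  p->ignore_unknown(1);

  p->_set_tag_callback(
    lambda(Parser.HTML parser, mixed ... rest) {
      // The remaining callback arguments are not used here.
      // Assumption: the end tag </b> is reported by tag_name() as "/b".
      switch (parser->tag_name()) {
        case "b":  return ({ "<strong>" });
        case "/b": return ({ "</strong>" });
      }
      // Declined: with ignore_unknown set, the tag is treated as text data.
      return 1;
    });

  return p->finish(html)->read();
}

int main()
{
  write("%s\n", rewrite_bold("<b>x</b> <i>y</i>"));
  return 0;
}
```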