ikiwiki/ patchqueue/ format escape

Since some preprocessor directives insert raw HTML, it would be good to specify, per-format, how to pass that HTML through so that it survives the format's own processing. With Markdown we cross our fingers; with reST we use the "raw" directive.
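As a rough illustration of the reST case, here is a minimal Python sketch of wrapping an HTML fragment in a "raw" directive. It mirrors the idea of the Perl htmlescape hook in the patch further down, but the function name `rst_escape_html` is made up for this example and is not part of ikiwiki.

```python
def rst_escape_html(html: str) -> str:
    """Wrap an HTML fragment in a reST ``raw`` directive (illustrative only)."""
    # Directive content must be indented relative to the ".. raw::" line,
    # so prefix every line of the fragment with two spaces.
    indented = "\n".join("  " + line for line in html.splitlines())
    # A blank line separates the directive line from its content.
    return ".. raw:: html\n\n" + indented + "\n"


if __name__ == "__main__":
    print(rst_escape_html('<div class="toc">\n<a href="#intro">Intro</a>\n</div>'))
```

Note that the "raw" directive is a block-level construct, so this approach suits output that stands as its own paragraph rather than HTML embedded mid-sentence.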

I added an extra named parameter to the htmlize hook, which feels sort of wrong, since none of the other hooks take parameters. Let me know what you think. --Ethan

Seems fairly reasonable, actually. Shouldn't the $type come from $page instead of $destpage though? Only other obvious change is to make the escape parameter optional, and only call it if set. --Joey

I couldn't figure out what to make it from, but thinking it through, yeah, it should be $page. Revised patch follows. --Ethan

I've updated the patch some more, but I think it's incomplete. ikiwiki emits raw HTML when expanding WikiLinks too, and those would need to be escaped as well. That assumes escaping HTML embedded in the middle of a sentence works at all. --Joey

Revised again. I get around this by adding another hook, htmlescapelink, which is called to generate links in whatever markup language the page uses. The rst implementation doesn't (can't?) generate spans, and it doesn't handle inlineable image links. If these were desired, the approach to take would probably be to use substitution definitions, which would require generating two bits of code for each link/HTML snippet, and putting one at the end of the paragraph (or maybe the document?). To specify that (for example) Discussion links are meant to be HTML and not rst or whatever, I added a "genhtml" parameter to htmllink. It seems to work -- see http://ikidev.betacantrips.com/blah.html for an example. --Ethan
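To make the dispatch described above concrete, here is a small Python model of it: a per-format link hook that is used unless the caller asks for HTML via genhtml. This is only a sketch; the real code is the Perl in the patch below. The names `LINK_HOOKS`, `rst_link` and `make_link` are invented for the example, and the HTML fallback strings are illustrative rather than ikiwiki's exact output.

```python
from typing import Callable, Dict

# Hypothetical registry: page type -> link formatter
# (a stand-in for the per-type htmlescapelink hooks).
LINK_HOOKS: Dict[str, Callable[..., str]] = {}


def rst_link(url: str, text: str, broken: bool = False) -> str:
    """Format a link using reST hyperlink syntax."""
    if broken:
        # "\ " is reST's escaped whitespace, so the "?" link and the
        # text that follows are rendered with no gap between them.
        return f"`? <{url}>`_\\ {text}"
    return f"`{text} <{url}>`_"


LINK_HOOKS["rst"] = rst_link


def make_link(page_type: str, url: str, text: str,
              broken: bool = False, genhtml: bool = False) -> str:
    """Use the format's own link syntax unless HTML is explicitly requested."""
    hook = LINK_HOOKS.get(page_type)
    if hook is not None and not genhtml:
        return hook(url, text, broken=broken)
    # Fallback: plain HTML (no attribute escaping in this sketch).
    if broken:
        return f'<span><a href="{url}">?</a>{text}</span>'
    return f'<a href="{url}">{text}</a>'


if __name__ == "__main__":
    print(make_link("rst", "page.html", "Page"))
    print(make_link("rst", "page.html", "Discussion", genhtml=True))
```

The genhtml flag matters for callers such as page templates, whose output is inserted after the page body has already been converted to HTML.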

Index: debian/changelog
===================================================================
--- debian/changelog    (revision 3197)
+++ debian/changelog    (working copy)
@@ -24,6 +24,9 @@
     than just a suggests, since OpenID is enabled by default.
   * Fix a bug that caused link(foo) to succeed if page foo did not exist.
   * Fix tags to page names that contain special characters.
+  * Based on a patch by Ethan, add a new htmlescape hook, that is called
+    when a preprocessor directive emits inline html. The rst plugin uses this
+    hook to support inlined raw html.

   [ Josh Triplett ]
   * Use pngcrush and optipng on all PNG files.
Index: IkiWiki/Render.pm
===================================================================
--- IkiWiki/Render.pm   (revision 3197)
+++ IkiWiki/Render.pm   (working copy)
@@ -96,7 +96,7 @@
        if ($page !~ /.*\/\Q$discussionlink\E$/ &&
           (length $config{cgiurl} ||
            exists $links{$page."/".$discussionlink})) {
-           $template->param(discussionlink => htmllink($page, $page, gettext("Discussion"), noimageinline => 1, forcesubpage => 1));
+           $template->param(discussionlink => htmllink($page, $page, gettext("Discussion"), noimageinline => 1, forcesubpage => 1, genhtml => 1));
            $actions++;
        }
    }
Index: IkiWiki/Plugin/rst.pm
===================================================================
--- IkiWiki/Plugin/rst.pm   (revision 3197)
+++ IkiWiki/Plugin/rst.pm   (working copy)
@@ -30,15 +30,36 @@
 html = publish_string(stdin.read(), writer_name='html', 
        settings_overrides = { 'halt_level': 6, 
                               'file_insertion_enabled': 0,
-                              'raw_enabled': 0 }
+                              'raw_enabled': 1 }
 );
 print html[html.find('<body>')+6:html.find('</body>')].strip();
 ";

 sub import { #{{{
    hook(type => "htmlize", id => "rst", call => \&htmlize);
+   hook(type => "htmlescape", id => "rst", call => \&htmlescape);
+   hook(type => "htmlescapelink", id => "rst", call => \&htmlescapelink);
 } # }}}

+sub htmlescapelink ($$;@) { #{{{
+   my $url = shift;
+   my $text = shift;
+   my %params = @_;
+
+   if ($params{broken}){
+       return "`? <$url>`_\\ $text";
+   }
+   else {
+       return "`$text <$url>`_";
+   }
+} # }}}
+
+sub htmlescape ($) { #{{{
+   my $html=shift;
+   $html=~s/^/  /mg;
+   return ".. raw:: html\n\n".$html;
+} # }}}
+
 sub htmlize (@) { #{{{
    my %params=@_;
    my $content=$params{content};
Index: doc/plugins/write.mdwn
===================================================================
--- doc/plugins/write.mdwn  (revision 3197)
+++ doc/plugins/write.mdwn  (working copy)
@@ -121,6 +121,26 @@
 The function is passed named parameters: "page" and "content" and should
 return the htmlized content.

+### htmlescape
+
+   hook(type => "htmlescape", id => "ext", call => \&htmlescape);
+
+Some markup languages do not allow raw html to be mixed in with the markup
+language, and need it to be escaped in some way. This hook is a companion
+to the htmlize hook, and is called when ikiwiki detects that a preprocessor
+directive is inserting raw html. It is passed the chunk of html in
+question, and should return the escaped chunk.
+
+### htmlescapelink
+
+   hook(type => "htmlescapelink", id => "ext", call => \&htmlescapelink);
+
+Some markup languages have special syntax to link to other pages. This hook
+is a companion to the htmlize and htmlescape hooks, and it is called when a
+link is inserted. It is passed the target of the link and the text of the 
+link, and an optional named parameter "broken" if a broken link is being
+generated. It should return the correctly-formatted link.
+
 ### pagetemplate

    hook(type => "pagetemplate", id => "foo", call => \&pagetemplate);
@@ -355,6 +375,7 @@
 * forcesubpage  - set to force a link to a subpage
 * linktext - set to force the link text to something
 * anchor - set to make the link include an anchor
+* genhtml - set to emit an HTML link even if the page's format has an htmlescapelink hook

 #### `readfile($;$)`

Index: doc/plugins/rst.mdwn
===================================================================
--- doc/plugins/rst.mdwn    (revision 3197)
+++ doc/plugins/rst.mdwn    (working copy)
@@ -10,10 +10,8 @@
 Note that this plugin does not interoperate very well with the rest of
 ikiwiki. Limitations include:

-* reStructuredText does not allow raw html to be inserted into
-  documents, but ikiwiki does so in many cases, including
-  WikiLinks and many
-  PreprocessorDirectives.
+* Some bits of ikiwiki may still assume that markdown is used or embed html
+  in ways that break reStructuredText. (Report bugs if you find any.)
 * It's slow; it forks a copy of python for each page. While there is a
   perl version of the reStructuredText processor, it is not being kept in
   sync with the standard version, so is not used.
Index: IkiWiki.pm
===================================================================
--- IkiWiki.pm  (revision 3197)
+++ IkiWiki.pm  (working copy)
@@ -469,6 +469,10 @@
    my $page=shift; # the page that will contain the link (different for inline)
    my $link=shift;
    my %opts=@_;
+   # we are processing $lpage and so we need to format things in accordance
+   # with the formatting language of $lpage. inline generates HTML so links
+   # will be escaped separately.
+   my $type=pagetype($pagesources{$lpage});

    my $bestlink;
    if (! $opts{forcesubpage}) {
@@ -494,12 +498,17 @@
    }
    if (! grep { $_ eq $bestlink } map { @{$_} } values %renderedfiles) {
        return $linktext unless length $config{cgiurl};
-       return "<span><a href=\"".
-           cgiurl(
-               do => "create",
-               page => pagetitle(lc($link), 1),
-               from => $lpage
-           ).
-           "\">?</a>$linktext</span>"
+       my $url = cgiurl(
+                do => "create",
+                page => pagetitle(lc($link), 1),
+                from => $lpage
+               );
+
+       if ($hooks{htmlescapelink}{$type} && ! $opts{genhtml}){
+           return $hooks{htmlescapelink}{$type}{call}->($url, $linktext,
+                                  broken => 1);
+       }
+       return "<span><a href=\"$url\">?</a>$linktext</span>"
    }

@@ -514,6 +523,9 @@
        $bestlink.="#".$opts{anchor};
    }

+   if ($hooks{htmlescapelink}{$type} && !$opts{genhtml}) {
+     return $hooks{htmlescapelink}{$type}{call}->($bestlink, $linktext);
+   }
    return "<a href=\"$bestlink\">$linktext</a>";
 } #}}}

@@ -628,6 +640,14 @@
                preview => $preprocess_preview,
            );
            $preprocessing{$page}--;
+
+           # Handle escaping html if the htmlizer needs it.
+           if ($ret =~ /[<>]/ && $pagesources{$page}) {
+               my $type=pagetype($pagesources{$page});
+               if ($hooks{htmlescape}{$type}) {
+                   return $hooks{htmlescape}{$type}{call}->($ret);
+               }
+           }
            return $ret;
        }
        else {