Description
This module defines a handful of operations which can be applied to
LPeg patterns and grammars in
general.
Dependencies
Operations
Piping
Pattern matching dissociates the notion of
matching from the notion of
capturing: matching checks if a given string follows a certain pattern,
and capturing generates values according to the match made. This division
allows interesting possibilities:
- different problems can be solved by applying different captures to the same grammar;
- captures may be defined separately;
- captures may be done on top of other captures.
Accounting for the first and second bullets, the grammar given in
parser has no captures, enabling the user to reuse it to solve any
problems that require a Lua grammar. One good example is documentation
generation, described in a little more detail
below.
The third bullet depicts a more interesting idea: a capture might take the
result of another capture as input, doing a further transformation of the
original data. This capture chaining, with the latter ones using the former's
output as its input, is very similar to
Unix pipelines,
so this mechanism was named
piping.
Completing
With piping, several levels of captures can be chained together up to the
most appropriate for the task at hand. Yet some levels might require extra rules, and modifications to existing ones, to ensure proper matching.
To avoid manual copying, the new grammar should redefine only the necessary
rules, copying the rest from the older grammar. This action is dubbed
completing.
Applying
Once a new rule set is created and
completed, and
all captures are correctly
piped, all that's left is
to put them together, a process called
applying. The result is a grammar ready for
lpeg.P
consumption, whose pattern will return the intended result when a match is made.
Example
Let's consider the problem of documenting a Lua module. In this case, comments
must be captured before every function declaration when in the outermost scope:
-- the code to parse
subject = [[
-- Calculates the sum a+b.
-- An extra line.
function sum (a, b)
-- code
end
-- f1: assume a variable assignment is not a proper declaration for an
-- exported function
f1 = function ()
-- code
end
while true do
-- this function is not in the outermost scope
function aux() end
end
function something:other(a, ...)
-- a global function without comments
end
]]
In the code above only
sum
and
something:other
should be documented, as
f1
isn't properly (by our standards) declared and
aux
is not in the outermost scope.
By combining
LPeg and the modules
scanner,
parser and
grammar, this specific problem can be solved as follows:
-- change only the initial rule and make no captures
patt = grammar.apply(parser.rules, scanner.COMMENT^-1 * lpeg.V'GlobalFunction', nil)
-- transform the new grammar into a LPeg pattern
patt = lpeg.P(patt)
-- making a pattern that matches any Lua statement, also without captures
Stat = lpeg.P( grammar.apply(parser.rules, lpeg.V'Stat', nil) )
-- a pattern which matches function declarations and skips statements in
-- inner scopes or undesired tokens
patt = (patt + Stat + scanner.ANY)^0
-- matching a string
patt:match(subject)
These are the relevant rules in
the grammar:
GlobalFunction = 'function' * FuncName * FuncBody
FuncName = ID * ('.' * ID)^0 * (':' * ID)^-1
FuncBody = '(' * (ParList + EPSILON) * ')' * Block * 'end'
ParList = NameList * (',' * '...')^-1
NameList = ID * (',' * ID)^0
ID = scanner.ID
EPSILON = lpeg.P(true)
It may seem that
ParList + EPSILON
could be substituted for
ParList^-1
(optionally match
ParList
), but then no captures would be made for empty parameter lists, and
GlobalFunction
would get all strings matched by
FuncBody
. The
EPSILON
rule acts in this manner as a placeholder in the argument list, avoiding any argument list processing in the capture function.
Since no captures are being made,
lpeg.match doesn't return anything interesting. Here are some possible captures:
-- some interesting captures bundled up in a table. Note that the table keys
-- match the grammar rules we want to add captures to. Whatever rules aren't in
-- the rules table below will come from parser.rules .
captures = {
[1] = function (...) -- the initial rule
return '<function>'..table.concat{...}..'</function>'
end,
GlobalFunction = function (name, parlist)
return '<name>'..name..'</name><parlist>'..(parlist or '')..'</parlist>'
end,
FuncName = grammar.C, -- capture the raw text
ParList = grammar.C, -- capture the raw text
COMMENT = scanner.comment2text, -- remove the comment trappings
}
-- spacing rule
local S = scanner.SPACE ^ 0
-- rules table
rules = {
[1] = ((lpeg.V'COMMENT' *S) ^ 0) *S* lpeg.V'GlobalFunction',
COMMENT = scanner.COMMENT,
}
-- building the new grammar and adding the captures
patt = lpeg.P( grammar.apply(parser.rules, rules, captures) )
-- a pattern that matches a sequence of patts and concatenates the results
patt = (patt + Stat + scanner.ANY)^0 / function(...)
return table.concat({...}, '\n\n') -- some line breaks for easier reading
end
-- finally, matching a string
print(patt:match(subject))
FuncBody
needs no captures, as
Block
and all its non-terminals have none; it
just needs to pass along any captures made by
ParList
.
NameList
and
ID
also have no captures, and the whole subject string is passed further.
The printed result is:
<function>Calculates the sum a+b. An extra line.<name>sum</name><parlist>a, b</parlist></function>
<function><name>something:other</name><parlist>a, ...</parlist></function>
Functions
anyOf (list) |
Returns a pattern which matches any of the patterns received. |
apply (grammar, rules, captures) |
Completes rules with grammar and then applies captures . |
C () |
A capture function, made so that patt / C is equivalent to m.C(patt) . It's intended to be used in capture tables, such as those required by pipe and apply. |
complete (dest, orig) |
Completes dest with orig . |
copy (grammar) |
Creates a shallow copy of grammar . |
Ct () |
A capture function, made so that patt / Ct is equivalent to m.Ct(patt) . It's intended to be used in capture tables, such as those required by pipe and apply. |
listOf (patt, sep) |
Returns a pattern which matches a list of patt s, separated by sep . |
pipe (dest, orig) |
Pipes the captures in orig to the ones in dest . |
anyOf (list)
Returns a pattern which matches any of the patterns received.
Example:
local g, s, m = require 'leg.grammar', require 'leg.scanner', require 'lpeg'
-- match numbers or operators, capture the numbers
print( (g.anyOf { '+', '-', '*', '/', m.C(s.NUMBER) }):match '34.5@23 * 56 / 45 - 45' )
--> prints 34.5
Parameters:
list
: a list of zero or more LPeg patterns or values which can be fed to lpeg.P.
Returns:
- a pattern which matches any of the patterns received.
apply (grammar, rules, captures)
Completes rules
with grammar
and then applies captures
.
rules
can either be:
- a single pattern, which is taken to be the new initial rule,
- a possibly incomplete LPeg grammar, as per complete, or
nil
, which means no new rules are added.
captures
can either be:
- a capture table, as per pipe, or
nil
, which means no captures are applied.
Parameters:
grammar
: the old grammar. It stays unmodified.
rules
: optional, the new rules.
captures
: optional, the final capture table.
Returns:
rules
, suitably augmented by grammar
and captures
.
C ()
A capture function, made so that patt / C
is equivalent to m.C(patt)
. It's intended to be used in capture tables, such as those required by pipe and apply.
complete (dest, orig)
Completes dest
with orig
.
Parameters:
dest
: the new grammar. Must be a table.
orig
: the original grammar. Must be a table.
Returns:
dest
, with new rules inherited from orig
.
copy (grammar)
Creates a shallow copy of grammar
.
Parameters:
grammar
: a regular table.
Returns:
- a newly created table, with
grammar
's keys and values.
Ct ()
A capture function, made so that patt / Ct
is equivalent to m.Ct(patt)
. It's intended to be used in capture tables, such as those required by pipe and apply.
listOf (patt, sep)
Returns a pattern which matches a list of patt
s, separated by sep
.
Example: matching comma-separated values:
local g, m = require 'leg.grammar', require 'lpeg'
-- separator
local sep = m.P',' + m.P'\n'
-- element: anything but sep, capture it
local elem = m.C((1 - sep)^0)
-- pattern
local patt = g.listOf(elem, sep)
-- matching
print( patt:match [[a, b, 'christmas eve'
d, evening; mate!
f]])
--> prints out "a b 'christmas eve' d evening; mate! f"
Parameters:
patt
: a LPeg pattern.
sep
: a LPeg pattern.
Returns:
- the following pattern:
patt * (sep * patt)^0
pipe (dest, orig)
Pipes the captures in orig
to the ones in dest
.
dest
and orig
should be tables, with each key storing a capture function. Each capture in dest
will be altered to use the results for the matching one in orig
as input, using function composition. Should orig
possess keys not in dest
, dest
will copy them.
Parameters:
dest
: a capture table.
orig
: a capture table.
Returns: