master 994bcc125b6: Fix the new PEG library

emacs-diffs

From:	Eli Zaretskii
Subject:	master 994bcc125b6: Fix the new PEG library
Date:	Sun, 31 Mar 2024 03:32:15 -0400 (EDT)
branch: master
commit 994bcc125b66397b455c8a7b70fb454b483df052
Author: Eli Zaretskii <eliz@gnu.org>
Commit: Eli Zaretskii <eliz@gnu.org>

    Fix the new PEG library
    
    * doc/lispref/peg.texi (Parsing Expression Grammars)
    (PEX Definitions, Parsing Actions, Writing PEG Rules): Fix markup,
    indexing, and wording.
    
    * etc/NEWS: Fix wording of PEG entry.
    
    * test/lisp/progmodes/peg-tests.el: Move from test/lisp/, to match
    the directory of peg.el.
---
 doc/lispref/peg.texi                   | 202 +++++++++++++++++++--------------
 etc/NEWS                               |   4 +-
 test/lisp/{ => progmodes}/peg-tests.el |   0
 3 files changed, 120 insertions(+), 86 deletions(-)

diff --git a/doc/lispref/peg.texi b/doc/lispref/peg.texi
index ef4dfa7653e..fbf57852ee0 100644
--- a/doc/lispref/peg.texi
+++ b/doc/lispref/peg.texi
@@ -7,29 +7,34 @@
 @chapter Parsing Expression Grammars
 @cindex text parsing
 @cindex parsing expression grammar
+@cindex PEG
 
   Emacs Lisp provides several tools for parsing and matching text,
 from regular expressions (@pxref{Regular Expressions}) to full
-@acronym{LL} grammar parsers (@pxref{Top,, Bovine parser
-development,bovine}).  @dfn{Parsing Expression Grammars}
+left-to-right (a.k.a.@: @acronym{LL}) grammar parsers (@pxref{Top,,
+Bovine parser development,bovine}).  @dfn{Parsing Expression Grammars}
 (@acronym{PEG}) are another approach to text parsing that offer more
 structure and composability than regular expressions, but less
 complexity than context-free grammars.
 
-A @acronym{PEG} parser is defined as a list of named rules, each of
-which matches text patterns, and/or contains references to other
+A Parsing Expression Grammar (@acronym{PEG}) describes a formal language
+in terms of a set of rules for recognizing strings in the language.  In
+Emacs, a @acronym{PEG} parser is defined as a list of named rules, each
+of which matches text patterns and/or contains references to other
 rules.  Parsing is initiated with the function @code{peg-run} or the
 macro @code{peg-parse} (see below), and parses text after point in the
 current buffer, using a given set of rules.
 
 @cindex parsing expression
-The definition of each rule is referred to as a @dfn{parsing
-expression} (@acronym{PEX}), and can consist of a literal string, a
-regexp-like character range or set, a peg-specific construct
-resembling an elisp function call, a reference to another rule, or a
-combination of any of these.  A grammar is expressed as a tree of
-rules in which one rule is typically treated as a ``root'' or
-``entry-point'' rule.  For instance:
+@cindex root, of parsing expression grammar
+@cindex entry-point, of parsing expression grammar
+Each rule in a @acronym{PEG} is referred to as a @dfn{parsing
+expression} (@acronym{PEX}), and can be specified as a literal string, a
+regexp-like character range or set, a peg-specific construct resembling
+an Emacs Lisp function call, a reference to another rule, or a
+combination of any of these.  A grammar is expressed as a tree of rules
+in which one rule is typically treated as a ``root'' or ``entry-point''
+rule.  For instance:
 
 @example
 @group
@@ -56,14 +61,17 @@ first rule is considered the ``entry-point'':
 @end group
 @end example
 
-This macro represents the simplest use of the @acronym{PEG} library,
-but also the least flexible, as the rules must be written directly
-into the source code.  A more flexible approach involves use of three
-macros in conjunction: @code{with-peg-rules}, a @code{let}-like
-construct that makes a set of rules available within the macro body;
-@code{peg-run}, which initiates parsing given a single rule; and
-@code{peg}, which is used to wrap the entry-point rule name.  In fact,
-a call to @code{peg-parse} expands to just this set of calls.  The
+@c FIXME: These two should be formally defined using @defmac and @defun.
+@findex with-peg-rules
+@findex peg-run
+The @code{peg-parse} macro represents the simplest use of the
+@acronym{PEG} library, but also the least flexible, as the rules must be
+written directly into the source code.  A more flexible approach
+involves use of three macros in conjunction: @code{with-peg-rules}, a
+@code{let}-like construct that makes a set of rules available within the
+macro body; @code{peg-run}, which initiates parsing given a single rule;
+and @code{peg}, which is used to wrap the entry-point rule name.  In
+fact, a call to @code{peg-parse} expands to just this set of calls.  The
 above example could be written as:
 
 @example
@@ -79,33 +87,43 @@ above example could be written as:
 This allows more explicit control over the ``entry-point'' of parsing,
 and allows the combination of rules from different sources.
 
+@c FIXME: Use @defmac.
+@findex define-peg-rule
 Individual rules can also be defined using a more @code{defun}-like
 syntax, using the macro @code{define-peg-rule}:
 
 @example
+@group
 (define-peg-rule digit ()
   [0-9])
+@end group
 @end example
 
 This also allows for rules that accept an argument (supplied by the
-@code{funcall} PEG rule).
+@code{funcall} PEG rule, @pxref{PEX Definitions}).
 
+@c FIXME: Use @defmac.
+@findex define-peg-ruleset
 Another possibility is to define a named set of rules with
 @code{define-peg-ruleset}:
 
 @example
+@group
 (define-peg-ruleset number-grammar
         '((number sign digit (* digit))
           digit  ;; A reference to the definition above.
           (sign (or "+" "-" ""))))
+@end group
 @end example
 
 Rules and rulesets defined this way can be referred to by name in
 later calls to @code{peg-run} or @code{with-peg-rules}:
 
 @example
+@group
 (with-peg-rules number-grammar
   (peg-run (peg number)))
+@end group
 @end example
 
 By default, calls to @code{peg-run} or @code{peg-parse} produce no
@@ -125,11 +143,11 @@ act upon parsed strings, rules can include @dfn{actions}, see
 Parsing expressions can be defined using the following syntax:
 
 @table @code
-@item (and E1 E2 ...)
-A sequence of @acronym{PEX}s that must all be matched.  The @code{and} form is
-optional and implicit.
+@item (and @var{e1} @var{e2}@dots{})
+A sequence of @acronym{PEX}s that must all be matched.  The @code{and}
+form is optional and implicit.
 
-@item (or E1 E2 ...)
+@item (or @var{e1} @var{e2}@dots{})
 Prioritized choices, meaning that, as in Elisp, the choices are tried
 in order, and the first successful match is used.  Note that this is
 distinct from context-free grammars, in which selection between
@@ -141,43 +159,43 @@ Matches any single character, as the regexp ``.''.
 @item @var{string}
 A literal string.
 
-@item (char @var{C})
-A single character @var{C}, as an Elisp character literal.
+@item (char @var{c})
+A single character @var{c}, as an Elisp character literal.
 
-@item (* @var{E})
-Zero or more instances of expression @var{E}, as the regexp @samp{*}.
+@item (* @var{e})
+Zero or more instances of expression @var{e}, as the regexp @samp{*}.
 Matching is always ``greedy''.
 
-@item (+ @var{E})
-One or more instances of expression @var{E}, as the regexp @samp{+}.
+@item (+ @var{e})
+One or more instances of expression @var{e}, as the regexp @samp{+}.
 Matching is always ``greedy''.
 
-@item (opt @var{E})
-Zero or one instance of expression @var{E}, as the regexp @samp{?}.
+@item (opt @var{e})
+Zero or one instance of expression @var{e}, as the regexp @samp{?}.
 
-@item SYMBOL
+@item @var{symbol}
 A symbol representing a previously-defined PEG rule.
 
-@item (range CH1 CH2)
-The character range between CH1 and CH2, as the regexp @samp{[CH1-CH2]}.
+@item (range @var{ch1} @var{ch2})
+The character range between @var{ch1} and @var{ch2}, as the regexp
+@samp{[@var{ch1}-@var{ch2}]}.
 
-@item [CH1-CH2 "+*" ?x]
+@item [@var{ch1}-@var{ch2} "+*" ?x]
 A character set, which can include ranges, character literals, or
 strings of characters.
 
 @item [ascii cntrl]
 A list of named character classes.
 
-@item (syntax-class @var{NAME})
+@item (syntax-class @var{name})
 A single syntax class.
 
-@item (funcall E ARGS...)
-Call @acronym{PEX} E (previously defined with @code{define-peg-rule})
-with arguments @var{ARGS}.
+@item (funcall @var{e} @var{args}@dots{})
+Call @acronym{PEX} @var{e} (previously defined with
+@code{define-peg-rule}) with arguments @var{args}.
 
 @item (null)
 The empty string.
-
 @end table
 
 The following expressions are used as anchors or tests -- they do not
@@ -210,19 +228,19 @@ Beginning of symbol.
 @item (eos)
 End of symbol.
 
-@item (if E)
-Returns non-@code{nil} if parsing @acronym{PEX} E from point succeeds (point
-is not moved).
-
-@item (not E)
-Returns non-@code{nil} if parsing @acronym{PEX} E from point fails (point
-is not moved).
+@item (if @var{e})
+Returns non-@code{nil} if parsing @acronym{PEX} @var{e} from point
+succeeds (point is not moved).
 
-@item (guard EXP)
-Treats the value of the Lisp expression EXP as a boolean.
+@item (not @var{e})
+Returns non-@code{nil} if parsing @acronym{PEX} @var{e} from point fails
+(point is not moved).
 
+@item (guard @var{exp})
+Treats the value of the Lisp expression @var{exp} as a boolean.
 @end table
 
+@c FIXME: peg-char-classes should be mentioned in the text below.
 @vindex peg-char-classes
 Character class matching can use the same named character classes as
 in regular expressions (@pxref{Top,, Character Classes,elisp})
@@ -234,12 +252,13 @@ in regular expressions (@pxref{Top,, Character Classes,elisp})
 @cindex parsing stack
 By default the process of parsing simply moves point in the current
 buffer, ultimately returning @code{t} if the parsing succeeds, and
-@code{nil} if it doesn't.  It's also possible to define ``actions''
-that can run arbitrary Elisp at certain points in the parsed text.
-These actions can optionally affect something called the @dfn{parsing
-stack}, which is a list of values returned by the parsing process.
-These actions only run (and only return values) if the parsing process
-ultimately succeeds; if it fails the action code is not run at all.
+@code{nil} if it doesn't.  It's also possible to define @dfn{parsing
+actions} that can run arbitrary Elisp at certain points in the parsed
+text.  These actions can optionally affect something called the
+@dfn{parsing stack}, which is a list of values returned by the parsing
+process.  These actions only run (and only return values) if the parsing
+process ultimately succeeds; if it fails the action code is not run at
+all.
 
 Actions can be added anywhere in the definition of a rule.  They are
 distinguished from parsing expressions by an initial backquote
@@ -247,12 +266,13 @@ distinguished from parsing expressions by an initial backquote
 of hyphens (@samp{--}) somewhere within it.  Symbols to the left of
 the hyphens are bound to values popped from the stack (they are
 somewhat analogous to the argument list of a lambda form).  Values
-produced by code to the right are pushed to the stack (analogous to
-the return value of the lambda).  For instance, the previous grammar
-can be augmented with actions to return the parsed number as an actual
-integer:
+produced by code to the right of the hyphens are pushed onto the stack
+(analogous to the return value of the lambda).  For instance, the
+previous grammar can be augmented with actions to return the parsed
+number as an actual integer:
 
 @example
+@group
 (with-peg-rules ((number sign digit (* digit
                                        `(a b -- (+ (* a 10) b)))
                          `(sign val -- (* sign val)))
@@ -261,6 +281,7 @@ integer:
                            (and ""  `(-- 1))))
                  (digit [0-9] `(-- (- (char-before) ?0))))
   (peg-run (peg number)))
+@end group
 @end example
 
 There must be values on the stack before they can be popped and
@@ -271,43 +292,53 @@ only left-hand terms will consume (and discard) values from the stack.
 At the end of parsing, stack values are returned as a flat list.
 
 To return the string matched by a @acronym{PEX} (instead of simply
-moving point over it), a rule like this can be used:
+moving point over it), a grammar can use a rule like this:
 
 @example
+@group
 (one-word
   `(-- (point))
   (+ [word])
   `(start -- (buffer-substring start (point))))
+@end group
 @end example
 
-The first action pushes the initial value of point to the stack.  The
-intervening @acronym{PEX} moves point over the next word.  The second
-action pops the previous value from the stack (binding it to the
-variable @code{start}), and uses that value to extract a substring
-from the buffer and push it to the stack.  This pattern is so common
-that @acronym{PEG} provides a shorthand function that does exactly the
-above, along with a few other shorthands for common scenarios:
+@noindent
+The first action above pushes the initial value of point to the stack.
+The intervening @acronym{PEX} moves point over the next word.  The
+second action pops the previous value from the stack (binding it to the
+variable @code{start}), then uses that value to extract a substring from
+the buffer and push it to the stack.  This pattern is so common that
+@acronym{PEG} provides a shorthand function that does exactly the above,
+along with a few other shorthands for common scenarios:
 
 @table @code
-@item (substring @var{E})
-Match @acronym{PEX} @var{E} and push the matched string to the stack.
-
-@item (region @var{E})
-Match @var{E} and push the start and end positions of the matched
-region to the stack.
-
-@item (replace @var{E} @var{replacement})
-Match @var{E} and replaced the matched region with the string @var{replacement}.
-
-@item (list @var{E})
-Match @var{E}, collect all values produced by @var{E} (and its
-sub-expressions) into a list, and push that list to the stack.  Stack
+@findex substring (a PEG shorthand)
+@item (substring @var{e})
+Match @acronym{PEX} @var{e} and push the matched string onto the stack.
+
+@findex region (a PEG shorthand)
+@item (region @var{e})
+Match @var{e} and push the start and end positions of the matched
+region onto the stack.
+
+@findex replace (a PEG shorthand)
+@item (replace @var{e} @var{replacement})
+Match @var{e} and replace the matched region with the string
+@var{replacement}.
+
+@findex list (a PEG shorthand)
+@item (list @var{e})
+Match @var{e}, collect all values produced by @var{e} (and its
+sub-expressions) into a list, and push that list onto the stack.  Stack
 values are typically returned as a flat list; this is a way of
 ``grouping'' values together.
 @end table
 
 @node Writing PEG Rules
 @section Writing PEG Rules
+@cindex PEG rules, pitfalls
+@cindex Parsing Expression Grammar, pitfalls in rules
 
 Something to be aware of when writing PEG rules is that they are
 greedy.  Rules which can consume a variable amount of text will always
@@ -319,9 +350,10 @@ backtracking.  For instance, this rule will never succeed:
 (forest (+ "tree" (* [blank])) "tree" (eol))
 @end example
 
-The @acronym{PEX} @code{(+ "tree" (* [blank]))} will consume all
-repetitions of the word ``tree'', leaving none to match the final
-@code{"tree"}.
+@noindent
+The @acronym{PEX} @w{@code{(+ "tree" (* [blank]))}} will consume all
+the repetitions of the word @samp{tree}, leaving none to match the final
+@samp{tree}.
 
 In these situations, the desired result can be obtained by using
 predicates and guards -- namely the @code{not}, @code{if} and
@@ -331,6 +363,7 @@ predicates and guards -- namely the @code{not}, @code{if} and
 (forest (+ "tree" (* [blank])) (not (eol)) "tree" (eol))
 @end example
 
+@noindent
 The @code{if} and @code{not} operators accept a parsing expression and
 interpret it as a boolean, without moving point.  The contents of a
 @code{guard} operator are evaluated as regular Lisp (not a
@@ -345,6 +378,7 @@ rule:
 (end-game "game" (eob))
 @end example
 
+@noindent
 when run in a buffer containing the text ``game over'' after point,
 will move point to just after ``game'' then halt parsing, returning
 @code{nil}.  Successful parsing will always return @code{t}, or the
diff --git a/etc/NEWS b/etc/NEWS
index 8e1c1082b3a..1204f58c5ca 100644
--- a/etc/NEWS
+++ b/etc/NEWS
@@ -1587,8 +1587,8 @@ preventing the installation of Compat if unnecessary.
 
 +++
 ** New package PEG.
-Emacs now includes a library for writing (P)arsing (E)xpression
-(G)rammars, an approach to text parsing that provides more structure
+Emacs now includes a library for writing Parsing Expression
+Grammars (PEG), an approach to text parsing that provides more structure
 than regular expressions, but less complexity than context-free
 grammars.  The Info manual "(elisp) Parsing Expression Grammars" has
 documentation and examples.
diff --git a/test/lisp/peg-tests.el b/test/lisp/progmodes/peg-tests.el
similarity index 100%
rename from test/lisp/peg-tests.el
rename to test/lisp/progmodes/peg-tests.el