guile-commits
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[Guile-commits] GNU Guile branch, wip-peg, updated. v2.0.0-181-g82cf81b


From: Noah Lavine
Subject: [Guile-commits] GNU Guile branch, wip-peg, updated. v2.0.0-181-g82cf81b
Date: Fri, 15 Apr 2011 22:02:44 +0000

This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "GNU Guile".

http://git.savannah.gnu.org/cgit/guile.git/commit/?id=82cf81b3a160c361d561d4bf97b9ccfc1281022f

The branch, wip-peg has been updated
       via  82cf81b3a160c361d561d4bf97b9ccfc1281022f (commit)
       via  b61e2aa9653dcf452e85bb2432cc59f86ba4a9b3 (commit)
       via  fe5aedd229f78d2f72785d6174758a7d70d1c7a1 (commit)
       via  6718bedb6ff0db10d6c3d055bebedacb0b601678 (commit)
       via  6f7728142bf934f0fb64d01fa3aeaf7c291be3b2 (commit)
       via  e058fdce5b8e755b9a6b9627f0859812256b00bf (commit)
       via  92b693cef49d11c94dfedbf5ba57d451df7580ee (commit)
      from  bccf0f293052c01800758329fcd6a868e3762a0e (commit)

Those revisions listed above that are new to this repository have
not appeared on any other notification email; so we list those
revisions in full, below.

- Log -----------------------------------------------------------------
commit 82cf81b3a160c361d561d4bf97b9ccfc1281022f
Author: Noah Lavine <address@hidden>
Date:   Fri Apr 15 17:04:02 2011 -0400

    Make PEG Files
    
    * module/ice-9/peg/using-parsers.scm: remove unnecessary dependency
    * module/ice-9/peg.scm: add comment about string-peg dependency
    * module/Makefile.scm: add PEG files to makefile

commit b61e2aa9653dcf452e85bb2432cc59f86ba4a9b3
Author: Noah Lavine <address@hidden>
Date:   Fri Apr 15 16:57:11 2011 -0400

    Rearrange PEG Modules
    
    * module/ice-9/peg.scm: move code out of here
    * module/ice-9/peg/match-records.scm: remove this file
    * module/ice-9/peg/using-parsers.scm: make a new module with utilities for
      using parsers. It contains the code from both peg.scm and 
match-records.scm
    * module/ice-9/peg/string-peg.scm: update to use new module

commit fe5aedd229f78d2f72785d6174758a7d70d1c7a1
Author: Noah Lavine <address@hidden>
Date:   Fri Apr 1 19:45:54 2011 -0400

    PEG Cache Module
    
    * module/ice-9/peg/cache.scm: add module to hold cache logic for PEG
       parsers
    * module/ice-9/peg.scm: move cache logic out of here

commit 6718bedb6ff0db10d6c3d055bebedacb0b601678
Author: Noah Lavine <address@hidden>
Date:   Thu Mar 31 17:42:36 2011 -0400

    Update String PEGs
    
    * module/ice-9/peg/string-peg.scm: use new interface for extending PEG
       syntax
    * module/ice-9/peg.scm: remove peg-extended-compile

commit 6f7728142bf934f0fb64d01fa3aeaf7c291be3b2
Author: Noah Lavine <address@hidden>
Date:   Thu Mar 31 17:04:06 2011 -0400

    Extensible PEG Syntax
    
    * module/ice-9/peg/codegen.scm: Make the PEG syntax extensible, and
        move most of the current code generators to the new interface
    * doc/ref/api-peg.texi: Document PEG extensions in the PEG Internals
        section of the manual

commit e058fdce5b8e755b9a6b9627f0859812256b00bf
Author: Noah Lavine <address@hidden>
Date:   Mon Mar 28 15:18:27 2011 -0400

    Separate PEG Concerns
    
    * module/ice-9/peg/codegen.scm: peg-sexp-compile no longer knows about
       string PEGs
    * module/ice-9/peg.scm: add a new function peg-extended-compile that
       calls peg-sexp-compile or peg-string-compile on its argument as
       appropriate

commit 92b693cef49d11c94dfedbf5ba57d451df7580ee
Author: Noah Lavine <address@hidden>
Date:   Mon Mar 28 15:13:35 2011 -0400

    Move define-nonterm
    
    * module/ice-9/peg/string-peg.scm: remove define-nonterm and make a simpler
       macro called `define-sexp-parser' to make the PEG grammar
    * module/ice-9/peg.scm: move define-nonterm macro to this file
    * module/ice-9/peg/codegen.scm: move code to wrap a parser result nicely to
       this file, under name `wrap-parser-for-users'

-----------------------------------------------------------------------

Summary of changes:
 doc/ref/api-peg.texi                            |   32 ++++
 module/Makefile.am                              |    5 +
 module/ice-9/peg.scm                            |   71 +-------
 module/ice-9/peg/cache.scm                      |   45 +++++
 module/ice-9/peg/codegen.scm                    |  206 ++++++++++++++---------
 module/ice-9/peg/match-record.scm               |   43 -----
 module/ice-9/peg/string-peg.scm                 |  119 ++++---------
 module/ice-9/{peg.scm => peg/using-parsers.scm} |   70 +++++---
 8 files changed, 302 insertions(+), 289 deletions(-)
 create mode 100644 module/ice-9/peg/cache.scm
 delete mode 100644 module/ice-9/peg/match-record.scm
 copy module/ice-9/{peg.scm => peg/using-parsers.scm} (62%)

diff --git a/doc/ref/api-peg.texi b/doc/ref/api-peg.texi
index 0c83365..6d0a346 100644
--- a/doc/ref/api-peg.texi
+++ b/doc/ref/api-peg.texi
@@ -992,3 +992,35 @@ interface.
 
 The above function can be used to match a string by running
 @code{(peg-parse match-a-b "ab")}.
+
address@hidden Code Generators and Extensible Syntax
+
+PEG expressions, such as those in a @code{define-nonterm} form, are
+interpreted internally in two steps.
+
+First, any string PEG is expanded into an s-expression PEG by the code
+in the @code{(ice-9 peg string-peg)} module.
+
+Then, then s-expression PEG that results is compiled into a parsing
+function by the @code{(ice-9 peg codegen)} module. In particular, the
+function @code{peg-sexp-compile} is called on the s-expression. It then
+decides what to do based on the form it is passed.
+
+The PEG syntax can be expanded by providing @code{peg-sexp-compile} more
+options for what to do with its forms. The extended syntax will be
+associated with a symbol, for instance @code{my-parsing-form}, and will
+be called on all PEG expressions of the form
address@hidden
+(my-parsing-form ...)
address@hidden lisp
+
+The parsing function should take two arguments. The first will be a
+syntax object containing a list with all of the arguments to the form
+(but not the form's name), and the second will be the
address@hidden argument that is passed to @code{define-nonterm}.
+
+New functions can be registered by calling @code{(add-peg-compiler!
+symbol function)}, where @code{symbol} is the symbol that will indicate
+a form of this type and @code{function} is the code generating function
+described above. The function @code{add-peg-compiler!} is exported from
+the @code{(ice-9 peg codegen)} module.
diff --git a/module/Makefile.am b/module/Makefile.am
index 6cebff1..99814b4 100644
--- a/module/Makefile.am
+++ b/module/Makefile.am
@@ -202,6 +202,11 @@ ICE_9_SOURCES = \
   ice-9/null.scm \
   ice-9/occam-channel.scm \
   ice-9/optargs.scm \
+  ice-9/peg/simplify-tree.scm \
+  ice-9/peg/codegen.scm \
+  ice-9/peg/cache.scm \
+  ice-9/peg/using-parsers.scm \
+  ice-9/peg/string-peg.scm \
   ice-9/peg.scm \
   ice-9/poe.scm \
   ice-9/poll.scm \
diff --git a/module/ice-9/peg.scm b/module/ice-9/peg.scm
index 644af6d..aa2754b 100644
--- a/module/ice-9/peg.scm
+++ b/module/ice-9/peg.scm
@@ -18,19 +18,20 @@
 ;;;;
 
 (define-module (ice-9 peg)
-  #:export (peg-parse
-;            define-nonterm
-;            define-nonterm-f
-            peg-match)
-;  #:export-syntax (define-nonterm)
   #:use-module (ice-9 peg codegen)
   #:use-module (ice-9 peg string-peg)
+  ;; Note: the most important effect of using string-peg is not whatever
+  ;; functions it exports, but the fact that it adds a new handler to
+  ;; peg-sexp-compile.
   #:use-module (ice-9 peg simplify-tree)
-  #:use-module (ice-9 peg match-record)
-  #:re-export (peg-sexp-compile
+  #:use-module (ice-9 peg using-parsers)
+  #:use-module (ice-9 peg cache)
+  #:re-export (peg-parse
+               define-nonterm
+               peg-match
+               peg-sexp-compile
                define-grammar
                define-grammar-f
-               define-nonterm
                keyword-flatten
                context-flatten
                peg:start
@@ -40,57 +41,3 @@
                peg:substring
                peg-record?))
 
-;;;
-;;; Helper Macros
-;;;
-
-(define-syntax until
-  (syntax-rules ()
-    "Evaluate TEST.  If it is true, return its value.  Otherwise,
-execute the STMTs and try again."
-    ((_ test stmt stmt* ...)
-     (let lp ()
-       (or test
-           (begin stmt stmt* ... (lp)))))))
-
-;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
-;;;;; FOR DEFINING AND USING NONTERMINALS
-;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
-
-;; Parses STRING using NONTERM
-(define (peg-parse nonterm string)
-  ;; We copy the string before using it because it might have been modified
-  ;; in-place since the last time it was parsed, which would invalidate the
-  ;; cache.  Guile uses copy-on-write for strings, so this is fast.
-  (let ((res (nonterm (string-copy string) (string-length string) 0)))
-    (if (not res)
-        #f
-        (make-prec 0 (car res) string (string-collapse (cadr res))))))
-
-;; Searches through STRING for something that parses to PEG-MATCHER.  Think
-;; regexp search.
-(define-syntax peg-match
-  (lambda (x)
-    (syntax-case x ()
-      ((_ pattern string-uncopied)
-       (let ((pmsym (syntax->datum #'pattern)))
-         (let ((matcher (if (string? (syntax->datum #'pattern))
-                            (peg-string-compile #'pattern 'body)
-                            (peg-sexp-compile #'pattern 'body))))
-           ;; We copy the string before using it because it might have been
-           ;; modified in-place since the last time it was parsed, which would
-           ;; invalidate the cache.  Guile uses copy-on-write for strings, so
-           ;; this is fast.
-           #`(let ((string (string-copy string-uncopied))
-                   (strlen (string-length string-uncopied))
-                   (at 0))
-               (let ((ret (until (or (>= at strlen)
-                                     (#,matcher string strlen at))
-                                 (set! at (+ at 1)))))
-                 (if (eq? ret #t) ;; (>= at strlen) succeeded
-                     #f
-                     (let ((end (car ret))
-                           (match (cadr ret)))
-                       (make-prec
-                        at end string
-                        (string-collapse match))))))))))))
diff --git a/module/ice-9/peg/cache.scm b/module/ice-9/peg/cache.scm
new file mode 100644
index 0000000..f45432b
--- /dev/null
+++ b/module/ice-9/peg/cache.scm
@@ -0,0 +1,45 @@
+;;;; cache.scm --- cache the results of parsing
+;;;;
+;;;;   Copyright (C) 2010, 2011 Free Software Foundation, Inc.
+;;;;
+;;;; This library is free software; you can redistribute it and/or
+;;;; modify it under the terms of the GNU Lesser General Public
+;;;; License as published by the Free Software Foundation; either
+;;;; version 3 of the License, or (at your option) any later version.
+;;;; 
+;;;; This library is distributed in the hope that it will be useful,
+;;;; but WITHOUT ANY WARRANTY; without even the implied warranty of
+;;;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+;;;; Lesser General Public License for more details.
+;;;; 
+;;;; You should have received a copy of the GNU Lesser General Public
+;;;; License along with this library; if not, write to the Free Software
+;;;; Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 
USA
+;;;;
+
+(define-module (ice-9 peg cache)
+  #:export (cg-cached-parser))
+
+;; The results of parsing using a nonterminal are cached.  Think of it like a
+;; hash with no conflict resolution.  Process for deciding on the cache size
+;; wasn't very scientific; just ran the benchmarks and stopped a little after
+;; the point of diminishing returns on my box.
+(define *cache-size* 512)
+
+(define (make-cache)
+  (make-vector *cache-size* #f))
+
+;; given a syntax object which is a parser function, returns syntax
+;; which, if evaluated, will become a parser function that uses a cache.
+(define (cg-cached-parser parser)
+  #`(let ((cache (make-cache)))
+      (lambda (str strlen at)
+        (let* ((vref (vector-ref cache (modulo at *cache-size*))))
+          ;; Check to see whether the value is cached.
+          (if (and vref (eq? (car vref) str) (= (cadr vref) at))
+              (caddr vref);; If it is return it.
+              (let ((fres ;; Else calculate it and cache it.
+                     (#,parser str strlen at)))
+                (vector-set! cache (modulo at *cache-size*)
+                             (list str at fres))
+                fres))))))
diff --git a/module/ice-9/peg/codegen.scm b/module/ice-9/peg/codegen.scm
index 2c85ccc..597ead9 100644
--- a/module/ice-9/peg/codegen.scm
+++ b/module/ice-9/peg/codegen.scm
@@ -18,9 +18,7 @@
 ;;;;
 
 (define-module (ice-9 peg codegen)
-  #:export (peg-sexp-compile)
-  #:use-module (ice-9 peg)
-  #:use-module (ice-9 peg string-peg)
+  #:export (peg-sexp-compile wrap-parser-for-users add-peg-compiler!)
   #:use-module (ice-9 pretty-print)
   #:use-module (system base pmatch))
 
@@ -123,18 +121,35 @@ return EXP."
 
 ;; Generates code for matching a range of characters between start and end.
 ;; E.g.: (cg-range syntax #\a #\z 'body)
-(define (cg-range start end accum)
-  #`(lambda (str len pos)
-      (and (< pos len)
-           (let ((c (string-ref str pos)))
-             (and (char>=? c #,start)
-                  (char<=? c #,end)
-                  #,(case accum
-                      ((all) #`(list (1+ pos) (list 'cg-range (string c))))
-                      ((name) #`(list (1+ pos) 'cg-range))
-                      ((body) #`(list (1+ pos) (string c)))
-                      ((none) #`(list (1+ pos) '()))
-                      (else (error "bad accum" accum))))))))
+(define (cg-range pat accum)
+  (syntax-case pat ()
+    ((start end)
+     (if (not (and (char? (syntax->datum #'start))
+                   (char? (syntax->datum #'end))))
+         (error "range PEG should have characters after it; instead got"
+                #'start #'end))
+     #`(lambda (str len pos)
+         (and (< pos len)
+              (let ((c (string-ref str pos)))
+                (and (char>=? c start)
+                     (char<=? c end)
+                     #,(case accum
+                         ((all) #`(list (1+ pos) (list 'cg-range (string c))))
+                         ((name) #`(list (1+ pos) 'cg-range))
+                         ((body) #`(list (1+ pos) (string c)))
+                         ((none) #`(list (1+ pos) '()))
+                         (else (error "bad accum" accum))))))))))
+
+;; Generate code to match a pattern and do nothing with the result
+(define (cg-ignore pat accum)
+  (syntax-case pat ()
+    ((inner)
+     (peg-sexp-compile #'inner 'none))))
+
+(define (cg-capture pat accum)
+  (syntax-case pat ()
+    ((inner)
+     (peg-sexp-compile #'inner 'body))))
 
 ;; Filters the accum argument to peg-sexp-compile for buildings like string
 ;; literals (since we don't want to tag them with their name if we're doing an
@@ -147,38 +162,11 @@ return EXP."
    ((eq? accum 'none) 'none)))
 (define baf builtin-accum-filter)
 
-;; Takes an arbitrary expressions and accumulation variable, then parses it.
-;; E.g.: (peg-sexp-compile syntax '(and "abc" (or "-" (range #\a #\z))) 'all)
-(define (peg-sexp-compile pat accum)
-  (syntax-case pat (peg-any range ignore capture peg and or body)
-    (peg-any
-     (cg-peg-any (baf accum)))
-    (sym (identifier? #'sym) ;; nonterminal
-     #'sym)
-    (str (string? (syntax->datum #'str)) ;; literal string
-     (cg-string (syntax->datum #'str) (baf accum)))
-    ((range start end) ;; range of characters (e.g. [a-z])
-     (and (char? (syntax->datum #'start)) (char? (syntax->datum #'end)))
-     (cg-range (syntax->datum #'start) (syntax->datum #'end) (baf accum)))
-    ((ignore pat) ;; match but don't parse
-     (peg-sexp-compile #'pat 'none))
-    ((capture pat) ;; parse
-     (peg-sexp-compile #'pat 'body))
-    ((peg pat)  ;; embedded PEG string
-     (string? (syntax->datum #'pat))
-     (peg-string-compile #'pat (baf accum)))
-    ((and pat ...)
-     (cg-and #'(pat ...) (baf accum)))
-    ((or pat ...)
-     (cg-or #'(pat ...) (baf accum)))
-    ((body type pat num)
-     (cg-body (baf accum) #'type #'pat #'num))))
-
 ;; Top-level function builder for AND.  Reduces to a call to CG-AND-INT.
 (define (cg-and clauses accum)
   #`(lambda (str len pos)
       (let ((body '()))
-        #,(cg-and-int clauses accum #'str #'len #'pos #'body))))
+        #,(cg-and-int clauses (baf accum) #'str #'len #'pos #'body))))
 
 ;; Internal function builder for AND (calls itself).
 (define (cg-and-int clauses accum str strlen at body)
@@ -198,7 +186,7 @@ return EXP."
 ;; Top-level function builder for OR.  Reduces to a call to CG-OR-INT.
 (define (cg-or clauses accum)
   #`(lambda (str len pos)
-      #,(cg-or-int clauses accum #'str #'len #'pos)))
+      #,(cg-or-int clauses (baf accum) #'str #'len #'pos)))
 
 ;; Internal function builder for OR (calls itself).
 (define (cg-or-int clauses accum str strlen at)
@@ -210,37 +198,99 @@ return EXP."
            #,(cg-or-int #'(rest ...) accum str strlen at)))))
 
 ;; Returns a function that parses a BODY element.
-(define (cg-body accum type pat num)
-  #`(lambda (str strlen at)
-      (let ((body '()))
-        (let lp ((end at) (count 0))
-          (let* ((match (#,(peg-sexp-compile pat accum) str strlen end))
-                 (new-end (if match (car match) end))
-                 (count (if (> new-end end) (1+ count) count)))
-            (if (> new-end end)
-                (push-not-null! body (single-filter (cadr match))))
-            (if (and (> new-end end)
-                     #,(syntax-case num (+ * ?)
-                         (n (number? (syntax->datum #'n))
-                            #'(< count n))
-                         (+ #t)
-                         (* #t)
-                         (? #'(< count 1))))
-                (lp new-end count)
-                (let ((success #,(syntax-case num (+ * ?)
-                                   (n (number? (syntax->datum #'n))
-                                      #'(= count n))
-                                   (+ #'(>= count 1))
-                                   (* #t)
-                                   (? #t))))
-                  #,(syntax-case type (! & lit)
-                      (!
-                       #`(if success
-                             #f
-                             #,(cggr accum 'cg-body #''() #'at)))
-                      (&
-                       #`(and success
-                              #,(cggr accum 'cg-body #''() #'at)))
-                      (lit
-                       #`(and success
-                              #,(cggr accum 'cg-body #'(reverse body) 
#'new-end)))))))))))
+(define (cg-body args accum)
+  (syntax-case args ()
+    ((type pat num)
+     #`(lambda (str strlen at)
+         (let ((body '()))
+           (let lp ((end at) (count 0))
+             (let* ((match (#,(peg-sexp-compile #'pat (baf accum))
+                            str strlen end))
+                    (new-end (if match (car match) end))
+                    (count (if (> new-end end) (1+ count) count)))
+               (if (> new-end end)
+                   (push-not-null! body (single-filter (cadr match))))
+               (if (and (> new-end end)
+                        #,(syntax-case #'num (+ * ?)
+                            (n (number? (syntax->datum #'n))
+                               #'(< count n))
+                            (+ #t)
+                            (* #t)
+                            (? #'(< count 1))))
+                   (lp new-end count)
+                   (let ((success #,(syntax-case #'num (+ * ?)
+                                      (n (number? (syntax->datum #'n))
+                                         #'(= count n))
+                                      (+ #'(>= count 1))
+                                      (* #t)
+                                      (? #t))))
+                     #,(syntax-case #'type (! & lit)
+                         (!
+                          #`(if success
+                                #f
+                                #,(cggr (baf accum) 'cg-body #''() #'at)))
+                         (&
+                          #`(and success
+                                 #,(cggr (baf accum) 'cg-body #''() #'at)))
+                         (lit
+                          #`(and success
+                                 #,(cggr (baf accum) 'cg-body
+                                         #'(reverse body) 
#'new-end)))))))))))))
+
+;; Association list of functions to handle different expressions as PEGs
+(define peg-compiler-alist '())
+
+(define (add-peg-compiler! symbol function)
+  (set! peg-compiler-alist
+        (assq-set! peg-compiler-alist symbol function)))
+
+(add-peg-compiler! 'range cg-range)
+(add-peg-compiler! 'ignore cg-ignore)
+(add-peg-compiler! 'capture cg-capture)
+(add-peg-compiler! 'and cg-and)
+(add-peg-compiler! 'or cg-or)
+(add-peg-compiler! 'body cg-body)
+
+;; Takes an arbitrary expressions and accumulation variable, then parses it.
+;; E.g.: (peg-sexp-compile syntax '(and "abc" (or "-" (range #\a #\z))) 'all)
+(define (peg-sexp-compile pat accum)
+  (syntax-case pat (peg-any range ignore capture peg and or body)
+    (peg-any
+     (cg-peg-any (baf accum)))
+    (sym (identifier? #'sym) ;; nonterminal
+     #'sym)
+    (str (string? (syntax->datum #'str)) ;; literal string
+     (cg-string (syntax->datum #'str) (baf accum)))
+    ((name . args) (let* ((nm (syntax->datum #'name))
+                          (entry (assq-ref peg-compiler-alist nm)))
+                     (if entry
+                         (entry #'args accum)
+                         (error "Bad peg form" nm #'args
+                                "Not one of" (map car peg-compiler-alist)))))))
+
+;; Packages the results of a parser
+(define (wrap-parser-for-users for-syntax parser accumsym s-syn)
+   #`(lambda (str strlen at)
+      (let ((res (#,parser str strlen at)))
+        ;; Try to match the nonterminal.
+        (if res
+            ;; If we matched, do some post-processing to figure out
+            ;; what data to propagate upward.
+            (let ((at (car res))
+                  (body (cadr res)))
+              #,(cond
+                 ((eq? accumsym 'name)
+                  #`(list at '#,s-syn))
+                 ((eq? accumsym 'all)
+                  #`(list (car res)
+                          (cond
+                           ((not (list? body))
+                            (list '#,s-syn body))
+                           ((null? body) '#,s-syn)
+                           ((symbol? (car body))
+                            (list '#,s-syn body))
+                           (else (cons '#,s-syn body)))))
+                 ((eq? accumsym 'none) #`(list (car res) '()))
+                 (else #`(begin res))))
+            ;; If we didn't match, just return false.
+            #f))))
diff --git a/module/ice-9/peg/match-record.scm 
b/module/ice-9/peg/match-record.scm
deleted file mode 100644
index 87785a5..0000000
--- a/module/ice-9/peg/match-record.scm
+++ /dev/null
@@ -1,43 +0,0 @@
-;;;; match-record.scm --- records to hold PEG parser results
-;;;;
-;;;;   Copyright (C) 2010, 2011 Free Software Foundation, Inc.
-;;;;
-;;;; This library is free software; you can redistribute it and/or
-;;;; modify it under the terms of the GNU Lesser General Public
-;;;; License as published by the Free Software Foundation; either
-;;;; version 3 of the License, or (at your option) any later version.
-;;;; 
-;;;; This library is distributed in the hope that it will be useful,
-;;;; but WITHOUT ANY WARRANTY; without even the implied warranty of
-;;;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-;;;; Lesser General Public License for more details.
-;;;; 
-;;;; You should have received a copy of the GNU Lesser General Public
-;;;; License along with this library; if not, write to the Free Software
-;;;; Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 
USA
-;;;;
-
-(define-module (ice-9 peg match-record)
-  #:export (prec make-prec peg:start peg:end peg:string
-            peg:tree peg:substring peg-record?))
-
-;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
-;;;;; PMATCH STRUCTURE MUNGING
-;; Pretty self-explanatory.
-;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
-
-(define prec
-  (make-record-type "peg" '(start end string tree)))
-(define make-prec
-  (record-constructor prec '(start end string tree)))
-(define (peg:start pm)
-  (if pm ((record-accessor prec 'start) pm) #f))
-(define (peg:end pm)
-  (if pm ((record-accessor prec 'end) pm) #f))
-(define (peg:string pm)
-  (if pm ((record-accessor prec 'string) pm) #f))
-(define (peg:tree pm)
-  (if pm ((record-accessor prec 'tree) pm) #f))
-(define (peg:substring pm)
-  (if pm (substring (peg:string pm) (peg:start pm) (peg:end pm)) #f))
-(define peg-record? (record-predicate prec))
diff --git a/module/ice-9/peg/string-peg.scm b/module/ice-9/peg/string-peg.scm
index f7e21f6..ed09aae 100644
--- a/module/ice-9/peg/string-peg.scm
+++ b/module/ice-9/peg/string-peg.scm
@@ -18,20 +18,13 @@
 ;;;;
 
 (define-module (ice-9 peg string-peg)
-  #:export (peg-string-compile
-            peg-as-peg
+  #:export (peg-as-peg
             define-grammar
             define-grammar-f
-            define-nonterm
             peg-grammar)
-  #:use-module (ice-9 peg)
-  #:use-module (ice-9 peg codegen))
-
-;; The results of parsing using a nonterminal are cached.  Think of it like a
-;; hash with no conflict resolution.  Process for deciding on the cache size
-;; wasn't very scientific; just ran the benchmarks and stopped a little after
-;; the point of diminishing returns on my box.
-(define *cache-size* 512)
+  #:use-module (ice-9 peg using-parsers)
+  #:use-module (ice-9 peg codegen)
+  #:use-module (ice-9 peg simplify-tree))
 
 ;; Gets the left-hand depth of a list.
 (define (depth lst)
@@ -39,58 +32,6 @@
       0
       (+ 1 (depth (car lst)))))
 
-(eval-when (compile load eval)
-(define (syntax-for-non-cache-case for-syntax matchf-syn accumsym s-syn)
-;  (let ((matchf-syn (datum->syntax for-syntax matchf)))
-   #`(lambda (str strlen at)
-      (let ((res (#,matchf-syn str strlen at)))
-        ;; Try to match the nonterminal.
-        (if res
-            ;; If we matched, do some post-processing to figure out
-            ;; what data to propagate upward.
-            (let ((at (car res))
-                  (body (cadr res)))
-              #,(cond
-                 ((eq? accumsym 'name)
-                  #`(list at '#,s-syn))
-                 ((eq? accumsym 'all)
-                  #`(list (car res)
-                          (cond
-                           ((not (list? body))
-                            (list '#,s-syn body))
-                           ((null? body) '#,s-syn)
-                           ((symbol? (car body))
-                            (list '#,s-syn body))
-                           (else (cons '#,s-syn body)))))
-                 ((eq? accumsym 'none) #`(list (car res) '()))
-                 (else #`(begin res))))
-            ;; If we didn't match, just return false.
-            #f))))
-)
-
-;; Defines a new nonterminal symbol accumulating with ACCUM.
-(define-syntax define-nonterm
-  (lambda (x)
-    (syntax-case x ()
-      ((_ sym accum pat)
-       (let ((matchf (peg-sexp-compile #'pat (syntax->datum #'accum)))
-             (accumsym (syntax->datum #'accum))
-             (c (datum->syntax x (gensym))));; the cache
-         ;; CODE is the code to parse the string if the result isn't cached.
-         (let ((syn (syntax-for-non-cache-case x matchf accumsym #'sym)))
-           #`(begin
-               (define #,c (make-vector *cache-size* #f));; the cache
-               (define (sym str strlen at)
-                 (let* ((vref (vector-ref #,c (modulo at *cache-size*))))
-                   ;; Check to see whether the value is cached.
-                   (if (and vref (eq? (car vref) str) (= (cadr vref) at))
-                       (caddr vref);; If it is return it.
-                       (let ((fres ;; Else calculate it and cache it.
-                              (#,syn str strlen at)))
-                         (vector-set! #,c (modulo at *cache-size*)
-                                      (list str at fres))
-                         fres)))))))))))
-
 
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
 ;;;;; Parse string PEGs using sexp PEGs.
 ;; See the variable PEG-AS-PEG for an easier-to-read syntax.
@@ -114,34 +55,43 @@ LB < '['
 RB < ']'
 ")
 
-(define-nonterm peg-grammar all
+(define-syntax define-sexp-parser
+  (lambda (x)
+    (syntax-case x ()
+      ((_ sym accum pat)
+       (let* ((matchf (peg-sexp-compile #'pat (syntax->datum #'accum)))
+              (accumsym (syntax->datum #'accum))
+              (syn (wrap-parser-for-users x matchf accumsym #'sym)))
+           #`(define sym #,syn))))))
+
+(define-sexp-parser peg-grammar all
   (body lit (and peg-nonterminal (or "<--" "<-" "<") peg-sp peg-pattern) +))
-(define-nonterm peg-pattern all
+(define-sexp-parser peg-pattern all
   (and peg-alternative
        (body lit (and (ignore "/") peg-sp peg-alternative) *)))
-(define-nonterm peg-alternative all
+(define-sexp-parser peg-alternative all
   (body lit (and (body lit (or "!" "&") ?) peg-sp peg-suffix) +))
-(define-nonterm peg-suffix all
+(define-sexp-parser peg-suffix all
   (and peg-primary (body lit (and (or "*" "+" "?") peg-sp) *)))
-(define-nonterm peg-primary all
+(define-sexp-parser peg-primary all
   (or (and "(" peg-sp peg-pattern ")" peg-sp)
       (and "." peg-sp)
       peg-literal
       peg-charclass
       (and peg-nonterminal (body ! "<" 1))))
-(define-nonterm peg-literal all
+(define-sexp-parser peg-literal all
   (and "'" (body lit (and (body ! "'" 1) peg-any) *) "'" peg-sp))
-(define-nonterm peg-charclass all
+(define-sexp-parser peg-charclass all
   (and (ignore "[")
        (body lit (and (body ! "]" 1)
                       (or charclass-range charclass-single)) *)
        (ignore "]")
        peg-sp))
-(define-nonterm charclass-range all (and peg-any "-" peg-any))
-(define-nonterm charclass-single all peg-any)
-(define-nonterm peg-nonterminal all
+(define-sexp-parser charclass-range all (and peg-any "-" peg-any))
+(define-sexp-parser charclass-single all peg-any)
+(define-sexp-parser peg-nonterminal all
   (and (body lit (or (range #\a #\z) (range #\A #\Z) (range #\0 #\9) "-") +) 
peg-sp))
-(define-nonterm peg-sp none
+(define-sexp-parser peg-sp none
   (body lit (or " " "\t" "\n") *))
 
 
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
@@ -296,10 +246,17 @@ RB < ']'
                  (compressor-core (syntax->datum syn))))
 
 ;; Builds a lambda-expressions for the pattern STR using accum.
-(define (peg-string-compile str-stx accum)
-  (peg-sexp-compile
-   (compressor
-    (peg-pattern->defn
-     (peg:tree (peg-parse peg-pattern (syntax->datum str-stx))) str-stx)
-    str-stx)
-   accum))
+(define (peg-string-compile args accum)
+  (syntax-case args ()
+    ((str-stx) (string? (syntax->datum #'str-stx))
+     (let ((string (syntax->datum #'str-stx)))
+       (peg-sexp-compile
+        (compressor
+         (peg-pattern->defn
+          (peg:tree (peg-parse peg-pattern string)) #'str-stx)
+         #'str-stx)
+        (if (eq? accum 'all) 'body accum))))
+     (else (error "Bad embedded PEG string" args))))
+
+(add-peg-compiler! 'peg peg-string-compile)
+
diff --git a/module/ice-9/peg.scm b/module/ice-9/peg/using-parsers.scm
similarity index 62%
copy from module/ice-9/peg.scm
copy to module/ice-9/peg/using-parsers.scm
index 644af6d..ec42583 100644
--- a/module/ice-9/peg.scm
+++ b/module/ice-9/peg/using-parsers.scm
@@ -1,4 +1,4 @@
-;;;; peg.scm --- Parsing Expression Grammar (PEG) parser generator
+;;;; using-parsers.scm --- utilities to make using parsers easier
 ;;;;
 ;;;;   Copyright (C) 2010, 2011 Free Software Foundation, Inc.
 ;;;;
@@ -17,28 +17,13 @@
 ;;;; Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 
USA
 ;;;;
 
-(define-module (ice-9 peg)
-  #:export (peg-parse
-;            define-nonterm
-;            define-nonterm-f
-            peg-match)
-;  #:export-syntax (define-nonterm)
-  #:use-module (ice-9 peg codegen)
-  #:use-module (ice-9 peg string-peg)
+(define-module (ice-9 peg using-parsers)
   #:use-module (ice-9 peg simplify-tree)
-  #:use-module (ice-9 peg match-record)
-  #:re-export (peg-sexp-compile
-               define-grammar
-               define-grammar-f
-               define-nonterm
-               keyword-flatten
-               context-flatten
-               peg:start
-               peg:end
-               peg:string
-               peg:tree
-               peg:substring
-               peg-record?))
+  #:use-module (ice-9 peg codegen)
+  #:use-module (ice-9 peg cache)
+  #:export (peg-parse define-nonterm peg-match
+            prec make-prec peg:start peg:end peg:string
+            peg:tree peg:substring peg-record?))
 
 ;;;
 ;;; Helper Macros
@@ -67,6 +52,22 @@ execute the STMTs and try again."
         #f
         (make-prec 0 (car res) string (string-collapse (cadr res))))))
 
+;; Defines a new nonterminal symbol accumulating with ACCUM.
+(define-syntax define-nonterm
+  (lambda (x)
+    (syntax-case x ()
+      ((_ sym accum pat)
+       (let ((matchf (peg-sexp-compile #'pat (syntax->datum #'accum)))
+             (accumsym (syntax->datum #'accum)))
+         ;; CODE is the code to parse the string if the result isn't cached.
+         (let ((syn (wrap-parser-for-users x matchf accumsym #'sym)))
+           #`(define sym #,(cg-cached-parser syn))))))))
+
+(define (peg-like->peg pat)
+  (syntax-case pat ()
+    (str (string? (syntax->datum #'str)) #'(peg str))
+    (else pat)))
+
 ;; Searches through STRING for something that parses to PEG-MATCHER.  Think
 ;; regexp search.
 (define-syntax peg-match
@@ -74,9 +75,7 @@ execute the STMTs and try again."
     (syntax-case x ()
       ((_ pattern string-uncopied)
        (let ((pmsym (syntax->datum #'pattern)))
-         (let ((matcher (if (string? (syntax->datum #'pattern))
-                            (peg-string-compile #'pattern 'body)
-                            (peg-sexp-compile #'pattern 'body))))
+         (let ((matcher (peg-sexp-compile (peg-like->peg #'pattern) 'body)))
            ;; We copy the string before using it because it might have been
            ;; modified in-place since the last time it was parsed, which would
            ;; invalidate the cache.  Guile uses copy-on-write for strings, so
@@ -94,3 +93,24 @@ execute the STMTs and try again."
                        (make-prec
                         at end string
                         (string-collapse match))))))))))))
+
+;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
+;;;;; PMATCH STRUCTURE MUNGING
+;; Pretty self-explanatory.
+;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
+
+(define prec
+  (make-record-type "peg" '(start end string tree)))
+(define make-prec
+  (record-constructor prec '(start end string tree)))
+(define (peg:start pm)
+  (if pm ((record-accessor prec 'start) pm) #f))
+(define (peg:end pm)
+  (if pm ((record-accessor prec 'end) pm) #f))
+(define (peg:string pm)
+  (if pm ((record-accessor prec 'string) pm) #f))
+(define (peg:tree pm)
+  (if pm ((record-accessor prec 'tree) pm) #f))
+(define (peg:substring pm)
+  (if pm (substring (peg:string pm) (peg:start pm) (peg:end pm)) #f))
+(define peg-record? (record-predicate prec))


hooks/post-receive
-- 
GNU Guile



reply via email to

[Prev in Thread] Current Thread [Next in Thread]