chicken-users
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Chicken-users] backtracking in abnf/lexgen


From: Ivan Raikov
Subject: Re: [Chicken-users] backtracking in abnf/lexgen
Date: Mon, 18 Sep 2017 14:29:18 -0700

Nevermind, I see now that the last character of PN_PREFIX can be
consumed as part of the repetition rule, and not as the PN_CHARS
In that case, I would still suggest the bind combinator:

(define (make-prefixed-name char-lst)
  ;; check that char-lst ends with a PN_CHAR  and create a prefixed
name object with e.g. string->symbol

(define PN_PREFIX
  (bind
   make-prefixed-name
   (concatenation
    PN_CHARS_BASE
    (optional-sequence
     (repetition
      (alternatives PN_CHARS (char-list/list ".")))
     ))
   ))



On Mon, Sep 18, 2017 at 2:06 PM, Ivan Raikov <address@hidden> wrote:
> Hi Nathaniel,
>
> It seems to me that the rules PN_PREFIX and PN_LOCAL already exclude
> strings ending with a '.'; e.g. in the case of PN_PREFIX, it must end
> with PN_CHARS_BASE or PN_CHARS. I think that the remark about LL1
> means that the various PN rules are just regular expressions. So in
> the case of PN_PREFIX I would think that this would work:
>
> (define PN_PREFIX
> (concatenation
>  PN_CHARS_BASE
>  (optional-sequence
>   (concatenation
>    (repetition
>     (alternatives PN_CHARS (char-list/list ".")))
>    PN_CHARS))
>  ))
>
> In other words, a PN_PREFIX either consists of PN_CHARS_BASE, or
> PN_CHARSE_BASE followed by an arbitrary combination of PN_CHARS and
> '.', followed by PN_CHARS.
> Is there a counterexample to this?
>
> On Mon, Sep 18, 2017 at 4:32 AM, Nathaniel Rudavsky-Brody
> <address@hidden> wrote:
>> Hi Ivan,
>>
>> Thanks a lot for your explanation. It helped me find the key I'd been
>> missing: "The SPARQL grammar is LL(1) when the rules with uppercased names
>> are used as terminals." So the uppercased names *do* need some kind of
>> backtracking.
>>
>> My example with the parentheses was a bit silly (indeed I'd misread the
>> grammar slightly), but I was trying to illustrate the basic problem which
>> still holds: PN_PREFIX and PN_LOCAL can't end with ".":
>>
>> PN_PREFIX  ::= PN_CHARS_BASE ((PN_CHARS|'.')* PN_CHARS)?
>>
>> PN_LOCAL ::=  (PN_CHARS_U | ':' | [0-9] | PLX ) ((PN_CHARS | '.' | ':' |
>> PLX)* (PN_CHARS | ':' | PLX) )?
>>
>> (from https://www.w3.org/TR/sparql11-query/#sparqlGrammar)
>>
>> As I mentioned, I was able to write a simple backtracking star to parse
>> these, but it didn't compose well with bar, since bar doesn't send back the
>> error which is needed to trigger the backtracking. The best I've been able
>> to get working is a "sandbox" combinator (code below) that limits the scope
>> of the backtracking, so I don't have to modify anything else in abnf/lexgen
>> or my parser. Does it look correct to you?
>>
>> I hope that since these are the only two places in the grammar that require
>> backtracking, and I'm able to isolate it, that the performance hit won't be
>> to big...?
>>
>> Nathaniel
>>
>>
>>
>>
>>
>>
>> (define (bstar p)
>>   (lambda (sk fk strm)
>>     (let ((try
>>            (lambda (s)
>>              (let ((ss (sk s)))
>>                (if (equal? ss '(error)) #f
>>                    ss)))))
>>       (p (lambda (strm1)
>>            (or
>>             ((bstar p) try try strm1)
>>             (sk strm)))
>>          sk
>>          strm))))
>>
>> (define (sandbox p)
>>   (lambda (sk fk strm)
>>     (let ((s (p values err strm)))
>>       (if (equal? s '(error))
>>           (fk strm)
>>           (sk s)))))
>>
>> (define PN_PREFIX
>>   (concatenation
>>    PN_CHARS_BASE
>>    (optional-sequence
>>     (sandbox
>>      (concatenation
>>       (bstar
>>        (alternatives
>>         (char-list/lit ".")
>>         PN_CHARS))
>>       PN_CHARS)))))
>>
>>
>>
>>
>> On Fri, Sep 15, 2017 at 5:42 PM, Ivan Raikov <address@hidden>
>> wrote:
>>>
>>> Hi Nathaniel,
>>>
>>>     No problem, thanks for trying to use lexgen and abnf. The excerpt
>>> from the grammar does not include the rules for PNAME, or PNAME_NS,
>>> but if I understand correctly, you are suggesting that PrefixedName
>>> could include parentheses which could cause the closing parenthesis of
>>> DataBlockValue to be consumed incorrectly. This would indeed require
>>> general backtracking, but it is a bit surprising to me, because
>>> usually ABNF grammars avoid backtracking as it can be very costly in
>>> terms of memory and computation time. For detailed explanation see
>>> e.g. here: http://www.artima.com/pins1ed/combinator-parsing.html#31.10
>>>
>>>    Parser combinator libraries such as lexgen tend to favor LL
>>> grammars and avoid backtracking because the latter requires keeping
>>> multiple copies of the input stream, and traversing back one token at
>>> a time until a rule succeeds, which usually ends up being
>>> prohibitively slow. I suppose a naive backtracking combinator in
>>> lexgen  could be implemented using something analogous to the
>>> bind/rebind combinators, but instead the argument combinator would be
>>> applied to the previous copy of the input stream in case of failure.
>>> But I suspect this would be too slow to be of any practical use.
>>>
>>>   Are you absolutely certain that the parentheses in PrefixedName can
>>> occur as something other than in escaped characters? Again, requiring
>>> left recursion and general backtracking is very unusual for an ABNF
>>> grammar, so better to make sure you understand the various PN_ rules.
>>> I am happy to help with that, just let me know.
>>>
>>> -Ivan
>>>
>>>
>>> On Fri, Sep 15, 2017 at 1:45 AM, Nathaniel Rudavsky-Brody
>>> <address@hidden> wrote:
>>> > Hi Ivan,
>>> >
>>> > Thanks for replying, and sorry for being unclear. By different levels I
>>> > just
>>> > mean that I need a non-greedy * for a low-level rule that gets combined
>>> > in
>>> > different ways higher up. I'm able to write a correct parser by
>>> > "flattening
>>> > out" the rules, but since it's a big grammar (173 rules) that quickly
>>> > gets
>>> > messy.
>>> >
>>> > Here's an excerpt from the real rules. Practically speaking, the problem
>>> > is
>>> > that by itself, "ex:abc)" is a valid PrefixedName, but in the expression
>>> > "(?var) { (ex:abc) }" only "ex:abc" should be parsed as a PrefixedName,
>>> > and
>>> > the ")" as part of the InlineDataFull.
>>> >
>>> >
>>> > PN_LOCAL_ESC  ::=  '\' ( '_' | '~' | '.' | '-' | '!' | '$' | '&' | "'" |
>>> > '('
>>> > | ')' | '*' | '+' | ',' | ';' | '=' | '/' | '?' | '#' | '@' | '%' )
>>> >
>>> > PLX  ::=  PERCENT | PN_LOCAL_ESC
>>> >
>>> > PN_LOCAL ::=  (PN_CHARS_U | ':' | [0-9] | PLX ) ((PN_CHARS | '.' | ':' |
>>> > PLX)* (PN_CHARS | ':' | PLX) )?
>>> >
>>> > PNAME_LN  ::=  PNAME
>>> >
>>> > PrefixedName  ::=  PNAME_LN | PNAME_NS
>>> >
>>> > iri  ::=  IRIREF | PrefixedName
>>> >
>>> > DataBlockValue  ::=  iri | RDFLiteral | NumericLiteral | BooleanLiteral
>>> > |
>>> > 'UNDEF'
>>> >
>>> > InlineDataFull  ::=  ( NIL | '(' Var* ')' ) '{' ( '(' DataBlockValue*
>>> > ')' |
>>> > NIL )* '}'
>>> >
>>> >
>>> >
>>> > With my limited understanding of lexgen's internals, I can write a *
>>> > that
>>> > backtracks if it catches an error from a later success continuation, but
>>> > this doesn't compose with something like opt that doesn't signal an
>>> > error:
>>> >
>>> > (define (bstar p)
>>> >   (lambda (sk fk strm)
>>> >     (let ((try
>>> >            (lambda (s)
>>> >              (let ((ss (sk s)))
>>> >                (if (equal? ss '(error)) #f
>>> >                    ss)))))
>>> >       (p (lambda (strm1)
>>> >            (or
>>> >             ((bstar p) try sk strm1)
>>> >             (sk strm)))
>>> >          sk
>>> >          strm))))
>>> >
>>> >
>>> > (lex (seq (bstar (lit "a")) (lit "a")) err "aaaaab")) ; => ((a a a a a)
>>> > (b))
>>> >
>>> > but
>>> >
>>> > (lex (opt (seq (bstar (lit a)) (lit a))) err "aaaaab") ; =>  (() (a a a
>>> > a a
>>> > b))
>>> >
>>> >
>>> > Does this make more sense?
>>> >
>>> > Nathaniel
>>> >
>>> >
>>> > On Thu, Sep 14, 2017 at 9:54 PM, Ivan Raikov <address@hidden>
>>> > wrote:
>>> >>
>>> >> Hi Nathaniel,
>>> >>
>>> >>     I don't understand what you mean by "different levels" here. Can
>>> >> you express your grammar in ABNF notation?
>>> >>
>>> >>    -Ivan
>>> >>
>>> >>
>>> >> On Thu, Sep 14, 2017 at 4:06 AM, Nathaniel Rudavsky-Brody
>>> >> <address@hidden> wrote:
>>> >> > Hi,
>>> >> >
>>> >> > I was wondering if anyone might have some advice for a beginner's
>>> >> > question
>>> >> > with abnf/lexgen. Clearly it has something to do with backtracking,
>>> >> > and
>>> >> > I
>>> >> > hope I'm just missing something simple -- 1500 lines into my parser,
>>> >> > I'd
>>> >> > hate to have to start from scratch with another approach!
>>> >> >
>>> >> >
>>> >> > For matching a string of a's and b's that ends with an "a", clearly
>>> >> >
>>> >> >
>>> >> > (lex (seq (star (bar (lit "a") (lit "b")))
>>> >> >                (lit "a"))
>>> >> >     error "ababa")  ; => error
>>> >> >
>>> >> >
>>> >> > For patterns on the same level, I can do manual backtracking:
>>> >> >
>>> >> >
>>> >> > (define-syntax vac
>>> >> >   (syntax-rules ()
>>> >> >     ((_ fn) (lambda args (apply fn args)))))
>>> >> >
>>> >> > (define (left-star-seq p1 p2)   ; p1*p2
>>> >> >   (vac
>>> >> >    (bar
>>> >> >     (seq p1 (left-star-seq p1 p2))
>>> >> >     p2)))
>>> >> >
>>> >> > (lex (left-star-seq
>>> >> >           (bar (lit "a") (lit "b"))
>>> >> >           (lit "a"))
>>> >> >   error "ababa")   ; =>  ((#\a #\b #\a #\b #\a) ())
>>> >> >
>>> >> >
>>> >> > but what if p1 and p2 are on completely different levels, something
>>> >> > like:
>>> >> >
>>> >> >
>>> >> > (define p1 (star (bar (lit "a") (lit "b"))))
>>> >> > (define p2 (lit "a"))
>>> >> > ...
>>> >> > (define q2 (seq q1 p1))
>>> >> > (define q4 (seq q3 q2))
>>> >> > (define Word (seq q4 p2))
>>> >> >
>>> >> >
>>> >> > Is there way to do "composable" backtracking with these tools? (and
>>> >> > is
>>> >> > the
>>> >> > question even the right one?)
>>> >> >
>>> >> > Thanks for any ideas,
>>> >> >
>>> >> > Nathaniel
>>> >> >
>>> >> > _______________________________________________
>>> >> > Chicken-users mailing list
>>> >> > address@hidden
>>> >> > https://lists.nongnu.org/mailman/listinfo/chicken-users
>>> >> >
>>> >
>>> >
>>
>>



reply via email to

[Prev in Thread] Current Thread [Next in Thread]