help-bison

Re: how to get left hand side symbol in action


From: Akim Demaille
Subject: Re: how to get left hand side symbol in action
Date: Fri, 10 May 2019 19:34:54 +0200

Hey Christian,

Thanks a lot for taking the time to give details about your use case.

> On 10 May 2019 at 15:11, Christian Schoenebeck <address@hidden> wrote:
> 
> On Friday, 10 May 2019 07:24:51 CEST, Akim Demaille wrote:
> 
>> Aren't you referring to LA correction, as implemented in Bison?
>> 
>> https://www.gnu.org/software/bison/manual/html_node/LAC.html
> 
> Well no, it has something in common, but the use cases are different; see
> below.

And was there nothing that could be shared?  A lot of what you described
below looks like what LAC does.  But I am definitely not a LAC expert, nor
an expert in your application, obviously :)

The more I read about what you do, the more I think it's the same thing.
But then of course one issue is that LAC is currently supported only by
yacc.c; one would need to port it to lalr1.cc.
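For readers following along: LAC is enabled in the yacc.c skeleton with a
single %define; a minimal sketch of the relevant directives (the grammar
itself is omitted, and parse.error verbose is only here to make the improved
diagnostics visible):

```
%define parse.lac full
%define parse.error verbose
```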


>> I think you are referring to the name of the tokens, not all the symbols.
>> For the error messages, it makes sense.  Although I am now more convinced
>> that most of the time, error messages are more readable when you quote
>> exactly the source than when you print token names.
> 
> No, I really mean the non-terminal symbols. Because sometimes [not always
> ;-)] I don't use the classic approach of a separate lexer and parser, but
> rather let the parser do the lexer's job as well, like:
> 
> FOO : 'F''O''O' ;
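A hedged expansion of that style (rule names illustrative, not from
Christian's actual grammar): each keyword is spelled out as a plain rule over
character tokens, so no separate scanner state is needed:

```
%%
statement: FOO | BAR ;
FOO: 'F' 'O' 'O' ;
BAR: 'B' 'A' 'R' ;
```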

Ok.  But then we face exactly what I'm saying: you are constrained
by the syntax of symbols.  If you had, say, regular expressions in
your grammar, you would be forced to write

regular_expression: ...

and display "regular_expression" in your messages.  That's ugly.
Using symbol identifiers is not correct.  It does not fully fit your
need, it just "mostly works".  I can't bake that into Bison.

That's exactly why, as I already said, tokens have user-friendly
*names* in addition to the plain identifiers.  Names are OK, identifiers
are not.  So *if* we want a feature like this, we would have to
support naming non-terminal symbols.
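For reference, this is how token naming already works; a minimal sketch with
hypothetical tokens, where the quoted string alias is what canned error
messages display instead of the raw identifier:

```
%token IDENTIFIER "identifier"
%token KW_BREAK   "'break' keyword"
```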


> which can avoid complex and error-prone push and pop constructs that would
> be required with a separate-lexer approach and certain complex grammars.

I would like to include scannerless parsing in Bison in the future.
I have no idea when I will actually work on this, but that's definitely
something I have in mind.  That would be Bison 4 :)



>>> I do need these features for almost all parsers, hence for years (since
>>> not available directly with Bison) I have a huge bunch of code on top of
>>> the internal skeleton code to achieve them.
>> 
>> Is that available for reading somewhere?
> 
> [...]
> However I am not sure if my approach would be of use for you anyway.

I was really curious to understand your use case, not really looking
for more features to maintain :)

Do you disable the default reductions?  Reading what you do, it seems
that disabling them would make your computations more accurate.
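For context, default reductions can be restricted with a %define; a sketch
(the value `accepting` keeps a default reduction only in the accepting state,
so elsewhere the parser always consults the lookahead before reducing, which
is what makes "expected symbols" computations exact):

```
%define lr.default-reduction accepting
```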


> eventually the algorithm ends up 
> returning a result:
> 
>       std::map<String,BisonSymbolInfo>& expectedSymbols
> 
> where the result map (not actually a multimap) contains the possible next
> grammar rules for the previously supplied parser state; the key being the
> symbol name, the value a struct (BisonSymbolInfo) holding (a) the sequence
> of characters expected next to satisfy that grammar symbol and (b) a flag
> indicating whether the symbol is (considered) a terminal or a "real"
> non-terminal (see below).

I guess BisonSymbolInfo is the most significant difference with LAC,
isn't it?  But the traversal is probably the same.



> Another problem with my pseudo-terminals (example FOO above): From Bison's
> point of view, everything in my grammar is now a non-terminal, and I don't
> want to manually mark individual grammar rules as either "real"
> non-terminals or "terminals". So I automated that by checking whether the
> same symbol can be resolved in more than one way, in which case it is a
> "real" non-terminal, and by checking whether the right-hand side of the
> rule only contains characters, in which case it is considered a "terminal".
> And that fundamental information is automatically used to provide
> appropriate error messages and auto-completion in a fully automated way.

It's unclear to me whether you do this at generation time, or at parse time.


>> Was the feature always fitting perfectly?  Never ever did it result in 
>> something somewhat incorrect?
> 
> I did not make a proof of correctness of the algorithm.

I was very ambiguous here, sorry!  I meant the feature of using symbol
names.

I understand that you want to be able to manipulate the symbols themselves.
What I am arguing is that you probably don't need them as strings.
I tend to think you need them as an enum, just like the tokens, so that
you can map them to some real string or whatever other treatment.  But
handling them as strings used as keys in some container would be a needless
cost compared to an enum.



>> I beg to disagree.  Nobody should translate the keyword "break",
>> but
>> 
>>> # bison /tmp/foo.y
>>> /tmp/foo.y:1.7: erreur: erreur de syntaxe, : inattendu, attendait char ou
>>> identifier ou <tag>
>>>    1 | %token: FOO
>>>      |       ^
>> 
>> looks stupid; "char", "identifier" and "<tag>" should be translated.
> 
> Well, my point was that translation is trivial.  An average developer can
> certainly solve the required translation tasks very easily with custom code
> if needed.

Sure.  But since Bison provides support for canned error messages, it
should offer a means to do the whole job.  That was my point in

https://lists.gnu.org/archive/html/bison-patches/2018-12/msg00088.html

and

https://lists.gnu.org/archive/html/bison-patches/2019-01/msg00037.html

Cheers!

