bison-patches
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH for Dlang support 1/2] d: change the return value of yylex fr


From: Akim Demaille
Subject: Re: [PATCH for Dlang support 1/2] d: change the return value of yylex from TokenKind to YYParser.Symbol
Date: Mon, 21 Dec 2020 07:34:03 +0100

Hi Adela,

This is a great piece of work!  The overall result is much better,
well done!

> Le 18 déc. 2020 à 19:37, Adela Vais <adela.vais99@gmail.com> a écrit :
> 
> Hello!
> 
> I made the suggested modifications.
> 
> I have a question: given that the user will provide all the needed
> information through the return value, should semanticVal(), startPos() and
> endPos() still be part of the Lexer interface?

No, they are not useful.  They make sense only with split symbols.

Besides, in that case (split symbols) I personally think the right
interface is that of locations, not that of positions.  IMHO it was
an error in Java to put forward positions, and leave locations to
the parser only.  C and C++ show only location-based interfaces.
It's up to the user to decide that a location is nothing but a
position if she thinks two positions is too costly (but that would
be mean to the users).


> În vin., 20 nov. 2020 la 20:18, Akim Demaille <akim@lrde.epita.fr> a scris:
> 
>> You should introduce type aliases for b4_yystype and YYLocation.
>> In the C++ parser, you have value_type and location_type which
>> are defined to whatever they are actually.  The code is nicer
>> to read, with fewer occurrences of ugly YYnames.
>> 
>> 
> I named them Location and Value.

Excellent.

> Should I also change b4_location_type to
> the new Location alias throughout lalr1.d? I know that the user should not
> use yy names (and before these commits, the examples used YYLocation) but I
> don't know how much of the backend I should change. I changed the
> b4_yystype occurrences.

I would also use Location everywhere.


>>>        if (yychar == TokenKind.]b4_symbol(empty, id)[)
>>>        {]b4_parse_trace_if([[
>>>          yycdebugln ("Reading a token");]])[
>>> -          yychar = yylex ();]b4_locations_if([[
>>> -          static if (yy_location_is_class) {
>>> -            yylloc = new ]b4_location_type[(yylexer.startPos,
>> yylexer.endPos);
>>> -          } else {
>>> -            yylloc = ]b4_location_type[(yylexer.startPos,
>> yylexer.endPos);
>>> -          }]])
>>> -          yylval = yylexer.semanticVal;[
>>> +          Symbol yysymbol = yylex();
>>> +          yychar = yysymbol.token();
>> 
>> Maybe you don't need yychar, but only need yytoken.  You probably
>> can avoid dealing with the token-kinds here, and deal only with
>> the symbol kinds.
>> 
> Done.

Great.  This part, however, is too complex and not very clear:

> -          if (yychar <= TokenKind.]b4_symbol(eof, [id])[)
> +          if (yytoken <= ]b4_symbol(eof, [kind])[)
>            {
>              /* Return failure if at end of input.  */
> -            if (yychar == TokenKind.]b4_symbol(eof, [id])[)
> +            if (yytoken == ]b4_symbol(eof, [kind])[)
>               return false;
>            }
>            else
> -            yychar = TokenKind.]b4_symbol(empty, id)[;
> +            yytoken = ]b4_symbol(empty, kind)[;

When accepting token kinds as return values from yylex, we tolerate
any nonpositive number to denote EOF.  Hence the first if.  However
when using symbol kinds, we should not accept any deviation from
the symbols kinds.  So that should be a single check "== eof", not
two.

Maybe have a look at what was done in C++.

Again, great work, thanks!

Cheers!




reply via email to

[Prev in Thread] Current Thread [Next in Thread]