bug-gawk
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[bug-gawk] Grammar improvement


From: Valentin Tolmer
Subject: [bug-gawk] Grammar improvement
Date: Tue, 10 Sep 2013 19:24:28 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/20130804 Thunderbird/17.0.8

Hello,

My name is Valentin Tolmer, and with the Google Summer of Code, I've worked on precedence and associativity in Bison grammars. More precisely, my work consisted in introducing a new syntax to declare partial order precedence relationships (instead of the total order that is the custom for now). Although my patch hasn't been accepted yet, the modifications introduced lead to particularly interesting improvements in gawk's grammar, in the organization of the precedence relationships. Here's a description of what it does and how it affects gawk's grammar, so that you might tell me what you think and maybe suggest some improvements. The basic idea of the improvement is to limit the over specification of precedence relationships, that might lead to unwanted conflicts resolutions. For example, if a new arithmetics operators is introduced and prioritized, but it has another meaning (see for example < in the description below), the other meaning will be prioritized too, maybe accidentally. By reducing the amount of useless precedence relationships declared, the grammar will be safer. The code for this improvement can be found on bison's git repo, in the partialorder branch.

When analyzing gawk's grammar with Bison (a modified bison that outputs the used precedence relationships), it is possible to single out basically 3 groups of prioritized tokens in the gawk grammar :
- The arithmetics operators (+, -, /, *, etc...)
- The comparison operators (<, >, RELOP, MATCH_OP, etc...)
- '(' and LEX_LENGTH, that have no link with any other token

There are two notable exceptions :
- < is used not only as a comparison operator, but also as an input redirection, which leads to it being compared to almost every arithmetics operator. - CONCAT_OP is used only in conjunction with +, - and /, because they are the only three operators that can start an expression (e. g. +7, -2 or regexp /[abc]*).

These precedence relationships can be very well expressed with the new system I introduced to declare precedence relationships. The updated grammar looks like this (only the token precedence declaration part is shown, as it is the only one changed) :


%gprec comp_op {
  %precedence ASSIGNOP
  %right '?' ':'
  %left LEX_OR
  %left LEX_AND

  %precedence LEX_IN

  %left MATCHOP
  %nonassoc RELOP '<' '>'
}

%gprec arith {
  %left '+' '-'
  %left '*' '/' '%'
  %precedence UNARY
  %right '^'
}

%gprec {
  %precedence CONCAT_OP
}

%gprec {
  %precedence LEX_LENGTH
  %precedence '('
}

%precr '<' < arith
%precr CONCAT_OP < '-' '+' '/'


Here, %gprec starts a group of token precedence, in which the order is total (every token can be compared to every other, as was the case with the previous system), but there is no link between tokens outside the group. Then additional precedence is defined with %precr statements. For example, '<' < arith means that the token '<' is of lower precedence than every token of the group arith.

If you weren't aware of this, a few versions ago, the keyword %precedence has been introduced to enable the declaration of a token without associativity, but with precedence. Every useless associativity has been removed from this grammar (another new feature of bison).

In the attachments you'll find the bison output of which precedence links are declared in this grammar: in black those that are declared and used to resolve conflicts, and in red those that are declared but useless.

Needless to say, the generated parser is the same, the only information removed was some useless one.

Please get back to me with any comment, suggestion or remark. Keep me in the CC field, as I am not subscribed to the newsletter.

Valentin Tolmer

Attachment: prec_report.pdf
Description: Adobe PDF document


reply via email to

[Prev in Thread] Current Thread [Next in Thread]