bug-bison

Re: [PATCH] Do not allow identifiers that start with a negative number.


From: Joel E. Denny
Subject: Re: [PATCH] Do not allow identifiers that start with a negative number.
Date: Sat, 8 Jan 2011 18:19:52 -0500 (EST)
User-agent: Alpine 2.00 (DEB 1167 2008-08-23)

On Sat, 8 Jan 2011, Paul Eggert wrote:

> On 01/08/2011 12:48 PM, Joel E. Denny wrote:
> 
> > I'm mentally parsing it according to the rules of Bison.  "$-39-" already 
> > has a meaning, but "$--39--" and "$---39---" do not.

I did not mean to imply that Bison does or should accept "$--39--" or 
"$---39---".  Instead, the user must write "$[--39--]" or "$[---39---]".

My point was that any integer, including a negative one, has long had 
meaning as a semantic value or location reference in Bison.

Fortunately, because Bison implements Yacc, it already has an easy way 
to make sure that an identifier beginning with an unsigned integer will 
not cause trouble in semantic value and location references: it never 
permits any identifier anywhere to begin with an unsigned integer.

Now that Akim has added dashes to identifiers, we encounter the same 
problem for negative integers.  Thus, I thought we would simply extend the 
existing solution for unsigned integers to negative integers: never permit 
any identifier anywhere to begin with an unsigned or negative integer.

I was then trying to say that, in contrast, the other forms I presented 
above do not have any prior meaning as semantic value or location 
references.  Neither does "$.foo", "$.39", or "$foo.bar", for example. 
Thus, there is no history that suggests we should solve their syntactic 
issues by restricting identifiers in general.  Instead, we have a new 
solution: bracketing.

> Yes, Bison experts can parse the code because they know all
> the ins and outs of the rules.  But surely ordinary users will
> be confused.  For example, if "--x" is an identifier, then
> in C code
> 
>    $--x
>
> is supposed to be parsed as "$" applied to the identifier "--x",

Bison will report an error for the above.

> but
> 
>    *--x
> 
> is supposed to be parsed as "*" applied to "--" applied to
> the identifier "x".
> 
> This is the sort of confusion I am worried about.

OK, does my explanation above resolve that confusion?

> > We already have such a syntax: $[B].  However, it's only required when 
> > it's necessary
> 
> But there are two reasons it might be necessary, not just one.
> One reason is to avoid bugs or problems in implementation.

I'm not sure what you're referring to in that first reason.

> The other reason, which is as important, is to avoid confusion
> among users.  It is the latter reason that I am thinking about
> here.  Even if notation is formally unambiguous, it still may
> be unreasonably confusing.
> 
> I suggest that we allow the syntax $B only when B is a
> valid C identifier, and require the square brackets otherwise.

I believe that almost perfectly describes the behavior on branch-2.5 and 
master.  The only subtlety here is that, for example,

  $sym.field

is the same as

  $[sym].field

even if sym.field is a valid symbol name.  However, Bison is kind enough 
to warn you that this is misleading even though it's not formally 
ambiguous.
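To make the reference rules in this thread concrete, here is a minimal C sketch of how the text after a "$" might be classified. It is a simplified model of the behavior described above, not Bison's actual scanner. The names "scan_ref", "ref_result", and "ref_kind" are hypothetical, and the sketch ignores "$$", "@" locations, and type tags such as "$<type>1". A leading digit, or a dash followed by a digit, gives a numeric reference. A bracketed name may contain anything up to "]". An unbracketed name stops at the first character that cannot appear in a C identifier, so "$sym.field" refers to "sym" and "$x--" refers to "x".

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical classification of the text that follows a '$'.  */
typedef enum { REF_NUMBER, REF_NAME, REF_ERROR } ref_kind;

typedef struct
{
  ref_kind kind;
  long number;      /* Set when kind == REF_NUMBER.  */
  char name[64];    /* Set when kind == REF_NAME.  */
  const char *rest; /* Text left over after the reference.  */
} ref_result;

static int
is_ident_start (char c)
{
  return isalpha ((unsigned char) c) || c == '_';
}

static int
is_ident_char (char c)
{
  return isalnum ((unsigned char) c) || c == '_';
}

static ref_result
scan_ref (const char *p)
{
  ref_result r = { REF_ERROR, 0, "", p };

  if (isdigit ((unsigned char) p[0])
      || (p[0] == '-' && isdigit ((unsigned char) p[1])))
    {
      /* Numeric reference such as "$3" or "$-39".  */
      char *end;
      r.number = strtol (p, &end, 10);
      r.kind = REF_NUMBER;
      r.rest = end;
    }
  else if (p[0] == '[')
    {
      /* Bracketed reference: the name runs up to the closing ']'.  */
      const char *close = strchr (p + 1, ']');
      size_t len;
      if (!close)
        return r;
      len = (size_t) (close - (p + 1));
      if (len == 0 || len >= sizeof r.name)
        return r;
      memcpy (r.name, p + 1, len);
      r.name[len] = '\0';
      r.kind = REF_NAME;
      r.rest = close + 1;
    }
  else if (is_ident_start (p[0]))
    {
      /* Unbracketed reference: the name stops at the first character
         that cannot appear in a C identifier, such as '.' or '-'.  */
      size_t len = 1;
      while (is_ident_char (p[len]))
        len++;
      if (len >= sizeof r.name)
        return r;
      memcpy (r.name, p, len);
      r.name[len] = '\0';
      r.kind = REF_NAME;
      r.rest = p + len;
    }
  /* Anything else, such as "--x", is reported as an error.  */
  return r;
}
```

In this model, "$-39-" yields the number -39 followed by "-", "$[--39--]" yields the name "--39--", "$x--;" yields the name "x" followed by the C decrement, and "$--x" is rejected.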

> >>     $x--;
> >>
> >> Most programmers would expect this to subtract one from $x,
> >> not to compute the value of the identifier "x--" and then
> >> discard it.  If a programmer really wants to name their
> >> identifier "x--" they should write "${x--}" or something
> >> like that.
> > 
> > I agree, and that's already what the programmer has to do.
> 
> Sorry, I'm not following this.  If the syntax is
> 
> id       -|({letter}|-({letter}|-))({letter}|[-0-9])*
> 
> then "x--" is a valid identifier, no?

Yes, it's valid.

> And if it is valid,
> then why isn't "$x--" parsed as "$" applied to the identifier
> "x--"?

Because Bison later reparses the reference, and an unbracketed name stops 
at the first "-" or ".".

> > It does allow "x".
> 
> Oh, sorry, I was wrong again, and you were right again.
> I'm having a great deal of trouble parsing that regular expression,
> so you may have to bear with me in yet another confusion of mine.
> Whatever regular expression we come up with, obviously we
> should document it (unconfusingly :-).

How about the following?  The manual currently says:

  Symbol names can contain letters, underscores, periods, dashes, and
  (not at the beginning) digits.

Let's change that to:

  A symbol name can be any sequence of letters, underscores, periods, 
  dashes, and digits that does not start with an integer (unsigned
  or negative).

Is that clear enough?

We can then copy that into a comment before the above regex.
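A literal reading of the proposed manual wording can be written as a small C predicate, offered here only as a hedged sketch. The function name "is_valid_symbol_name" is hypothetical, "letters" is assumed to mean ASCII letters, and "starts with an integer" is read as a leading digit or a leading dash followed by a digit. The regex Bison actually uses may differ in corner cases.

```c
#include <stdbool.h>

/* Characters the proposed wording allows anywhere in a symbol name
   (ASCII letters assumed).  */
static bool
is_symbol_char (char c)
{
  return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
         || (c >= '0' && c <= '9') || c == '_' || c == '.' || c == '-';
}

static bool
is_digit (char c)
{
  return c >= '0' && c <= '9';
}

/* Hypothetical check: a nonempty sequence of letters, underscores,
   periods, dashes, and digits that does not start with an unsigned
   integer ("39x") or a negative integer ("-39x").  */
static bool
is_valid_symbol_name (const char *s)
{
  if (s[0] == '\0')
    return false;
  for (const char *p = s; *p; p++)
    if (!is_symbol_char (*p))
      return false;
  if (is_digit (s[0]) || (s[0] == '-' && is_digit (s[1])))
    return false;
  return true;
}
```

Under this reading, "x--", "sym.field", and "--39--" are valid symbol names, while "39x" and "-39-" are not, so no symbol name can be confused with a numeric reference such as "$-39".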
