help-bison

Re: improving error message (was: bison for nlp)


From: Hans Åberg
Subject: Re: improving error message (was: bison for nlp)
Date: Sat, 10 Nov 2018 10:38:12 +0100

> On 10 Nov 2018, at 09:02, Akim Demaille <address@hidden> wrote:
> 
> Hi Hans,

Hello Akim,

>>>>> Yes.  Some day we will work on improving error message generation,
>>>>> there is much demand.
>>>> 
>>>> One thing I’d like to have is, if there is an error with say an identifier, 
>>>> also writing out the name of it.
>>> 
>>> Yes, that’s a common desire.  However, I don’t think it’s really
>>> what people need, because the way you print the semantic value
>>> might differ from what you actually wrote.  For instance, if I have
>>> a syntax error involving an integer literal written in binary,
>>> say 0b101010, then I will be surprised to read that I have an error
>>> involving 42.
>>> 
>>> So you would need to carry the exact string from the scanner to the
>>> parser, and I think that’s too much to ask for.
>> 
>> That is what I do. So I merely want an extra argument in the error reporting 
>> function where it can be put.
> 
> Please, be clearer: what extra argument, and show how the parser
> can provide it.  

Yes, I need to analyze it and get back.

> Also, see if using %param does not already
> give you what you need to pass information from the scanner to the
> parser’s yyerror.

How would that get into the yyerror function?
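
For reference, the Bison manual says that arguments declared with %parse-param (and so also with %param) are passed on to yyerror. A minimal sketch of what that could look like in C, assuming a non-pure parser without %locations and a declaration such as %param {struct scan_state *st}; the names scan_state, last_text and format_error are hypothetical and only serve the illustration:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical state shared by the scanner and the parser. The scanner
   would store the exact spelling of each token here before returning. */
struct scan_state {
    const char *last_text;
};

/* Build the message into BUF so that it can be inspected or logged.
   Returns what snprintf returns. */
int format_error(const struct scan_state *st, const char *msg,
                 char *buf, size_t size)
{
    assert(st != NULL && msg != NULL && buf != NULL);
    return snprintf(buf, size, "%s (near \"%s\")", msg, st->last_text);
}

/* With the %param declaration assumed above, the generated parser
   calls yyerror with the extra argument before the message. */
void yyerror(struct scan_state *st, const char *msg)
{
    char buf[256];
    format_error(st, msg, buf, sizeof buf);
    fprintf(stderr, "%s\n", buf);
}
```

With this, the binary literal from the example above would be reported with its original spelling rather than as 42, provided the scanner keeps last_text up to date.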

>>> I believe that the right approach is rather the one we have in compilers
>>> and in bison: caret errors.
>>> 
>>> $ cat /tmp/foo.y
>>> %token FOO 0xff 0xff
>>> %%
>>> exp:;
>>> $ LC_ALL=C bison /tmp/foo.y
>>> /tmp/foo.y:1.17-20: error: syntax error, unexpected integer
>>> %token FOO 0xff 0xff
>>>                ^^^^
>>> I would have been bothered by « unexpected 255 ».
>> 
>> Currently, that’s for those still using only ASCII.
> 
> No, it’s not, it works with UTF-8.  Bison’s count of characters is mostly
> correct.  I’m talking about Bison’s own location, used to parse grammars,
> which is improved compared to what we ship in generated parsers.

Ah. I was thinking of errors for the generated parser only. There I only report 
the byte count, but using a character count will probably not help much for 
caret errors, as characters vary in display width. The problem is that caret 
errors use two lines, which are hard to synchronize in Unicode. So perhaps some 
kind of one-line markup might do the trick instead.
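
To make the idea concrete, here is a rough sketch in C, assuming UTF-8 input and byte offsets for the error location. The function names and the ">>>" and "<<<" markers are hypothetical, not anything Bison provides. The first function shows why byte count and character count differ; the second puts the markup on the same line as the source text, so nothing has to be aligned across two lines.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Count code points in a UTF-8 string by skipping continuation bytes
   (those of the form 10xxxxxx). Assumes the input is valid UTF-8. */
size_t utf8_codepoints(const char *s)
{
    assert(s != NULL);
    size_t n = 0;
    for (; *s != '\0'; ++s)
        if (((unsigned char)*s & 0xC0) != 0x80)
            ++n;
    return n;
}

/* Copy LINE into OUT with the byte range [begin, end) wrapped in
   ">>>" and "<<<". Returns 0 on success, -1 if the range is invalid
   or OUT is too small. */
int mark_span(const char *line, size_t begin, size_t end,
              char *out, size_t out_size)
{
    assert(line != NULL && out != NULL);
    size_t len = strlen(line);
    if (begin > end || end > len || out_size < len + 7)
        return -1;
    memcpy(out, line, begin);
    memcpy(out + begin, ">>>", 3);
    memcpy(out + begin + 3, line + begin, end - begin);
    memcpy(out + end + 3, "<<<", 3);
    memcpy(out + end + 6, line + end, len - end + 1); /* copies the NUL too */
    return 0;
}
```

Applied to the grammar line from the example above with the byte range 16 to 20, this gives "%token FOO 0xff >>>0xff<<<", which reads the same whatever the display width of the characters before the error.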

>> I am using Unicode characters and LC_CTYPE=UTF-8, so it will not display 
>> properly. In fact, I am using special code to even write out Unicode 
>> characters in the error strings, since Bison assumes all strings are ASCII, 
>> the bytes with the high bit set being translated into escape sequences.
> 
> Yes, I’m aware of this issue, and we have to address it.

From what I could see, the function that converts it to escapes is sometimes 
applied once and sometimes twice, relying on it being idempotent.
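
As an illustration of what idempotent means here, a small sketch in C of an escaper of that general kind. This is not Bison's actual code; escape_high_bytes is a hypothetical name and the octal format is an assumption. Bytes with the high bit set become \ooo, everything else is copied, so the output is plain ASCII and a second pass leaves it unchanged.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical escaper: copy ASCII bytes unchanged and write every byte
   with the high bit set as a four-character octal escape \ooo.
   The backslash itself is deliberately left alone.
   Returns 0 on success, -1 if OUT is too small. */
int escape_high_bytes(const char *in, char *out, size_t out_size)
{
    assert(in != NULL && out != NULL);
    size_t o = 0;
    for (; *in != '\0'; ++in) {
        unsigned char c = (unsigned char)*in;
        size_t need = (c & 0x80) ? 4 : 1;
        if (o + need + 1 > out_size)   /* keep room for the final NUL */
            return -1;
        if (c & 0x80) {
            snprintf(out + o, 5, "\\%03o", (unsigned)c);
            o += 4;
        } else {
            out[o++] = (char)c;
        }
    }
    if (o + 1 > out_size)
        return -1;
    out[o] = '\0';
    return 0;
}
```

Note that the property depends on the backslash not being escaped: an escaper that also turned a backslash into two backslashes would change its own output on the second pass, and applying it twice would then show up in the messages.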

> We also have to provide support for internationalization of
> the token names.

Personally, I don't have any need for that. I use strings, like
  %token logical_not_key "¬"
  %token logical_and_key "∧"
  %token logical_or_key "∨"
and where there are names, they typically match what the lexer identifies.




