help-bison


improving error message (was: bison for nlp)


From: Akim Demaille
Subject: improving error message (was: bison for nlp)
Date: Sat, 10 Nov 2018 09:02:05 +0100

Hi Hans,

> On 9 Nov 2018, at 14:45, Hans Åberg <address@hidden> wrote:
> 
>> On 9 Nov 2018, at 12:11, Akim Demaille <address@hidden> wrote:
>> 
>>> On 9 Nov 2018, at 09:58, Hans Åberg <address@hidden> wrote:
>>> 
>>> 
>>>> On 9 Nov 2018, at 05:59, Akim Demaille <address@hidden> wrote:
>>>> 
>>>>> By the way, I’ll still get the error message as a string I guess, right?
>>>> 
>>>> Yes.  Some day we will work on improving error message generation,
>>>> there is much demand.
>>> 
>>> One thing I’d like to have is, if there is an error with, say, an identifier,
>>> also writing out its name.
>> 
>> Yes, that’s a common desire.  However, I don’t think it’s really
>> what people need, because the way you print the semantic value
>> might differ from what you actually wrote.  For instance, if I have
>> a syntax error involving an integer literal written in binary,
>> say 0b101010, then I will be surprised to read that I have an error
>> involving 42.
>> 
>> So you would need to carry the exact string from the scanner to the
>> parser, and I think that’s too much to ask for.
> 
> That is what I do. So I merely want an extra argument in the error reporting 
> function where it can be put.

Please be more specific: which extra argument, and how would the parser
provide it?  Also, check whether %param already gives you what you need
to pass information from the scanner to the parser’s yyerror.
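To make the %param suggestion concrete, here is a minimal sketch in plain C, without Bison itself: the scanner and the error reporter share a context object, and the scanner stores each token's exact spelling in it, so the error message can quote 0b101010 rather than 42. The names `struct context`, `scan_token`, and `report_error` are hypothetical. In a real grammar the context pointer would be declared with %param, which passes it to both yylex and yyparse; the C skeleton also forwards parse parameters to yyerror.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical context shared by the scanner and the error reporter;
   in a Bison grammar it would be the object declared with %param. */
struct context {
  const char *input;  /* text being scanned */
  size_t pos;         /* offset of the next unread character */
  char lexeme[64];    /* exact source text of the last token */
  long value;         /* semantic value of the last token */
};

enum { TOK_EOF = 0, TOK_INT = 258 };

/* Read one integer literal (decimal, or binary with a 0b prefix),
   keeping both its value and its verbatim spelling. */
static int scan_token(struct context *ctx) {
  const char *s = ctx->input + ctx->pos;
  while (isspace((unsigned char) *s))
    ++s;
  if (*s == '\0') {
    ctx->pos = (size_t) (s - ctx->input);
    return TOK_EOF;
  }
  const char *start = s;
  char *end;
  if (s[0] == '0' && s[1] == 'b')
    ctx->value = strtol(s + 2, &end, 2);
  else
    ctx->value = strtol(s, &end, 10);
  size_t len = (size_t) (end - start);
  if (len == 0) {
    /* Not a number: return the character itself as a one-byte token. */
    ctx->lexeme[0] = *start;
    ctx->lexeme[1] = '\0';
    ctx->value = 0;
    ctx->pos = (size_t) (start + 1 - ctx->input);
    return (unsigned char) *start;
  }
  /* Copy the spelling, truncating if it does not fit. */
  size_t keep = len < sizeof ctx->lexeme ? len : sizeof ctx->lexeme - 1;
  memcpy(ctx->lexeme, start, keep);
  ctx->lexeme[keep] = '\0';
  ctx->pos = (size_t) (start + len - ctx->input);
  return TOK_INT;
}

/* Hypothetical yyerror-like reporter: it quotes the token as written,
   and only mentions the semantic value as an aside. */
static void report_error(const struct context *ctx, const char *msg,
                         char *out, size_t size) {
  assert(ctx != NULL && out != NULL && size > 0);
  snprintf(out, size, "%s, unexpected integer '%s' (value %ld)",
           msg, ctx->lexeme, ctx->value);
}
```

With the input "0b101010 7", the first call to `scan_token` yields TOK_INT with value 42 and lexeme "0b101010", so the report reads `syntax error, unexpected integer '0b101010' (value 42)`.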

>> I believe that the right approach is rather the one we have in compilers
>> and in bison: caret errors.
>> 
>> $ cat /tmp/foo.y
>> %token FOO 0xff 0xff
>> %%
>> exp:;
>> $ LC_ALL=C bison /tmp/foo.y
>> /tmp/foo.y:1.17-20: error: syntax error, unexpected integer
>> %token FOO 0xff 0xff
>>                 ^^^^
>> I would have been bothered by « unexpected 255 ».
> 
> Currently, that’s for those still using only ASCII.

No, it’s not; it works with UTF-8.  Bison’s count of characters is mostly
correct.  Note that I’m talking about Bison’s own locations, used to parse
grammars, which are better than what we ship in generated parsers.

$ bison /tmp/foo.y
/tmp/foo.y:2.6: erreur: caractères invalides: « 💩 »
 exp: 💩 💩 💩 💩;
      ^
/tmp/foo.y:2.8: erreur: caractères invalides: « 💩 »
 exp: 💩 💩 💩 💩;
        ^
/tmp/foo.y:2.10: erreur: caractères invalides: « 💩 »
 exp: 💩 💩 💩 💩;
          ^
/tmp/foo.y:2.12: erreur: caractères invalides: « 💩 »
 exp: 💩 💩 💩 💩;
            ^
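The layout of these diagnostics (a FILE:LINE.COLUMN header, the offending source line, then carets under the faulty columns) can be sketched roughly as follows. `format_caret` is a hypothetical helper, not Bison's implementation. It assumes the columns are already 1-based character positions and emits one space or caret per column, which only lines up when each character takes one cell on screen. It also omits the leading space Bison puts before the quoted source line.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: write "FILE:LINE.FIRST[-LAST]: error: MSG", the
   source line SRC, and a row of carets under columns FIRST..LAST into OUT.
   Columns are 1-based character positions; one space or caret is emitted
   per column, which assumes every character takes one cell on screen.
   Returns the number of bytes written (or snprintf's count if truncated). */
static int format_caret(char *out, size_t size, const char *file, int line,
                        int first, int last, const char *msg,
                        const char *src) {
  assert(out != NULL && size > 0 && first >= 1 && last >= first);
  int n = first == last
      ? snprintf(out, size, "%s:%d.%d: error: %s\n%s\n",
                 file, line, first, msg, src)
      : snprintf(out, size, "%s:%d.%d-%d: error: %s\n%s\n",
                 file, line, first, last, msg, src);
  /* Pad up to FIRST with spaces, then underline FIRST..LAST. */
  for (int col = 1; col <= last && n >= 0 && (size_t) n + 1 < size; ++col)
    out[n++] = col < first ? ' ' : '^';
  if (n >= 0 && (size_t) n < size)
    out[n] = '\0';
  return n;
}
```

Called with line 1, columns 17 to 20, and the source line `%token FOO 0xff 0xff`, it reproduces the shape of the first diagnostic in this thread, with the four carets under the second 0xff.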

Bison’s count will be wrong when there are composed characters, granted.
Don’t try it with the attached grammar.
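Counting characters rather than bytes is what makes the columns above come out as 6, 8, 10 and 12 even though 💩 takes four bytes in UTF-8. Here is a minimal sketch, assuming well-formed UTF-8 input. `utf8_column` is a hypothetical name, and it counts code points by skipping continuation bytes (those of the form 10xxxxxx). It shares the limitation just mentioned: a letter followed by a combining accent counts as two columns even though it displays as one.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: 1-based column of the byte at offset OFF in LINE,
   counting UTF-8 code points instead of bytes.  Continuation bytes
   (10xxxxxx) do not start a new character, so they are not counted.
   Assumes OFF is within LINE and LINE is valid UTF-8. */
static int utf8_column(const char *line, size_t off) {
  assert(line != NULL);
  int col = 1;
  for (size_t i = 0; i < off; ++i)
    if (((unsigned char) line[i] & 0xC0) != 0x80)
      ++col;
  return col;
}
```

For `exp: 💩 💩 💩 💩;`, the four emoji start at byte offsets 5, 10, 15 and 20, which this maps to columns 6, 8, 10 and 12, matching the locations Bison printed.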

Attachment: foo.y
Description: Binary data



> I am using Unicode characters and LC_CTYPE=UTF-8, so it will not display
> properly. In fact, I need special code just to write out Unicode
> characters in the error strings, since Bison assumes all strings are ASCII
> and translates bytes with the high bit set into escape sequences.

Yes, I’m aware of this issue, and we have to address it.
We also have to provide support for internationalization of
the token names.
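Until then, the kind of workaround Hans mentions amounts to undoing the escaping before printing. Here is a minimal sketch, assuming the token names contain C-style three-digit octal escapes such as \303\251 (the two bytes of é in UTF-8). `unescape_octal` is a hypothetical name, not part of Bison.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: copy SRC into DST, turning three-digit octal
   escapes (\000 to \377) back into the bytes they denote, so that UTF-8
   text such as \303\251 comes out as the raw bytes of "é".  Any other
   backslash pair, including an escaped backslash, is copied unchanged so
   that an escaped backslash followed by digits is not misread.  DST needs
   strlen(SRC) + 1 bytes; the output is never longer than the input. */
static void unescape_octal(char *dst, const char *src) {
  assert(dst != NULL && src != NULL);
  while (*src) {
    if (src[0] == '\\' && src[1] >= '0' && src[1] <= '3'
        && src[2] >= '0' && src[2] <= '7'
        && src[3] >= '0' && src[3] <= '7') {
      *dst++ = (char) (((src[1] - '0') << 6) | ((src[2] - '0') << 3)
                       | (src[3] - '0'));
      src += 4;
    } else if (src[0] == '\\' && src[1] != '\0') {
      *dst++ = *src++; /* keep other escapes as they are */
      *dst++ = *src++;
    } else {
      *dst++ = *src++;
    }
  }
  *dst = '\0';
}
```

Restricting the first digit to 0 through 3 keeps every decoded value within a single byte.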

