bison-patches
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: custom error messages


From: Akim Demaille
Subject: Re: custom error messages
Date: Sat, 4 Jan 2020 12:14:56 +0100

Hi Adrian,

> Le 3 janv. 2020 à 13:13, Adrian Vogelsgesang <address@hidden> a écrit :
> 
> Hi Akim,
>  
> I am happy to see you are looking into improving error reporting.
> We at Tableau also struggled with this. I think your proposed solution,
> in particular "parse.error custom", would improve things a lot for us.

Good!

> Our parser uses the C++ skeleton, though. In case you are interested,
> I could take care of the C++ work, as soon as the we know which exact
> interface we want to support.

Why not indeed.  It will have to be implemented in all our skeletons
anyway, D included.


> Our grammar is structured similarly to the Postgres grammar
> (https://github.com/postgres/postgres/blob/master/src/backend/parser/gram.y)
> If we take a look at ColId, type_function_name, NonReservedWord (lines 15064
> and following), we can see that in most places where IDENT (i.e. identifiers)
> are accepted, also "unreserved_keyword" are accepted. Now, if someone types
> > CREATE TABLE $1
> there is a syntax error near "$1" and the list of expected tokens would
> include the complete "unreserved_keyword" list, i.e. ~300 keywords. Obviously
> this is pointless.

Indeed.  I see your point.  This is probably the place where what I have
in mind for Bison 4 would shine.

> Hence, we have custom logic in place which checks if the IDENT token is
> among the expected tokens. If so, we supress all tokens with an internal
> token number larger than the token number of "ABORT_P" - which happen to be
> all keywords.
>  
> You can find our current "solution" attached to this email. You can run this
> Python script against a C++ parser generated by bison and it will inject our
> custom error handling code.

If I read correctly, the core of your approach is here:

> int ident=SQLParser::yytranslate_(token::IDENT);
> int abort_p=SQLParser::yytranslate_(token::ABORT_P);
> got = yytname_[yytoken]; if ((yytoken<abort_p)&&(got[0]=='"')) ++got;
> bool expectedIdent=false;
> int yyn = yypact_[yystate];
> if (!yy_pact_value_is_default_ (yyn)) {
>   int yyxbegin = yyn < 0 ? -yyn : 0;
>   int yychecklim = yylast_ - yyn + 1;
>   int yyxend = yychecklim < yyntokens_ ? yychecklim : yyntokens_;
>   for (int yyx = yyxbegin; yyx < yyxend; ++yyx) {
>     if (yycheck_[yyx + yyn] == yyx && yyx != yyterror_ && 
> !yy_table_value_is_error_ (yytable_[yyx + yyn]))
>       if (yyx==ident) { expectedIdent=true; break; }
>   }
>  }
> if (!yy_pact_value_is_default_ (yyn)) {
>   int yyxbegin = yyn < 0 ? -yyn : 0;
>   int yychecklim = yylast_ - yyn + 1;
>   int yyxend = yychecklim < yyntokens_ ? yychecklim : yyntokens_;
>   for (int yyx = yyxbegin; yyx < yyxend; ++yyx) {
>     if (yycheck_[yyx + yyn] == yyx && yyx != yyterror_ && 
> !yy_table_value_is_error_ (yytable_[yyx + yyn])) {
>       if ((expectedIdent)&&(yyx>=abort_p)) continue;
>       const char* p=yytname_[yyx]; if ((yyx<abort_p)&&(p[0]=='\"')) ++p; 
>       possibilities.push_back(p); 
>     }
>   }
>  }

in particular

> int ident=SQLParser::yytranslate_(token::IDENT);
> int abort_p=SQLParser::yytranslate_(token::ABORT_P);

and 

>       if (yyx==ident) { expectedIdent=true; break; }

which notices when IDENT is there, and then

>       if ((expectedIdent)&&(yyx>=abort_p)) continue;

which filters out the alternatives in that case.  Such a transformation is 
easier to implement with the symbol numbers, indeed.  My proposal (but 
returning an array of token numbers rather that token names) would work for 
such a case, but it would be a bit wasteful: we would first gather all the 
possible tokens in an array, and then remove the useless ones.

Would that be ok with you?  Do you have an alternative to propose?

I'm not super excited at the idea of exposing all these gory details to the 
users, I don't want to commit myself to have to support 
yy_pact_value_is_default_, and all this internal variables.  I want to stay 
high-level.


reply via email to

[Prev in Thread] Current Thread [Next in Thread]