[Top][All Lists]

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: multistart: use YYSTYPE

From: Akim Demaille
Subject: Re: multistart: use YYSTYPE
Date: Sun, 4 Oct 2020 16:47:48 +0200

Hi Rici,

> Le 27 sept. 2020 à 20:46, Rici Lake <> a écrit :
> I want to focus on the
> advantages of using YYSTYPE to represent the returned value.
> In particular, it's highly flexible. You don't need to restrict YYSTYPE in
> any way. I say this partly because I still prefer tagged types to `%define
> api.value.type union`. Using actual C types has a certain appeal but in
> practice when I try this style, I inevitably end up using typedefs to
> create type aliases, even for primitive types (so that I don't need to edit
> every %type declaration when I decide that long should be uint64_t.). Maybe
> that's just old-fogeyism :-)

No, I understand this, and do the same.  But I do prefer typedef + %define 
api.value.type union, which is consistent throughout the application, to using 
%union, where parser uses a different vocabulary than the rest of the program.

> But there are a number of parsing applications
> in which YYSTYPE is not a C union at all (for example, when it's some kind
> of discriminated union), and these should also be usable with the multiple
> start symbol interface.

Yes, that's definitely something on which my model chokes.

> Also, once you decide to use a compound return type (whether it's a direct
> return or indirect via an out parameter) which includes a specific value
> type alternative, you're committed to create several different structure
> types, each used only for a particular call. While there are not likely to
> be many such structures, it feels a bit ugly, at least in C.

I personally don't feel this is ugly at all.

> would require only one compound type (and only one type name), and the
> client is going to have to extract the value from the compound anyway. (I
> understand that if you `#define api.value.type union` then you don't have a
> convenient tagname to extract the value, which means resorting to an ugly
> cast. Since your proposal only applies for this particular case, it makes
> some sense to do the cast automatically. But as I said above, I think it
> would be nicer if multiple start symbols were more general.)

I have the feeling that your concerns would be addressed if I made
the following function public:

> typedef struct
> {
>   YYSTYPE yyvalue;
>   int yynerrs;
> } yy_parse_impl_t;
> // Run a full parse, using YYCHAR as switching token.
> static int
> yy_parse_impl (int yychar, yy_parse_impl_t *yyimpl);

All the other functions are implemented on top of it.  For instance

> yyparse_input_t
> yyparse_input (void)
> {
>   yyparse_input_t yyres;
>   yy_parse_impl_t yyimpl;
>   yyres.yystatus = yy_parse_impl (TOK_YY_PARSE_input, &yyimpl);
>   yyres.yynerrs = yyimpl.yynerrs;
>   return yyres;
> }

I still much prefer to have a structure rather than just passing
more arguments for yynerrs and possibly others, so that we can
have more members in the future if we feel the need for it.

> Finally, let me note that (aside from didactic issues, like "how easy is
> this to explain to SO questioners?"), this change isn't going to affect me
> personally because I almost always use the push interface. With the push
> interface, implementing start-symbol sentinels is extremely easy (which is
> not to say that a bit of assistance wouldn't be appreciated).

I agree.  Push parsers are great, and I've often been willing to get
rid of the pull parser implementation, and to offer a pull-on-top-of-push
instance (what you get with %define api.push-pull both).  Unfortunately
benchmarks show that pull is faster than push, and some people might be
angry if replaced the native pull by pull-on-top-of-push.  Using Bison
benchmarking tool:

> $ ./_build/g9d/etc/ --gbench --bench push
> Entering directory `benches/241'
> Using bison=/Users/akim/src/gnu/bison/_build/g9d/tests/bison.
>   0. %define api.pure %define api.push-pull both
>   1. %define api.push-pull both
>   2. %define api.pure
>   3. 
> Run on (8 X 2900 MHz CPU s)
> CPU Caches:
>   L1 Data 32K (x4)
>   L1 Instruction 32K (x4)
>   L2 Unified 262K (x4)
>   L3 Unified 8388K (x1)
> --------------------------------------------------
> Benchmark           Time           CPU Iterations
> --------------------------------------------------
> BM_y0           10283 ns      10274 ns      65229
> BM_y1           10168 ns      10156 ns      70180
> BM_y2            9279 ns       9270 ns      74935
> BM_y3            9047 ns       9038 ns      76379

This figures are quite stable.

$ cd ./benches/latest
$ make rand
Benchmark           Time           CPU Iterations
BM_y3            9266 ns       9258 ns      71538
BM_y1           10377 ns      10368 ns      67121
BM_y0           10169 ns      10158 ns      67373
BM_y2            9289 ns       9279 ns      75201
$ make rand
Benchmark           Time           CPU Iterations
BM_y1           10262 ns      10252 ns      65069
BM_y3            9204 ns       9195 ns      74677
BM_y0           10191 ns      10182 ns      67133
BM_y2            8759 ns       8750 ns      77941

So that's more than 10% of performance penalty to use a pure pull parser.
(This parser spend little time in the actions: it's the usual calculator).

> Most of the time, what I'd really like is a way of extending the push-parse
> context object to hold some extra members, one of which would be the return
> value from the final parse call.

Agreed.  I feel sorry that %parse-param was not used to extend the
context object in the case of push parsers, instead of passing %parse-params
around all the time.  In the C++, that's the way %parse-param is implemented:
more members in the parser object.

> Now, let me just plant that issue there for push parsers, and go back to
> reentrant pull parsers. (Reentrant, because one would really want to start
> promoting reentrant parsers a bit more at this point, since reliance on
> globals is a technique from a different era.) Bison's reentrant parsers
> don't require a context object, which is cool in its own way but also a
> limitation.

Yes, it is.

> (For one thing, it makes aspects of the parser state
> inaccessible outside of action code.) Suppose, instead, that the multiple
> start symbol feature required a reentrant parser with a context object.
> With that change, yynerrs and the symbol's return value (i.e. the top of
> the stack when the root symbol is reduced) could just be kept in the
> context object, in a way which would be consistent with push parsers and
> reasonably easy for the client code. I'm not making a formal proposal,
> here: it's just an alternative to consider.

Something I have been considering many times, indeed.  But I'm afraid
of the performance penalty.

I'll see if I can explore that track.  (But before, I have to really
be sure about the needed changes in IELR).

> Again, sorry for the delay in responding. I'll try to be attentive to my
> email, but as a result of the move my internet access will be very
> intermittent until Thursday or so.

I hope things are better now on your side :)


reply via email to

[Prev in Thread] Current Thread [Next in Thread]