bug-bison

Re: api.header.include and backward-compatible .y files


From: Adam Novak
Subject: Re: api.header.include and backward-compatible .y files
Date: Tue, 8 Sep 2020 12:18:48 -0700

It sounds like Bison is designed around the GNU project way of doing
software development, where there exist "maintainers" who are
different from end users, and who have an opportunity to prepare
"releases" which either contain more files than are checked into
version control or at least carry freshly regenerated copies of the
generated files which are checked in.

Unfortunately, not all projects that depend on Bison work that way.
For a lot of projects, what's in Git is what gets shipped, and it may
or may not even get tagged with a version before it's used in another
project as a Git submodule. Then when the downstream project can't
build because Bison stopped accepting the upstream project's .y files,
the mismatch between how Bison wants to be used and how Git-centric
development wants to use it becomes a problem.

Would it be possible to introduce more standards besides just POSIX
yacc? C++ has --std=c++03, --std=c++11, --std=c++14, etc., and if I
add one of those flags to my build process (and don't use -Werror or
write undefined behavior), I can be reasonably sure that new versions
of GCC or Clang will accept my code, even if backward-incompatible
changes have been made in the latest version of the C++ language.

If we could name the version of the .y file language we want to write
with more granularity than just "very old POSIX code that nobody wants
to write" or "bleeding-edge Bison code that will be rejected due to
language changes in 4 years", that would be nice for projects that
don't ship prepped tarballs, or that have too many developers to get
them to all build Bison from source.

Maybe Bison's innovations should be submitted as suggestions for
improvements to the POSIX YACC standard? POSIX comes in multiple
versions just like the C++ standard does. So Bison would take
`--std=POSIX.1-2017`, `--std=POSIX.1-2023`, etc. If there's stuff that
people use that's broadly cross-compatible across YACCs, but isn't in
the standard, it ought to be added, right?

Alternately, Bison could have its own track of language versions, and
I could say something like `--lang=bison3.7` and know that Bison 3.8
will accept what I'm writing today, do broadly the same thing with it
forever, and ideally even throw an error if anything too new is used
by another developer who has upgraded before me. After all, just
because I updated my system to the next Ubuntu doesn't mean I want to
update all my projects to the latest Bison language version *right
now* and make the rest of my team upgrade too.
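
To make the `--lang` idea concrete, here is a minimal Python sketch of
the acceptance rule I have in mind. Everything in it is hypothetical:
the `--lang` option does not exist in Bison, and the feature names and
the levels in the table are made up for illustration.

```python
# Hypothetical table: the language level at which each feature appeared.
# These names and levels are illustrative assumptions, not Bison facts.
FEATURE_INTRODUCED = {
    "feature_a": (3, 0),
    "feature_b": (3, 4),
    "feature_c": (3, 8),
}


def check_grammar(tool_version, lang_level, features_used):
    """Return a list of error strings; an empty list means 'accepted'.

    tool_version  -- (major, minor) of the installed tool
    lang_level    -- (major, minor) the grammar declares it is written in
    features_used -- names of features the grammar relies on
    """
    errors = []
    # An old tool cannot promise to understand a newer language level.
    if lang_level > tool_version:
        errors.append(
            "language level %d.%d is newer than this tool (%d.%d)"
            % (lang_level + tool_version)
        )
    # A newer tool still rejects anything too new for the declared level.
    for name in features_used:
        introduced = FEATURE_INTRODUCED[name]
        if introduced > lang_level:
            errors.append(
                "%s needs level %d.%d, but the grammar declares %d.%d"
                % ((name,) + introduced + lang_level)
            )
    return errors


if __name__ == "__main__":
    # A 3.8 tool, grammar pinned to 3.7, using only features up to 3.4.
    print(check_grammar((3, 8), (3, 7), ["feature_a", "feature_b"]))
```

The point of the second check is that a teammate who has already
upgraded gets an error, instead of silently committing something the
rest of the team cannot build.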

The other option would be .y file syntax for sniffing whether features
are available (like C ifdef), so that instead of having to pick a
single Bison version to code against, we could check the version and
provide the code for each, and be backward-compatible arbitrarily far
back (or at least to the introduction of the feature).
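
Until something like that exists inside the .y language, the closest
approximation I can think of lives in the build script rather than in
the grammar. The Python sketch below is only that, a sketch: it assumes
the first line of `bison --version` looks like
"bison (GNU Bison) 3.7.1", and the grammar file names and minimum
versions in the usage example are hypothetical.

```python
import re


def parse_bison_version(version_output):
    """Extract (major, minor, patch) from the text of `bison --version`.

    Assumes the version number appears on the first line, for example
    "bison (GNU Bison) 3.7.1". A missing patch component counts as 0.
    """
    lines = version_output.splitlines()
    if not lines:
        raise ValueError("empty version output")
    match = re.search(r"(\d+)\.(\d+)(?:\.(\d+))?", lines[0])
    if match is None:
        raise ValueError("no version number in: %r" % lines[0])
    major, minor, patch = match.groups()
    return (int(major), int(minor), int(patch or 0))


def pick_grammar(version, variants):
    """Choose the variant with the highest minimum version we satisfy.

    variants -- list of (minimum_version, filename) pairs
    """
    usable = [v for v in variants if v[0] <= version]
    if not usable:
        raise ValueError("no grammar variant supports %r" % (version,))
    return max(usable)[1]


if __name__ == "__main__":
    # Hypothetical variants of the same grammar, keyed by minimum version.
    variants = [
        ((3, 0, 0), "parser_compat.y"),
        ((3, 4, 0), "parser_modern.y"),
    ]
    version = parse_bison_version("bison (GNU Bison) 3.0.4")
    print(pick_grammar(version, variants))
```

In a real build the string would come from running `bison --version`;
it is hard-coded here so the sketch runs without Bison installed. The
obvious cost is maintaining two copies of the grammar, which is why
in-file syntax would be nicer.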



On 9/7/20, Kaz Kylheku <kaz@kylheku.com> wrote:
> On 2020-09-06 00:46, Akim Demaille wrote:
>> Kaz,
>>
>>> Le 5 sept. 2020 à 17:58, Kaz Kylheku <kaz@kylheku.com> a écrit :
>>>
>>> On 2020-08-30 05:21, Akim Demaille wrote:
>>>> Hi Adam,
>>>>> Le 22 août 2020 à 02:33, Adam Novak <anovak@soe.ucsc.edu> a écrit :
>>>>> Hello,
>>>>> I'm maintaining a .y file at
>>>>> https://github.com/vgteam/raptor/blob/master/src/turtle_parser.y that
>>>>> needs to be backward-compatible with the Bison available in Ubuntu
>>>>> 18.04 (3.0.4), but also work on the latest Bison that our project's
>>>>> Mac users get supplied from Homebrew (3.7.1).
>>>> Back in the days, people were *shipping* the generated files.  That
>>>> was awesome, since then maintainers are free from such constraints:
>>>> they use whatever version of their favorite generator they like, and are
>>>> free from requiring anything from the user; users don't even need
>>>> to have the generator (Bison in the present case).
>>>> It's a pity today we lost this wisdom.
>>>
>>> Back in the day, people retained the generated files because
>>> the C language had started to become portable, whereas to get
>>> C from a Yacc grammar, they still had to upload their code
>>> to a Unix box to run the proprietary Yacc program.
>>>
>>> Even the person who wrote the program didn't necessarily have
>>> consistent Unix access, not to mention any friends to whom
>>> that person might give the code.
>>>
>>> People would upload just their .y file to a Unix system, run
>>> yacc and then download the y.tab.[ch] files.
>>>
>>> The only valid reason for having any generated files in version
>>> control or distribution is unavailability of the tool.
>>
>> You are vastly simplifying things.  In particular, you completely
>> discard the problems with evolutions here.
>
> I see simplifying things as my job, really.
>
>>> Bison's user is whoever runs Bison. Bison's user is not that one
>>> who runs the program built with Bison; that is the user's user.
>>> The user's user is not your user.
>>>
>>> You cannot assume that your user is just a middleman in
>>> a delivery chain, who can deal with any nuisance that comes his way,
>>> because it's his job. That user may be a free software developer,
>>> like you.
>>
>> You are misunderstanding my point.  My point is that back in the
>> days people were shipping releases, and releases are self-contained,
>> they protect the end user from any non standard dependency such
>> as Autoconf, Automake, Bison, Flex, Gperf, Libtool, Gettext, just
>> to name a few of them close to the GNU project.  Installing a release
>> was super easy, because you hardly had any dependency.
>
> Well, Autoconf without question! If the end user of a program is
> required to have Autoconf, then the developer has misunderstood the
> meaning of Autoconf, which is to generate a configure script that
> assumes little about the environment.
>
> I think that Autoconf and Automake are so thoroughly baked into
> the "DNA" of Bison, that you may be losing touch with the idea that
> a parser generator is not Autoconf.
>
> Look, Bison's tree makes more use of M4 macros than anything I remember
> ever seeing. It's not just for configuration but elements like
> parser skeletons and test cases. It's kind of weird!
>
> In the Unix world, Yacc is standard. You can rely on it for building
> your program as surely as you can rely on make, awk, sed, or the
> shell.
>
> In the GNU/Linux world, Bison is standard. You will never run into
> a situation where you don't have Bison, if you rely on Bison
> extensions.
>
> Note that even Autoconf doesn't free the user from needing
> a shell or make. Those are going to be whatever the user has.
> The compiler is going to be whatever the user has.
>
>> Maintainers and contributors had a way more complex task: setting
>> up a *developer*  environment with all the required versions of the
>> required tools.  And they had to keep their environment fresh.  On
>> occasions it meant using non released versions of these tools.  But
>> that was not a problem, because it was only on the shoulders of a
>> few experienced people.
>>
>> Way too often today people no longer make self-contained releases,
>> and releases are hardly different from a git snapshot.  That is
>> wrong.
>
> Nope; what is wrong is thinking there is a difference.
>
> If you're distributing source code, then the user must have tools
> to build it.
>
> These should be exactly the same as what is required to work on
> the program.
>
> Any differences are confusing and annoying, and create barriers to
> entry to the project.
>
> "Oh, you think you've built Foobar and can create a patch for it?
> Hahahaha, you tarball-sucking fool. Let me introduce you to the
> git repository of Foobar, and the seven-headed development
> environment bootstrap process."
>
>> This is wrong because now end users need to install tons of tools.
>
> Good incentive to keep that tool count down, right?
>
> If you think the user will hate installing a ton of tools, what
> makes you think the contributing user won't hate it?
>
>> And most of them don't want to install recent versions of these tools
>> (and I don't blame them), they just want to use the one provided by
>> their distro.
>
> Are you the same person who reminded me not to use GCC-specific
> warning options in a patch, because Bison builds with many C compilers?
>
> :)
>
>> So today, some maintainers locked themselves into not being able
>> to use tools that are not widespread enough.
>
>> Not to mention that they
>> might even have to deal with different behaviors from different
>> versions of the tool.  Then they find it convenient to blame the
>> evolution of the tool.
>
> I'd love to see you maintain Bison stoically, without complaining,
> if different versions of your compiler (or other tools) were
> producing different results.
>
> We have detailed requirements for those things, and international
> standards, for good reasons.
>
>> But the problem is rather their use of the generator.  *They* are
>> in charge of generating say Bison parsers, and of including them in
>> their releases.  That's a mild effort, but with a huge ROI: you
>> no longer, ever, have to face the nightmare of having to support
>> very different versions of the tool, and you also can *immediately*
>> benefit from new features.
>
> What do you do if you've maintained a program for over a decade,
> and none of your old commits have a copy of the generated parser?
>
> When you do a "git bisect", the old builds have to build!
>
> The C parts of the old builds build because you wrote portable
> code, and the C compiler people take it seriously.
>
>> I know of several projects, some very important ones, that are
>> stuck with old versions of Bison although they could benefit from
>
> Are they really stuck with old versions of Bison, though?
>
> That only happens if their code actually doesn't work with newer Bison.
>
> *Using* only the features provided by Ubuntu 18's packaged Bison
> is not the same as being *stuck*.
>
> I'm using Ubuntu 18's Bison myself, but I'm not stuck. It looks
> like my stuff works with 3.7.
>
>> newer features, features that have sometimes been written *for them*,
>> to simplify *their* problems.  But they still have the old hackish
>> code because the recent releases of Bison are not available "yet"
>> in Ubuntu 18.04...  Gee.
>
> They will pick that up in due time; that's their decision.
>
> Using the new features takes work. If the parser side of their
> stuff works fine, maybe they have higher priority items to work on.
> Making the same stuff work fine, like before, but with nicer,
> terser Bison code, requires development effort.
>
> They probably like being able to check out an old baseline
> and also have that work with whatever Bison they have.
>
> Can't Bison have improvements that are internal? As in, nothing
> changes in the input file, but the output is better?
>
> Suppose I want the user to benefit from the newest, shiniest Bison
> they can get their hands on.
>
> The following is the actual situation.
>
> My code works with old Bison, as far back as 2.x.
>
> The Bison I'm using is behind; it's the Ubuntu one.
>
> But, I have downstream packagers who are on Bison 3.7.
>
> Maybe that puts out better code. Maybe tables are compressed better,
> or the skeleton has some new tricks to run faster or whatever.
> Maybe some buffer overflow has been fixed somewhere. I have no idea.
>
> Just because I'm not using new syntax doesn't mean I'm not
> using new Bison. Just like just because I'm using C99
> or C90 doesn't mean I'm not getting better code generation
> or diagnostics.
>
> Why would I ship frozen parser output? Why recommend that to me?
>
> The downstream packagers have chosen Bison 3.7 for their distro,
> and expect all programs to use that Bison.
>
> If there is some security issue found in Bison-generated code,
> they expect to be able to upgrade Bison and rebuild all
> packages that name it as a dependency.
>
> Shipping a frozen parser is downright antisocial.
>
> Today, the consumers of the free software developer's code base
> are downstream packagers. They have the whole suite of tools, by
> definition.
>
> The users who run the program get binaries from the packagers;
> they need no tools.
>
> If they do need tools, their packagers have them all,
> in package form, so they can be almost instantly as well-tooled
> as the distro itself.
>
> The imaginary user with the "medium amount of tools" went
> extinct in the 1990s.
>
> The modern user has all the tools. He or she just doesn't have the
> latest version of all of them, necessarily.
>
>>> The consumers of programming languages
>>> are programmers. Yet, we broadly value stability of programming
>>> languages. Multiple implementations that adhere to common standards
>>> are also a boon.
>>
>> True, but moot.  There's one Bison.
>
> There is one C#. So Microsoft should just break all C# code
> written before 2014.
>
> The byte-code is environment-independent; users with old
> code should just compile it with the old C# compiler and retain
> the byte code.
>
>> The right approach is rather to see how your need is part of general
>> a general pattern, and how that need can be fulfilled in a clean way.
>>
>> But don't, say, happily sed the generated output, and expect it to
>> work forever.
>
> If Bison has a test case representing a usage, then that will continue
> to work. If a decision is made that it will not continue to work, then
> that decision will appear in the form of a commit which removes that
> test case, which leaves a very clear record. The users relying on it
> break, but if they look at the history of the tool, they will see that
> it's not by accident, and just have to suck it up. The tool's project
> decided to drop their use case, recorded in a commit, and that is
> all there is to it.
>
> sed-ing the output is a poor approach, which was made necessary due to
> not having a test case in the parser generator to check the behavior.
>
> Well, not exactly necessary. A fix was necessary, and there are always
> alternative solutions to sed-ing the output.
>
> However, sed-ing the output is (sometimes) the simplest solution, and
> it has the virtue of being the most likely to backport easily to old
> baselines, which need the fix in order to build.
>
> If you go back with "git bisect", that sed-ing is very easy to apply.
> Even if there is a conflict in that Makefile rule (rare), it can be
> added by hand. A refactoring of the code may not backport as easily.
>
>


