Re: Which lexer do people use?


From: Christian Schoenebeck
Subject: Re: Which lexer do people use?
Date: Sat, 04 Jul 2020 12:46:54 +0200

On Saturday, 4 July 2020 08:14:46 CEST Akim Demaille wrote:
> Hi Daniele,
> 
> > On 3 Jul 2020, at 23:15, Daniele Nicolodi <daniele@grinta.net> wrote:
> > 
> > Hello,
> > 
> > the historical pairing is using Flex with Bison. However, while Bison is
> > under active development and seems to be a very solid code base, there
> > isn't much activity on the Flex side https://github.com/westes/flex and
> > Flex codebase and capabilities show their age.
> 
> Yes.  I have a couple of issues open over there, and it takes ages
> to get them processed.  When they are.
> 
> When I tried to modernize the Flex doc about Bison, they even managed to
> turn this into a lecture about software maintenance.  And not install
> my changes.
> 
> https://github.com/westes/flex/pull/420

Well, just a difference in philosophies. It does look somewhat awkward, though,
that they kept criticising the Bison documentation while not responding to the
actual Flex issue at all.

In a perfect world, yes, it might have been desirable to have old Bison
constructs still in the Bison docs today, clearly marked in red as
'removed in version x, replaced by y in version z', but that's IMO a purely
theoretical issue, as everybody can clearly see that Akim is always patiently
answering anybody's questions over here, for instance.

For me, the exaggerated 'divide and conquer' philosophy applied decades ago by
splitting scanner and parser was a much more painful decision, with clearly
perceivable, negative consequences in the real world for all users.

> > I recently became aware of RE/flex https://www.genivia.com/reflex.html
> > which seems very promising. However, it only generates a C++ scanner
> > which may be possible (I haven't tried) to retrofit into existing C
> > projects to, for example, gain full Unicode (in its UTF-8 encoded form)
> > support.
> 
> It seems amazing.  Featurewise and performancewise.  I did not know it
> (nor did I know ugrep).
> 
> I've seen projects use ragel (http://www.colm.net/open-source/ragel/)
> and re2c (https://re2c.org).  But, sadly, I have first-hand experience
> with Flex only, I can't comment about the others.
> 
> > Has anyone tried to hammer a C++ scanner peg generated by RE/flex into a
> > C grammar hole generated by Bison?
> > 
> > Which other scanners do people use?
> 
> Fine question.  I'm eager to read the answers!

AFAICS almost nobody is using anything other than Flex. Probably because its
designated task of handling type-3 (regular) grammars is already fully covered
by just having a correct RegEx implementation, and because most of the
examples, howtos, books and docs out there are based on Flex.

The only thing people occasionally miss on the scanner side is Unicode
support, but there are ways around that, as you barely need Unicode in the
actual RegEx patterns: Unicode characters usually sit somewhere between a
(non-Unicode) start and end pattern.
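
To illustrate that point, here is a minimal sketch in C. It is not taken from
Flex or Bison; scan_string_literal is a hypothetical helper, and the input is
assumed to be UTF-8. It relies on the fact that every byte of a multi-byte
UTF-8 sequence has its high bit set, so an ASCII delimiter such as '"' can
never occur inside one. Escape sequences are deliberately not handled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, for illustration only.
 *
 * Scans a double-quoted string literal that starts at src[0].
 * The literal body is copied byte by byte into out (without the quotes),
 * so UTF-8 sequences pass through untouched. The copy is truncated to
 * out_size - 1 bytes, which may cut a multi-byte character in half.
 *
 * Returns the number of input bytes consumed, including both quotes,
 * or 0 if src does not start with '"' or the literal is unterminated. */
size_t scan_string_literal(const char *src, char *out, size_t out_size)
{
    assert(src != NULL && out != NULL && out_size > 0);

    out[0] = '\0';
    if (src[0] != '"')
        return 0;                 /* not a string literal */

    size_t i = 1;                 /* read position, just past the opening quote */
    size_t n = 0;                 /* write position in out */

    /* Only the ASCII end delimiter matters; everything else, including
     * bytes >= 0x80 that belong to UTF-8 sequences, is plain payload. */
    while (src[i] != '\0' && src[i] != '"') {
        if (n + 1 < out_size)
            out[n++] = src[i];
        i++;
    }
    out[n] = '\0';

    if (src[i] != '"')
        return 0;                 /* reached end of input: unterminated */

    return i + 1;                 /* include the closing quote */
}
```

A byte-oriented scanner rule with an ASCII start and end pattern works on the
same principle: the pattern itself never has to know about Unicode.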

The obvious real improvement in the future will be to finally get rid of the
separate scanner for good, combining the two things that belonged together
from day one: having the scanner functionality directly in Bison instead, and
saying goodbye to all those scanner state stack hacks, which often end up as a
huge mess that people can hardly read and often lead to severe misbehaviour on
edge cases of certain inputs.

Akim, was there any progress in the IP discussion for that to become possible
one day, or is that previously discussed merge off the table?

Best regards,
Christian Schoenebeck