Re: Which lexer do people use?

help-bison

From:	Adrian Vogelsgesang
Subject:	Re: Which lexer do people use?
Date:	Sat, 4 Jul 2020 19:30:36 +0000
User-agent:	Microsoft-MacOutlook/10.10.17.200615

Hi Daniele,

> Which other scanners do people use?
For what it’s worth, we are using a hand-rolled scanner. Seemed just the 
fastest way to get rolling and the easiest to maintain.

Also, it allowed us to embed a few hacks directly inside the scanner. E.g., in
a few places our grammar is not actually LR(1). This happens only in very few
edge cases, though, so we don’t want to use GLR. Instead, our scanner does a
lookahead: e.g., upon encountering the token “WITH”, it looks at the following
token. If the next token is “TIMESTAMP”, it produces “WITH_LA” instead of just
“WITH”. Thereby, we get one token of lookahead from the scanner. Combined with
the one token of lookahead provided by bison, we can now parse our LR(2) grammar.
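
To make the idea concrete, here is a minimal sketch in C of such a lookahead
filter. It is not our actual scanner: the token codes, the struct, and the
names scanner_init, raw_next and next_token are all made up for illustration,
and the "raw" scanner just splits the input on spaces. The point is only the
one-token pushback slot that lets the scanner peek before deciding between
WITH and WITH_LA.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical token codes; a real project would take these from the
   header that bison generates. */
enum token { TOK_EOF = 0, TOK_IDENT, TOK_WITH, TOK_WITH_LA, TOK_TIMESTAMP };

/* Hypothetical scanner state: the input plus a one-token pushback slot. */
struct scanner {
    const char *pos;    /* next unread character */
    int has_peeked;     /* nonzero if `peeked` holds a token */
    enum token peeked;  /* token read ahead but not yet delivered */
};

void scanner_init(struct scanner *s, const char *input)
{
    s->pos = input;
    s->has_peeked = 0;
    s->peeked = TOK_EOF;
}

/* Toy raw scanner: splits on spaces and classifies each word. */
static enum token raw_next(struct scanner *s)
{
    while (*s->pos == ' ')
        s->pos++;
    if (*s->pos == '\0')
        return TOK_EOF;

    const char *start = s->pos;
    while (*s->pos != '\0' && *s->pos != ' ')
        s->pos++;
    size_t len = (size_t)(s->pos - start);

    if (len == 4 && strncmp(start, "WITH", 4) == 0)
        return TOK_WITH;
    if (len == 9 && strncmp(start, "TIMESTAMP", 9) == 0)
        return TOK_TIMESTAMP;
    return TOK_IDENT;
}

/* The function the parser would call (e.g. from yylex). It delivers a
   previously peeked token first, and turns WITH into WITH_LA when the
   token after it is TIMESTAMP. */
enum token next_token(struct scanner *s)
{
    assert(s != NULL);

    enum token tok;
    if (s->has_peeked) {
        tok = s->peeked;
        s->has_peeked = 0;
    } else {
        tok = raw_next(s);
    }

    if (tok == TOK_WITH) {
        /* Peek one token ahead and keep it for the next call. */
        s->peeked = raw_next(s);
        s->has_peeked = 1;
        if (s->peeked == TOK_TIMESTAMP)
            tok = TOK_WITH_LA;
    }
    return tok;
}
```

With this sketch, the input "WITH TIMESTAMP x" is delivered to the parser as
WITH_LA, TIMESTAMP, IDENT, whereas "WITH y" is delivered as WITH, IDENT. The
grammar can then use WITH_LA in the rules that would otherwise need a second
token of lookahead.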

Not sure whether this would also have been possible with flex, but given we
have a hand-rolled scanner it was straightforward.

You can find a similar hack in
https://github.com/postgres/postgres/blob/master/src/backend/parser/gram.y#L721
if you look for the WITH_LA keyword. Postgres uses a flex scanner and then
stacks a custom layer between flex and bison, which introduces additional
maintenance overhead.

Cheers,
Adrian


From: help-bison <help-bison-bounces+avogelsgesang=tableau.com@gnu.org> on 
behalf of Daniele Nicolodi <daniele@grinta.net>
Date: Friday, 3 July 2020 at 23:15
To: Bison Help <help-bison@gnu.org>
Subject: Which lexer do people use?

Hello,

The historical pairing is Flex with Bison. However, while Bison is
under active development and seems to be a very solid code base, there
isn't much activity on the Flex side
(https://github.com/westes/flex), and the
Flex codebase and capabilities show their age.

I recently became aware of RE/flex
(https://www.genivia.com/reflex.html),
which seems very promising. However, it only generates a C++ scanner
which may be possible (I haven't tried) to retrofit into existing C projects
to, for example, gain full Unicode (in its UTF-8 encoded form) support.

Has anyone tried to hammer a C++ scanner peg generated by RE/flex into a
C grammar hole generated by Bison?

Which other scanners do people use?

Thank you.

Cheers,
Dan
