UTF-8/Unicode Bison

bug-bison

From:	Hans Aberg
Subject:	UTF-8/Unicode Bison
Date:	Sun, 09 Jan 2005 14:40:41 +0100
User-agent:	Microsoft-Outlook-Express-Macintosh-Edition/5.0.6

There seems to be a simple way to extend Bison to Unicode. Essentially, it
amounts to giving meaning to the '...' construct for Unicode characters. One
way is to treat the quoted text as a UTF-8 multibyte sequence, which Bison
would then treat as a sequence of character tokens. If the .y grammar file is
assumed to be in UTF-8, then all that is needed is to give 'c1 ... ck' a
meaning for a suitable character sequence, by merely translating it into the
character token sequence 'c1' ... 'ck'.
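
The translation above can be sketched in C as follows. This is only an
illustration under stated assumptions, not code from Bison: the helper names
utf8_seq_len and split_char_literal are hypothetical, and the input is assumed
to be the text found between the quotes, holding exactly one UTF-8 encoded
character.

```c
#include <assert.h>
#include <stddef.h>

/* Length in bytes of the UTF-8 sequence introduced by lead byte b,
   or 0 if b cannot start a well-formed sequence. */
static size_t utf8_seq_len(unsigned char b)
{
    if (b < 0x80) return 1;               /* ASCII */
    if (b >= 0xC2 && b <= 0xDF) return 2;
    if (b >= 0xE0 && b <= 0xEF) return 3;
    if (b >= 0xF0 && b <= 0xF4) return 4;
    return 0;                             /* continuation or invalid lead byte */
}

/* Hypothetical helper: split the single character held in s (the text
   between the quotes) into one character token per byte.  Returns the
   number of tokens written, or 0 if s is not exactly one UTF-8 character.
   Only the lead byte and the continuation-byte pattern are checked. */
static size_t split_char_literal(const char *s, int *tokens, size_t max_tokens)
{
    assert(s != NULL && tokens != NULL);

    size_t n = utf8_seq_len((unsigned char)s[0]);
    if (n == 0 || n > max_tokens)
        return 0;

    tokens[0] = (unsigned char)s[0];
    for (size_t i = 1; i < n; i++) {
        unsigned char c = (unsigned char)s[i];
        if ((c & 0xC0) != 0x80)           /* not a continuation byte */
            return 0;
        tokens[i] = c;
    }
    return s[n] == '\0' ? n : 0;          /* reject trailing bytes */
}
```

For example, the two-byte sequence C3 A9 (the letter e with an acute accent)
yields the two character tokens 0xC3 and 0xA9, which a grammar would see as
'\xC3' '\xA9'.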

As for the yylex handshaking, I see two possibilities. One is a UTF-8 mode,
where a multibyte sequence is returned one byte at a time, in a succession of
yylex calls. The other is a Unicode mode, where yylex returns the full Unicode
number in UTF-32. Bison would then start its token numbers at a number higher
than 0x10FFFF, the highest possible Unicode number. If a Unicode number is
returned by yylex, then the Bison parser translates it into a UTF-8 sequence,
which is then processed as normal.
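
A minimal sketch of what the Unicode mode might do inside the parser is given
below. The names FIRST_NAMED_TOKEN, is_unicode_token and utf8_encode are
assumptions made for illustration; nothing here is an existing Bison
interface. The idea is that any yylex value up to 0x10FFFF is taken as a
Unicode number and expanded into the UTF-8 bytes that the parser then consumes
as ordinary character tokens.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical: named tokens are numbered from just above the Unicode range. */
enum { FIRST_NAMED_TOKEN = 0x110000 };

/* Nonzero if a value returned by yylex should be read as a Unicode number. */
static int is_unicode_token(long tok)
{
    return tok >= 0 && tok < FIRST_NAMED_TOKEN;
}

/* Encode code point cp as UTF-8 into out.  Returns the number of bytes
   written (1 to 4), or 0 if cp is a surrogate or lies above 0x10FFFF. */
static size_t utf8_encode(unsigned long cp, unsigned char out[4])
{
    assert(out != NULL);

    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;                     /* surrogates are not characters */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```

With this numbering, a yylex return value of 0x20AC would be expanded to the
bytes E2 82 AC and parsed as three character tokens, while 0x110000 and above
would be left alone as named tokens.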

  Hans Aberg
