bug-bison

Re: Run-time internationalized messages


From: Hans Aberg
Subject: Re: Run-time internationalized messages
Date: Sat, 3 May 2003 19:12:29 +0200

At 12:07 -0400 2003/05/03, Bruce Lilly wrote:
>> Now you evidently want a dynamic approach. One approach might be to put all
>> the default strings in character arrays, which easily can be changed at
>> runtime, if the names of the strings are known. If the strings are already
>> in M4 macros, the only thing that would be needed is a special M4 skeleton
>> file.
>
>For C, one approach for the output file is to simply use one array of
>strings, which can be accessed by an index computed from a basic
>message index and a language index. Equivalently, it could be viewed
>as a 2-D array[msg][lang].

I thought of that, but it may make messages go haywire when new strings are
added. -- One topic for future Bison is better parser diagnostics, I think.
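
To make the array[msg][lang] idea concrete, here is a minimal sketch, not anything Bison generates. All names (msg_id, lang_id, messages, get_message) are hypothetical, and the second message is only there for illustration. It assumes a C99 compiler: keying the entries by enum constant with designated initializers means that adding a message does not shift the others, and an untranslated slot is NULL, so it can fall back to English.

```c
#include <stddef.h>

/* Hypothetical message and language indices; none of these names
   come from Bison itself. */
enum msg_id  { MSG_PARSE_ERROR_UNEXPECTED, MSG_STACK_OVERFLOW, MSG_COUNT };
enum lang_id { LANG_ENGLISH, LANG_PIG_LATIN, LANG_COUNT };

/* One table indexed as [msg][lang]. Slots without an initializer
   are NULL, which marks a missing translation. */
static const char *const messages[MSG_COUNT][LANG_COUNT] = {
    [MSG_PARSE_ERROR_UNEXPECTED] = {
        [LANG_ENGLISH]   = "parse error, unexpected ",
        [LANG_PIG_LATIN] = "arsepay orerray, edunexpectay ",
    },
    [MSG_STACK_OVERFLOW] = {
        [LANG_ENGLISH]   = "parser stack overflow",
        /* no Pig Latin text yet: the slot stays NULL */
    },
};

/* Look up a message, falling back to English when untranslated. */
const char *get_message(enum msg_id msg, enum lang_id lang)
{
    const char *s = messages[msg][lang];
    return s != NULL ? s : messages[msg][LANG_ENGLISH];
}
```

With this layout, get_message(MSG_STACK_OVERFLOW, LANG_PIG_LATIN) yields the English text instead of a wrong Pig Latin one.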

>> As for the question of making the thing platform independent, there is no
>> such a thing with respect to output languages like C/C++. So there you are
>> left out in the cold. When I discussed it in a C++ newsgroup, the best
>> thing that people really needing this feature (as those writing WWW
>> browsers/servers and such) currently could find was to give names to each
>> character according to some encoding, and then use that. For example, using
>> Unicode:
>>    unsigned LATIN_CAPITAL_LETTER_A = 0x0041;
>>    ...
>> or
>>    #define LATIN_CAPITAL_LETTER_A 0x0041
>>    ...
>> Then use LATIN_CAPITAL_LETTER_A instead of "A". One can probably easily
>> produce such list of characters by taking down the Unicode Namelist and
>> convert to C format via a suitable small program.
>
>I didn't have anything quite so elaborate in mind.  I would imagine that
>each language would have an associated charset (e.g. us-ascii, iso-8859-x,
>utf-8).  What I did intend was that the implementation shouldn't depend
>on pulling the strings out of an external file at run time, since some
>target platforms running a parser might not have a file system as such
>(think embedded systems, cell phones, etc.).

If you truly want to ensure platform independence, the problem is that the
C/C++ string construct "..." does not guarantee any specific encoding at
all. For example, the C++ standard has something looking as though it were
Unicode strings, but the standard is written so that the compiler need not
support any Unicode implementation.

So if you really want to guarantee that, you must use something like that
above. Perhaps this should be a special GNU project.
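
As a rough sketch of what using named characters might look like in practice: the text is spelled out as code points, and converted to a concrete encoding only at output time. The helper encode_utf8 and the array sample_text are hypothetical names, UTF-8 is just one possible target encoding, and the sketch does not reject surrogate code points.

```c
#include <stddef.h>

/* Names follow the Unicode character names; values are code points. */
#define LATIN_CAPITAL_LETTER_A          0x0041UL
#define LATIN_SMALL_LETTER_B            0x0062UL
#define LATIN_SMALL_LETTER_E_WITH_ACUTE 0x00E9UL

/* A text spelled out as code points, terminated by 0. */
static const unsigned long sample_text[] = {
    LATIN_CAPITAL_LETTER_A,
    LATIN_SMALL_LETTER_B,
    LATIN_SMALL_LETTER_E_WITH_ACUTE,
    0
};

/* Encode one code point (up to U+10FFFF) as UTF-8 into out.
   Returns the number of bytes written, or 0 if cp is out of range. */
size_t encode_utf8(unsigned long cp, unsigned char out[4])
{
    if (cp < 0x80UL) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800UL) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000UL) {
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp < 0x110000UL) {
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```

The point of the design is that nothing here depends on how the compiler encodes "..." literals; only integer constants are used.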

>Aside from dealing with the output programming language issue, I can
>imagine a few others:
>
>2. API for language switching
>
>3. Where the language-switching code goes -- in each generated parser
>    file, or in a library archive a la liby.a.
>
>4. How the parser keeps track of the desired language, which will have
>    to work for pure parsers as well as for non-reentrant ones.
>
>5. Actually integrating it into the bison build process, automake, autoconf,
>    etc.

For the dynamic approach, I just thought (for a start) of the
quick-and-dirty approach, where the parser outputs a series of char*
C-string pointers. If you want a different language, set these pointers to
something else.

If the char* strings are named individually, instead of being kept in a
single array as you suggested above, and somebody forgets to update the
non-English pointer file, then one gets a correct error message, but in
English. With the array approach, one will get an erroneous error
message, but in the right language. Which do you prefer? :-)

To play along with this approach in an example: parse errors, output at
"yyerrlab:" in the parser source code file, start with
   "parse error, unexpected "
For the default, one might then have
   char* parse_error_unexpected = "parse error, unexpected ";
For another language, one might have
   char* pig_latin_parse_error_unexpected = "arsepay orerray, edunexpectay ";
Then a language switch takes place, whenever you want it, by setting
   parse_error_unexpected = pig_latin_parse_error_unexpected;

If you use the array method, then this would become, say,
   char* english_diagnostics[] = {...};
   char* pig_latin_diagnostics[] = {...};
   char** diagnostics = english_diagnostics;
And switch language by
   diagnostics = pig_latin_diagnostics;
(diagnostics has to be a pointer here, since C does not allow assigning to
an array.) But if english_diagnostics is changed and pig_latin_diagnostics
not, then the diagnostics in the latter language will go haywire.

This approach is simple, and can be used to keep track of the language as
well by, say, adding a string such as "English", "Pig Latin", etc.
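
A minimal sketch of the array variant with a language-name entry might look as follows. The names (DIAG_LANGUAGE_NAME, set_diagnostics, diagnostic, current_language) are hypothetical, and _Static_assert assumes a C11 compiler. The size check catches a table that was not extended when a new diagnostic was added; it cannot catch entries that are merely in the wrong order.

```c
#include <stddef.h>

/* Hypothetical indices; entry 0 holds the name of the language. */
enum { DIAG_LANGUAGE_NAME, DIAG_PARSE_ERROR_UNEXPECTED, DIAG_COUNT };

static const char *english_diagnostics[] = {
    "English",
    "parse error, unexpected ",
};

static const char *pig_latin_diagnostics[] = {
    "Pig Latin",
    "arsepay orerray, edunexpectay ",
};

#define ARRAY_LEN(a) (sizeof (a) / sizeof (a)[0])

/* Refuse to compile if a table has fallen out of step with the enum. */
_Static_assert(ARRAY_LEN(english_diagnostics) == DIAG_COUNT,
               "english_diagnostics is out of sync");
_Static_assert(ARRAY_LEN(pig_latin_diagnostics) == DIAG_COUNT,
               "pig_latin_diagnostics is out of sync");

/* The table currently in use; English is the default. */
static const char **diagnostics = english_diagnostics;

/* Switch language by repointing at another table. */
void set_diagnostics(const char **table) { diagnostics = table; }

/* Fetch a diagnostic from the table currently in use. */
const char *diagnostic(int id) { return diagnostics[id]; }

/* The language is tracked by the table itself. */
const char *current_language(void) { return diagnostics[DIAG_LANGUAGE_NAME]; }
```

After set_diagnostics(pig_latin_diagnostics), current_language() returns "Pig Latin" and diagnostic(DIAG_PARSE_ERROR_UNEXPECTED) returns the Pig Latin text.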

Minimal API, thus. :-)

I suspect that those who want to have diagnostics in different languages
will have to supply those strings themselves, as other things probably have
a higher priority for those developing Bison. I do not know though, as I
only use English myself. :-)

  Hans Aberg





