bison-patches
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Dlang] Internationalisation WIP


From: Akim Demaille
Subject: Re: [Dlang] Internationalisation WIP
Date: Tue, 8 Dec 2020 07:37:08 +0100

Hi Adela,

> Le 7 déc. 2020 à 15:56, Adela Vais <adela.vais99@gmail.com> a écrit :
> 
> Hello,
> 
> I have a question about Bison's (not the users') error messages, should
> they be internationalized by default? That was my approach.

Well, that depends what you mean "by default".  Yes, you don't have
to do anything for YY_ to be used in C for instance: bison's messages
are all wrapped into YY_ by default.  But it's up to the user to set
Gettext up.  So the user is not _forced_ to use internationalization.
That's "thanks" to macros:

#ifndef YY_
# if defined YYENABLE_NLS && YYENABLE_NLS
#  if ENABLE_NLS
#   include <libintl.h> /* INFRINGES ON USER NAME SPACE */
#   define YY_(Msgid) dgettext ("bison-runtime", Msgid)
#  endif
# endif
# ifndef YY_
#  define YY_(Msgid) Msgid
# endif
#endif
]b4_has_translations_if([
#ifndef N_
# define N_(Msgid) Msgid
#endif
])[

So if internationalization is not available at compile time (not
bison time), then YY_ is a noop.

D users should not be forced to use gettext.  Would this be a place
where D's introspection be an answer?  We definitely can add some
Bison directive to enable i18n, if there's no clean way to do that
in the D part only.

Bare in mind that the test suite must pass, whether gettext is
available or not.


> diff --git a/data/skeletons/d.m4 b/data/skeletons/d.m4
> index b84d4e94..489dce46 100644
> --- a/data/skeletons/d.m4
> +++ b/data/skeletons/d.m4
> @@ -192,7 +192,12 @@ b4_symbol_foreach([b4_token_enum])dnl
>  }
>  ])
>  
> -
> +# b4_symbol_translate(STRING)
> +# ---------------------------
> +# Used by "bison" in the array of symbol names to mark those that
> +# require translation.
> +m4_define([b4_symbol_translate],
> +[[_($1)]])
>  
>  ## -------------- ##
>  ## Symbol kinds.  ##
> @@ -243,6 +248,12 @@ m4_define([b4_declare_symbol_enum],
>      static immutable string[] yytname_ = @{
>    ]b4_tname[
>      @};
> +]b4_has_translations_if([[
> +    /* YYTRANSLATABLE[SYMBOL-NUM] -- Whether YY_SNAME[SYMBOL-NUM] is
> +      internationalizable.  */
> +    static immutable ]b4_int_type_for([b4_translatable])[[] yytranslatable = 
> @{
> +    ]b4_translatable[
> +    @};]])[
>  
>      /* Return YYSTR after stripping away unnecessary quotes and
>       backslashes, so that it's suitable for yyerror.  The heuristic is
> @@ -252,8 +263,31 @@ m4_define([b4_declare_symbol_enum],
>      final void toString(W)(W sink) const
>      if (isOutputRange!(W, char))
>      {
> -      string yystr = yytname_[yycode_];
> -
> +      string yystr = yytname_[yycode_];]b4_has_translations_if([[
> +      if (yystr[0] == '"')
> +      {
> +        string yyr;
> +        strip_quotes:
> +          for (int i = 1; i < yystr.length; i++)
> +            switch (yystr[i])
> +              {
> +              case '\'':
> +              case ',':
> +                break strip_quotes;

This looks like yytnamerr, something the old skeletons have to live with
that does not make sense in D.  In the other skeletons, this is triggered
with "parse.error verbose" that you should not support.  Support only
"parse.error detailed".  You should remove all this.

In C, for instance, i18n is handled in yysymbol_name.

static const char *
yysymbol_name (yysymbol_kind_t yysymbol)
{
  static const char *const yy_sname[] =
  {
  ]b4_symbol_names[
  };]b4_has_translations_if([[
  /* YYTRANSLATABLE[SYMBOL-NUM] -- Whether YY_SNAME[SYMBOL-NUM] is
     internationalizable.  */
  static ]b4_int_type_for([b4_translatable])[ yytranslatable[] =
  {
  ]b4_translatable[
  };
  return (yysymbol < YYNTOKENS && yytranslatable[yysymbol]
          ? _(yy_sname[yysymbol])  <=========================
          : yy_sname[yysymbol]);]], [[
  return yy_sname[yysymbol];]])[
}


You should really eliminate all this code, even in the other cases.
I suggest that first you change this commit to not add this piece
of code for i18n, and in an later commit, you remove this from the
other cases too.

> +              case '\\':
> +                if (yystr[++i] != '\\')
> +                  break strip_quotes;
> +                goto default;
> +              default:
> +                yyr ~= yystr[i];
> +                break;
> +              case '"':
> +                yyr = (yycode_ < yyntokens_ && yytranslatable[yycode_] > 0)
> +                  ? _(yyr) : yyr;
> +                put(sink, yyr);
> +                return;
> +              }
> +        }]],[[
>        if (yystr[0] == '"')
>          {
>          strip_quotes:
> @@ -275,13 +309,15 @@ m4_define([b4_declare_symbol_enum],
>                case '"':
>                  return;
>                }
> -        }
> +        }]])[
>        else if (yystr == "$end")
> -      {
> -        put(sink, "end of input");
> +      {]b4_has_translations_if([[
> +        put(sink, _("end of input"));]],[[
> +        put(sink, "end of input");]])[
>          return;

This is hairy, and is no longer needed at all.  Java no longer
uses that either.  The translation of $end is provided by bison itself,
like any other token, the skeleton should not play with it at all.
Have a look at the other skeletons.

> -      }
> -
> +      }]b4_has_translations_if([[
> +      yystr = (yycode_ < yyntokens_ && yytranslatable[yycode_] > 0)
> +            ? _(yystr) : yystr;]])[
>        put(sink, yystr);
>      }
>    }
> diff --git a/data/skeletons/lalr1.d b/data/skeletons/lalr1.d
> index d126d499..615482d4 100644
> --- a/data/skeletons/lalr1.d
> +++ b/data/skeletons/lalr1.d
> @@ -40,7 +40,28 @@ version(D_Version2) {
>  ]b4_user_post_prologue[
>  ]b4_percent_code_get([[imports]])[
>  import std.format;
> +import std.conv;
>  
> +/**
> + * Handle error message internationalisation

Please end your comments with a period.

> + */
> +extern(C) char* dgettext(const char*, const char*);
> +string YY_(const char* s)
> +{
> +  char* res = dgettext("bison-runtime", s);
> +  return to!string(res);

Why this intermediate variable?  Also, can't you declare dcgettext
within the function itself?  I like scopes to be as small as possible.

> +}
> +]b4_has_translations_if([
> +/**
> + * User error message internationalisation
> + */
> +extern(C) char* gettext(const char*);
> +string _(string s)
> +{
> +  char* res = gettext(s.ptr);
> +  return to!string(res);

Ditto.

> +}
> +])[
>  /**
>   * A Bison parser, automatically generated from 
> <tt>]m4_bpatsubst(b4_file_name, [^"\(.*\)"$], [\1])[</tt>.
>   *
> @@ -704,20 +725,41 @@ m4_popdef([b4_at_dollar])])dnl
>        immutable int argmax = 5;
>        SymbolKind[] yyarg = new SymbolKind[argmax];
>        int yycount = yysyntaxErrorArguments(yyctx, yyarg, argmax);
> -      string res = "syntax error, unexpected ";
> -      res ~= format!"%s"(yyarg[0]);
> -      if (yycount < argmax + 1)
> +      string[] yystr = new string[yycount];
> +      for (int yyi = 0; yyi < yycount; yyi++)
> +        yystr[yyi] = format!"%s"(yyarg[yyi]);
> +      string res, yyformat;
> +      import std.string;
> +      switch (yycount)
>        {
> -        for (int yyi = 1; yyi < yycount; yyi++)
> -        {
> -          res ~= yyi == 1 ? ", expecting " : " or ";
> -          res ~= format!"%s"(SymbolKind(yyarg[yyi]));
> -        }
> +        case  1:
> +          yyformat = YY_("syntax error, unexpected %s");
> +          res = format(yyformat, yystr[0]);
> +         break;

I don't see the point of using format twice: once to convert the symbol
kinds to strings, and then to build the full error message.  How about
not using yystr at all?

> +        case  2:
> +          yyformat = YY_("syntax error, unexpected %s, expecting %s");
> +          res = format(yyformat, yystr[0], yystr[1]);
> +          break;
> +        case  3:
> +          yyformat = YY_("syntax error, unexpected %s, expecting %s or %s");
> +          res = format(yyformat, yystr[0], yystr[1], yystr[2]);
> +          break;
> +        case  4:
> +          yyformat = YY_("syntax error, unexpected %s, expecting %s or %s or 
> %s");
> +          res = format(yyformat, yystr[0], yystr[1], yystr[2], yystr[3]);
> +          break;
> +        case  5:
> +          yyformat = YY_("syntax error, unexpected %s, expecting %s or %s or 
> %s or %s");
> +          res = format(yyformat, yystr[0], yystr[1], yystr[2], yystr[3], 
> yystr[4]);
> +          break;
> +        default:
> +          res = YY_("syntax error");
> +          break;
>        }
>        yyerror(]b4_locations_if([yyctx.getLocation(), ])[res);
>      }]],
>  [[simple]], [[
> -    yyerror(]b4_locations_if([yyctx.getLocation(), ])["syntax error");]])[
> +    yyerror(]b4_locations_if([yyctx.getLocation(), ])[YY_("syntax 
> error"));]])[
>    }
>  
>  ]b4_parse_error_bmatch(
> diff --git a/doc/bison.texi b/doc/bison.texi
> index 959a4039..3f3a8b26 100644
> --- a/doc/bison.texi
> +++ b/doc/bison.texi
> @@ -13945,6 +13945,55 @@ or nonzero, full tracing.
>  Identify the Bison version and skeleton used to generate this parser.
>  @end deftypecv
>  
> +The internationalization in D is very simmilar to the one in C. The D
> +parser uses gettext for user messages and dgettext for Bison messages.

Use @code for function names.  But I'm not sure we should go into such
details here.

> +If gettext is not present on your system,

Use Title Case for package names, but...

> install it and re-run Bison's
> +configuration.

The manual is about an installed Bison.  Configuration and installation
is irrelevant here.

> +
> +For enabling internationalisation, import bindtextdomain and textdomain

Use @code for function names.  I feel unconformable with "For enabling"
instead of "To enable", but I'm not a native.

> +from C:
> +
> +@example
> +extern(C) char* bindtextdomain(const char* domainname, const char* dirname);
> +extern(C) char* textdomain(const char* domainname);
> +@end example
> +
> +The main function should load the translation catalogues, similarly to the
> +@file{c/bistromathic} example:
> +
> +@example
> +int main()
> +@{
> +  import core.stdc.locale;
> +
> +  // Set up internationalization.
> +  setlocale(LC_ALL, "");
> +  // Use Bison's standard translation catalogue for error messages
> +  // (the generated messages).
> +  bindtextdomain("bison-runtime", BISON_LOCALEDIR);
> +  // For the translation catalogue of your own project, use the
> +  // name of your project.
> +  bindtextdomain("bison", LOCALEDIR);
> +  textdomain("bison");
> +
> +  // usual main content
> +  ...
> +@}
> +@end example
> +
> +String aliases may be marked for internationalization (@pxref{Token
> +I18n}):

We should avoid repeating in each language the common bits.  I don't
these bits are needed it, is it?

> +
> +@example
> +%token PLUS   "+"
> +       MINUS  "-"
> +       STAR   "*"
> +       SLASH  "/"
> +       EOL    _("end of line")
> +%token <ival> NUM _("number")
> +@end example
> +
>  @node D Parser Context Interface
>  @subsection D Parser Context Interface
>  The parser context provides information to build error reports when you
> diff --git a/tests/calc.at b/tests/calc.at
> index c11ab5c3..1e343e8a 100644
> --- a/tests/calc.at
> +++ b/tests/calc.at
> @@ -665,6 +665,19 @@ m4_define([_AT_DATA_CALC_Y(d)],
>  };
>  %printer { fprintf (yyo, "%d", $$); } <ival>;
>  
> +%code {
> +]AT_TOKEN_TRANSLATE_IF([[
> +  static string _(string s)
> +  {
> +    if (s == "end of input")
> +      return "end of file";
> +    else if (s == "number")
> +      return "nombre";
> +    return s;

I personally prefer that we maintain the "else if" chain, instead of
relying on return to break the execution flow.  Make it more functional-style
and add the last else.  Currently the style is even inconsistent, with just
one else.

And if D supports 'switch' over string, well, even better :)

> +  }
> +]])[
> +}
> +
>  /* Bison Declarations */
>  %token EOF 0 ]AT_TOKEN_TRANSLATE_IF([_("end of file")], ["end of input"])[
>  %token <ival> NUM   "number"

Nice work!

Cheers!


reply via email to

[Prev in Thread] Current Thread [Next in Thread]