Re: [Dlang] Internationalisation WIP

bison-patches
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Dlang] Internationalisation WIP

From:	Staniloiu Eduard
Subject:	Re: [Dlang] Internationalisation WIP
Date:	Tue, 8 Dec 2020 16:31:33 +0200
On Tue, Dec 8, 2020 at 8:37 AM Akim Demaille <akim@lrde.epita.fr> wrote:

> Hi Adela,
>
> > Le 7 déc. 2020 à 15:56, Adela Vais <adela.vais99@gmail.com> a écrit :
> >
> > Hello,
> >
> > I have a question about Bison's (not the users') error messages, should
> > they be internationalized by default? That was my approach.
>
> Well, that depends what you mean "by default".  Yes, you don't have
> to do anything for YY_ to be used in C for instance: bison's messages
> are all wrapped into YY_ by default.  But it's up to the user to set
> Gettext up.  So the user is not _forced_ to use internationalization.
> That's "thanks" to macros:
>
> #ifndef YY_
> # if defined YYENABLE_NLS && YYENABLE_NLS
> #  if ENABLE_NLS
> #   include <libintl.h> /* INFRINGES ON USER NAME SPACE */
> #   define YY_(Msgid) dgettext ("bison-runtime", Msgid)
> #  endif
> # endif
> # ifndef YY_
> #  define YY_(Msgid) Msgid
> # endif
> #endif
> ]b4_has_translations_if([
> #ifndef N_
> # define N_(Msgid) Msgid
> #endif
> ])[
>
> So if internationalization is not available at compile time (not
> bison time), then YY_ is a noop.
>
> D users should not be forced to use gettext.  Would this be a place
> where D's introspection be an answer?


This can definitely be done with introspection.
I wrote the gist of it below:
```
static if (!is(typeof(YY_))) {
version(YYENABLE_NLS)
{
        version(ENABLE_NLS)
        {
            alias YY_ = partial!(dgettext, "bison-runtime");
        }
}
    static if (!is(typeof(YY_)))
    {
        pragma(inline, true)
        string YY_(string msg) { return msg; }
    }
}
static if (!is(typeof(N_)))
{
    pragma(inline, true)
    string N_(string msg) { return msg; }
}
```
To use YYENABLE_NLS and ENABLE_NLS, one would use the `-version` compiler
option:
`-version=YYENABLE_NLS -version=ENABLE_NLS`


> We definitely can add some
> Bison directive to enable i18n, if there's no clean way to do that
> in the D part only.
>
> Bare in mind that the test suite must pass, whether gettext is
> available or not.
>
>
> > diff --git a/data/skeletons/d.m4 b/data/skeletons/d.m4
> > index b84d4e94..489dce46 100644
> > --- a/data/skeletons/d.m4
> > +++ b/data/skeletons/d.m4
> > @@ -192,7 +192,12 @@ b4_symbol_foreach([b4_token_enum])dnl
> >  }
> >  ])
> >
> > -
> > +# b4_symbol_translate(STRING)
> > +# ---------------------------
> > +# Used by "bison" in the array of symbol names to mark those that
> > +# require translation.
> > +m4_define([b4_symbol_translate],
> > +[[_($1)]])
> >
> >  ## -------------- ##
> >  ## Symbol kinds.  ##
> > @@ -243,6 +248,12 @@ m4_define([b4_declare_symbol_enum],
> >      static immutable string[] yytname_ = @{
> >    ]b4_tname[
> >      @};
> > +]b4_has_translations_if([[
> > +    /* YYTRANSLATABLE[SYMBOL-NUM] -- Whether YY_SNAME[SYMBOL-NUM] is
> > +      internationalizable.  */
> > +    static immutable ]b4_int_type_for([b4_translatable])[[]
> yytranslatable = @{
> > +    ]b4_translatable[
> > +    @};]])[
> >
> >      /* Return YYSTR after stripping away unnecessary quotes and
> >       backslashes, so that it's suitable for yyerror.  The heuristic is
> > @@ -252,8 +263,31 @@ m4_define([b4_declare_symbol_enum],
> >      final void toString(W)(W sink) const
> >      if (isOutputRange!(W, char))
> >      {
> > -      string yystr = yytname_[yycode_];
> > -
> > +      string yystr = yytname_[yycode_];]b4_has_translations_if([[
> > +      if (yystr[0] == '"')
> > +      {
> > +        string yyr;
> > +        strip_quotes:
> > +          for (int i = 1; i < yystr.length; i++)
> > +            switch (yystr[i])
> > +              {
> > +              case '\'':
> > +              case ',':
> > +                break strip_quotes;
>
> This looks like yytnamerr, something the old skeletons have to live with
> that does not make sense in D.  In the other skeletons, this is triggered
> with "parse.error verbose" that you should not support.  Support only
> "parse.error detailed".  You should remove all this.
>
> In C, for instance, i18n is handled in yysymbol_name.
>
> static const char *
> yysymbol_name (yysymbol_kind_t yysymbol)
> {
>   static const char *const yy_sname[] =
>   {
>   ]b4_symbol_names[
>   };]b4_has_translations_if([[
>   /* YYTRANSLATABLE[SYMBOL-NUM] -- Whether YY_SNAME[SYMBOL-NUM] is
>      internationalizable.  */
>   static ]b4_int_type_for([b4_translatable])[ yytranslatable[] =
>   {
>   ]b4_translatable[
>   };
>   return (yysymbol < YYNTOKENS && yytranslatable[yysymbol]
>           ? _(yy_sname[yysymbol])  <=========================
>           : yy_sname[yysymbol]);]], [[
>   return yy_sname[yysymbol];]])[
> }
>
>
> You should really eliminate all this code, even in the other cases.
> I suggest that first you change this commit to not add this piece
> of code for i18n, and in an later commit, you remove this from the
> other cases too.
>
> > +              case '\\':
> > +                if (yystr[++i] != '\\')
> > +                  break strip_quotes;
> > +                goto default;
> > +              default:
> > +                yyr ~= yystr[i];
> > +                break;
> > +              case '"':
> > +                yyr = (yycode_ < yyntokens_ && yytranslatable[yycode_]
> > 0)
> > +                  ? _(yyr) : yyr;
> > +                put(sink, yyr);
> > +                return;
> > +              }
> > +        }]],[[
> >        if (yystr[0] == '"')
> >          {
> >          strip_quotes:
> > @@ -275,13 +309,15 @@ m4_define([b4_declare_symbol_enum],
> >                case '"':
> >                  return;
> >                }
> > -        }
> > +        }]])[
> >        else if (yystr == "$end")
> > -      {
> > -        put(sink, "end of input");
> > +      {]b4_has_translations_if([[
> > +        put(sink, _("end of input"));]],[[
> > +        put(sink, "end of input");]])[
> >          return;
>
> This is hairy, and is no longer needed at all.  Java no longer
> uses that either.  The translation of $end is provided by bison itself,
> like any other token, the skeleton should not play with it at all.
> Have a look at the other skeletons.
>
> > -      }
> > -
> > +      }]b4_has_translations_if([[
> > +      yystr = (yycode_ < yyntokens_ && yytranslatable[yycode_] > 0)
> > +            ? _(yystr) : yystr;]])[
> >        put(sink, yystr);
> >      }
> >    }
> > diff --git a/data/skeletons/lalr1.d b/data/skeletons/lalr1.d
> > index d126d499..615482d4 100644
> > --- a/data/skeletons/lalr1.d
> > +++ b/data/skeletons/lalr1.d
> > @@ -40,7 +40,28 @@ version(D_Version2) {
> >  ]b4_user_post_prologue[
> >  ]b4_percent_code_get([[imports]])[
> >  import std.format;
> > +import std.conv;
> >
> > +/**
> > + * Handle error message internationalisation
>
> Please end your comments with a period.
>
> > + */
> > +extern(C) char* dgettext(const char*, const char*);
> > +string YY_(const char* s)
> > +{
> > +  char* res = dgettext("bison-runtime", s);
> > +  return to!string(res);
>
> Why this intermediate variable?  Also, can't you declare dcgettext
> within the function itself?  I like scopes to be as small as possible.
>
> > +}
> > +]b4_has_translations_if([
> > +/**
> > + * User error message internationalisation
> > + */
> > +extern(C) char* gettext(const char*);
> > +string _(string s)
> > +{
> > +  char* res = gettext(s.ptr);
> > +  return to!string(res);
>
> Ditto.
>
> > +}
> > +])[
> >  /**
> >   * A Bison parser, automatically generated from
> <tt>]m4_bpatsubst(b4_file_name, [^"\(.*\)"$], [\1])[</tt>.
> >   *
> > @@ -704,20 +725,41 @@ m4_popdef([b4_at_dollar])])dnl
> >        immutable int argmax = 5;
> >        SymbolKind[] yyarg = new SymbolKind[argmax];
> >        int yycount = yysyntaxErrorArguments(yyctx, yyarg, argmax);
> > -      string res = "syntax error, unexpected ";
> > -      res ~= format!"%s"(yyarg[0]);
> > -      if (yycount < argmax + 1)
> > +      string[] yystr = new string[yycount];
> > +      for (int yyi = 0; yyi < yycount; yyi++)
> > +        yystr[yyi] = format!"%s"(yyarg[yyi]);
> > +      string res, yyformat;
> > +      import std.string;
> > +      switch (yycount)
> >        {
> > -        for (int yyi = 1; yyi < yycount; yyi++)
> > -        {
> > -          res ~= yyi == 1 ? ", expecting " : " or ";
> > -          res ~= format!"%s"(SymbolKind(yyarg[yyi]));
> > -        }
> > +        case  1:
> > +          yyformat = YY_("syntax error, unexpected %s");
> > +          res = format(yyformat, yystr[0]);
> > +         break;
>
> I don't see the point of using format twice: once to convert the symbol
> kinds to strings, and then to build the full error message.  How about
> not using yystr at all?
>
> > +        case  2:
> > +          yyformat = YY_("syntax error, unexpected %s, expecting %s");
> > +          res = format(yyformat, yystr[0], yystr[1]);
> > +          break;
> > +        case  3:
> > +          yyformat = YY_("syntax error, unexpected %s, expecting %s or
> %s");
> > +          res = format(yyformat, yystr[0], yystr[1], yystr[2]);
> > +          break;
> > +        case  4:
> > +          yyformat = YY_("syntax error, unexpected %s, expecting %s or
> %s or %s");
> > +          res = format(yyformat, yystr[0], yystr[1], yystr[2],
> yystr[3]);
> > +          break;
> > +        case  5:
> > +          yyformat = YY_("syntax error, unexpected %s, expecting %s or
> %s or %s or %s");
> > +          res = format(yyformat, yystr[0], yystr[1], yystr[2],
> yystr[3], yystr[4]);
> > +          break;
> > +        default:
> > +          res = YY_("syntax error");
> > +          break;
> >        }
> >        yyerror(]b4_locations_if([yyctx.getLocation(), ])[res);
> >      }]],
> >  [[simple]], [[
> > -    yyerror(]b4_locations_if([yyctx.getLocation(), ])["syntax
> error");]])[
> > +    yyerror(]b4_locations_if([yyctx.getLocation(), ])[YY_("syntax
> error"));]])[
> >    }
> >
> >  ]b4_parse_error_bmatch(
> > diff --git a/doc/bison.texi b/doc/bison.texi
> > index 959a4039..3f3a8b26 100644
> > --- a/doc/bison.texi
> > +++ b/doc/bison.texi
> > @@ -13945,6 +13945,55 @@ or nonzero, full tracing.
> >  Identify the Bison version and skeleton used to generate this parser.
> >  @end deftypecv
> >
> > +The internationalization in D is very simmilar to the one in C. The D
> > +parser uses gettext for user messages and dgettext for Bison messages.
>
> Use @code for function names.  But I'm not sure we should go into such
> details here.
>
> > +If gettext is not present on your system,
>
> Use Title Case for package names, but...
>
> > install it and re-run Bison's
> > +configuration.
>
> The manual is about an installed Bison.  Configuration and installation
> is irrelevant here.
>
> > +
> > +For enabling internationalisation, import bindtextdomain and textdomain
>
> Use @code for function names.  I feel unconformable with "For enabling"
> instead of "To enable", but I'm not a native.
>
> > +from C:
> > +
> > +@example
> > +extern(C) char* bindtextdomain(const char* domainname, const char*
> dirname);
> > +extern(C) char* textdomain(const char* domainname);
> > +@end example
> > +
> > +The main function should load the translation catalogues, similarly to
> the
> > +@file{c/bistromathic} example:
> > +
> > +@example
> > +int main()
> > +@{
> > +  import core.stdc.locale;
> > +
> > +  // Set up internationalization.
> > +  setlocale(LC_ALL, "");
> > +  // Use Bison's standard translation catalogue for error messages
> > +  // (the generated messages).
> > +  bindtextdomain("bison-runtime", BISON_LOCALEDIR);
> > +  // For the translation catalogue of your own project, use the
> > +  // name of your project.
> > +  bindtextdomain("bison", LOCALEDIR);
> > +  textdomain("bison");
> > +
> > +  // usual main content
> > +  ...
> > +@}
> > +@end example
> > +
> > +String aliases may be marked for internationalization (@pxref{Token
> > +I18n}):
>
> We should avoid repeating in each language the common bits.  I don't
> these bits are needed it, is it?
>
> > +
> > +@example
> > +%token PLUS   "+"
> > +       MINUS  "-"
> > +       STAR   "*"
> > +       SLASH  "/"
> > +       EOL    _("end of line")
> > +%token <ival> NUM _("number")
> > +@end example
> > +
> >  @node D Parser Context Interface
> >  @subsection D Parser Context Interface
> >  The parser context provides information to build error reports when you
> > diff --git a/tests/calc.at b/tests/calc.at
> > index c11ab5c3..1e343e8a 100644
> > --- a/tests/calc.at
> > +++ b/tests/calc.at
> > @@ -665,6 +665,19 @@ m4_define([_AT_DATA_CALC_Y(d)],
> >  };
> >  %printer { fprintf (yyo, "%d", $$); } <ival>;
> >
> > +%code {
> > +]AT_TOKEN_TRANSLATE_IF([[
> > +  static string _(string s)
> > +  {
> > +    if (s == "end of input")
> > +      return "end of file";
> > +    else if (s == "number")
> > +      return "nombre";
> > +    return s;
>
> I personally prefer that we maintain the "else if" chain, instead of
> relying on return to break the execution flow.  Make it more
> functional-style
> and add the last else.  Currently the style is even inconsistent, with just
> one else.
>
> And if D supports 'switch' over string, well, even better :)
>
> > +  }
> > +]])[
> > +}
> > +
> >  /* Bison Declarations */
> >  %token EOF 0 ]AT_TOKEN_TRANSLATE_IF([_("end of file")], ["end of
> input"])[
> >  %token <ival> NUM   "number"
>
> Nice work!
>
> Cheers!
>
[Prev in Thread]
Current Thread
[Next in Thread]
[Dlang] Internationalisation WIP, Adela Vais, 2020/12/01
- Re: [Dlang] Internationalisation WIP, Akim Demaille, 2020/12/03
  - Re: [Dlang] Internationalisation WIP, Adela Vais, 2020/12/07
    - Re: [Dlang] Internationalisation WIP, Akim Demaille, 2020/12/08
    - Re: [Dlang] Internationalisation WIP, Staniloiu Eduard <=
Prev by Date: Re: [Dlang] Internationalisation WIP
Next by Date: [PATCH 0/9] glr2.cc: fixes
Previous by thread: Re: [Dlang] Internationalisation WIP
Next by thread: c++: use noexcept where appropriate
Index(es):
- Date
- Thread