bug-gnu-utils
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

re: Request for modification of msgfmt.c and format-c.c


From: ishiawa
Subject: re: Request for modification of msgfmt.c and format-c.c
Date: Fri, 1 Aug 2003 02:13:00 +0900 (JST)

(I modified xgettext.c as well, now.)


ADDITION:

After reading the description of gettext HTML documentation,
I realized xgettext() also needs to understand the
extended format characters that are used by GCC's internal
diagnostic routines. Otherwise, the c-format flag/option
is not produced for such format messages.

I used xgettext() on a small C file to verify my understanding.
For the following file, unmodified xgettext() failed to
procuce the c-format option line for the third error message.
--------
#include <libintl.h>
#define _(String) gettext(String) 
test()
{
  int i;
  error(_("This is a error message.\n"));
  error(_("This is a error message with standard C format specifier. %d = \n"), 
i);
  error(_("Non standard (GNU CC internal extension.) %T \n"), &i);
}
---------

So I modified xgettext() as well.
With --gcc option to xgettext(), it now produces the c-format
option for the third message.
The following patch at the end now includes the
xgettext.c patch as well.

So in a nutshell, for GCC suite message translation work,
please run xgettext with the additional --gcc flag to extract messages.
run msgfmt with the additional --gcc flag to check the 
translated messages.

New modified PATCH follows at the end.

-------

Request for modification of msgfmt.c and format-c.c

Hi, 

I am trying to get the positional format specifier support
in GCC's internal diagnostic routines.
These routines are invoked to emit warning/error messages
when GCC compilers encouter problems in source code they compile.

In the proces, I found out the problem of
msgfmt and use of extended format character sets by
the internal diagnostic routines of GCC.

msgfmt only understands the standard C printf format characters
whereas internal diagnostic routines of GCC use a few additional
chracters for formating.  
There are many warning/error messages that use such extended
format characters in GCC source files.
Since msgfmt doesn't understand these
characters, checking of *.po files will fail.  I suspect that as such
extended format characters were introduced, messages in GCC *.po lines
that contain such extended format characters were no longer marked as
c-format and ignored by msgfmt checking.

This is wrong and undesirable.
(There are many untranslated/fuzzy messages in various *.po files
and I suspect one reason is this lack of checking of 
the used format characters by msgfmt.)

I created a patch incorporate the support for extended format
character set used by the internal GCC diagnostic routines so that
msgfmt would check the c-format-like messages assuming
the extented format character set.

This added feature is invoked as follows

msgfmt --gcc --check

or

msgfmt -G --check

now checks for the format string consistency and understands such
extended character set.

It may not be perfect, but I believe this patch
is in the right direction.

One other thing.

msgfmt didn't print out warning/error if msgid
contains unrecognized format character, and skips
the checking altogether. This is undesirable IMHO.
So I added the error output for this error case.

---
The following is a check against modified ja.po of GCC.
I modified a few lines in ja.po so that
the extended format character set is used in
messages marked as c-format string.

Without "--gcc" flag, the modified msgfmt signals error.

LC_ALL=C
address@hidden export LC_ALL
address@hidden /usr/local/bin/msgfmt --check ja.po
ja.po:14525: 'msgid' is not a valid C format string. Reason: In the directive 
number 1, the character 'T' is not a valid conversion specifier.
ja.po:14539: 'msgid' is not a valid C format string. Reason: In the directive 
number 1, the character 'D' is not a valid conversion specifier.
ja.po:14548: 'msgid' is not a valid C format string. Reason: In the directive 
number 1, the character 'T' is not a valid conversion specifier.
ja.po:14557: 'msgid' is not a valid C format string. Reason: In the directive 
number 1, the character 'T' is not a valid conversion specifier.
ja.po:14761: 'msgid' is not a valid C format string. Reason: In the directive 
number 1, the character 'D' is not a valid conversion specifier.
ja.po:14766: 'msgid' is not a valid C format string. Reason: In the directive 
number 1, the character ' ' is not a valid conversion specifier.
/usr/local/bin/msgfmt: found 6 fatal errors

The last error message above is a very subtle case where "%L " is a
valid internal GCC diagnostic format extension. However, XPG printf
spec seems to think "%L" is a prefix to "d" and other numerical
specifier and so when a ' ' is found, msgfmt prints error 
mentioning ' '.

With "--gcc" flag, the modified msgfmt checks it
assuming the extended format character set, and
it produced no errors. Good.


address@hidden /usr/local/bin/msgfmt --check --gcc ja.po


I checked various error combinations and
modified msgfmt seems to detect them
very well.



Index: format-c.c
===================================================================
RCS file: /cvs/gettext/gettext/gettext-tools/src/format-c.c,v
retrieving revision 1.2
diff -c -3 -p -r1.2 format-c.c
*** format-c.c  24 Feb 2003 10:54:10 -0000      1.2
--- format-c.c  31 Jul 2003 17:05:28 -0000
***************
*** 20,25 ****
--- 20,27 ----
  # include <config.h>
  #endif
  
+ #include <stdio.h>
+ 
  #include <stdbool.h>
  #include <stdlib.h>
  
*************** enum format_arg_type
*** 132,138 ****
                           | FAT_SIZE_FAST8_T | FAT_SIZE_FAST16_T
                           | FAT_SIZE_FAST32_T | FAT_SIZE_FAST64_T
                           | FAT_SIZE_INTMAX_T | FAT_SIZE_INTPTR_T
!                          | FAT_SIZE_SIZE_T | FAT_SIZE_PTRDIFF_T)
  };
  
  struct numbered_arg
--- 134,176 ----
                           | FAT_SIZE_FAST8_T | FAT_SIZE_FAST16_T
                           | FAT_SIZE_FAST32_T | FAT_SIZE_FAST64_T
                           | FAT_SIZE_INTMAX_T | FAT_SIZE_INTPTR_T
!                          | FAT_SIZE_SIZE_T | FAT_SIZE_PTRDIFF_T),
! 
! 
!   /* extension for GCC internal diagnostic format routine. */
!   /* We dont care what they are. Just be sure to treat these
!      as characters to stand for opaque types.
!    cp_printer() in gcc/gcc/cp/error.c
!      recognize these as individual/different specifiers. 
!       case 'A': result = args_to_string (x_next_tree, verbose);       break;
!       case 'C': result = code_to_string (x_next_tcode);               break;
!       case 'D': result = decl_to_string (x_next_tree, verbose);       break;
!       case 'E': result = expr_to_string (x_next_tree);        break;
!       case 'F': result = fndecl_to_string (x_next_tree, verbose);     break;
!       case 'L': result = language_to_string (x_next_lang);          break;
!       case 'O': result = op_to_string (x_next_tcode);         break;
!       case 'P': result = parm_to_string (x_next_int);         break;
!       case 'Q': result = assop_to_string (x_next_tcode);              break;
!       case 'T': result = type_to_string (next_tree, verbose); break;
!       case 'V': result = cv_to_string (next_tree, verbose);   break;
! 
!     c-objc-common.c uses 'D', 'F', 'T', and 'E'. They are covered by above.
!     toplev.c uses 'D', 'F', and 'T'.
! 
!    Because of the  size mask, the following may not work well. 
!    'L' needs special processing.  
! */
! 
!   FAT_GCC_EXT_A = FAT_SIZE_PTRDIFF_T + 1,
!   FAT_GCC_EXT_C,
!   FAT_GCC_EXT_D,
!   FAT_GCC_EXT_F,
!   FAT_GCC_EXT_L,
!   FAT_GCC_EXT_O,
!   FAT_GCC_EXT_P,
!   FAT_GCC_EXT_Q,
!   FAT_GCC_EXT_T,
!   FAT_GCC_EXT_V
  };
  
  struct numbered_arg
*************** struct spec
*** 163,168 ****
--- 201,209 ----
  #define isdigit(c) ((unsigned int) ((c) - '0') < 10)
  
  
+ int gcc_warning_extended_format_check;
+ 
+ 
  static int
  numbered_arg_compare (const void *p1, const void *p2)
  {
*************** format_parse (const char *format, char *
*** 630,635 ****
--- 671,715 ----
                type |= (size & FAT_SIZE_MASK);
                break;
              default:
+               if(gcc_warning_extended_format_check)
+                 {
+                   if(format[-1] == 'L')
+                     {
+                       /* L is understood by XPG C format spec, too. 
+                          So if we ever want to use this msgfmt.c to
+                          check for format extension
+                          used within GCC's diagnostic extension,
+                          'L' should be followed by characters
+                          not understood by above case statements.
+                          For example, ' ' (space) would do. 
+                          I checked the ja.po for GCC and found that %L 
+                          is indeed followed by ' '.
+                       */
+                       type = FAT_GCC_EXT_L; break;
+                     }
+ #define SET_GCC_EXTENDED(label,c) case label : type = FAT_GCC_EXT_##c; break;
+                   switch(*format)
+                     {
+                       SET_GCC_EXTENDED('A',A);
+                       SET_GCC_EXTENDED('C',C);
+                       SET_GCC_EXTENDED('D',D);
+                       SET_GCC_EXTENDED('F',F);
+                       SET_GCC_EXTENDED('L',L);
+                       SET_GCC_EXTENDED('O',O);
+                       SET_GCC_EXTENDED('P',P);
+                       SET_GCC_EXTENDED('Q',Q);
+                       SET_GCC_EXTENDED('T',T);
+                       SET_GCC_EXTENDED('V',V);
+ 
+                     default:
+                       goto bad;       /* unknown  */
+                     }
+ #undef SET_GCC_EXTENDED
+                   break;              /* get out of the outer switch. */
+ 
+                 }
+             bad:;
+ 
                *invalid_reason =
                  (*format == '\0'
                   ? INVALID_UNTERMINATED_DIRECTIVE ()
Index: msgfmt.c
===================================================================
RCS file: /cvs/gettext/gettext/gettext-tools/src/msgfmt.c,v
retrieving revision 1.9
diff -c -3 -p -r1.9 msgfmt.c
*** msgfmt.c    29 Apr 2003 10:12:15 -0000      1.9
--- msgfmt.c    31 Jul 2003 17:05:30 -0000
*************** static int exit_status;
*** 78,83 ****
--- 78,86 ----
  /* If true include even fuzzy translations in output file.  */
  static bool include_all = false;
  
+ /* If true, do gcc diagnostic routine() extended format character check */
+ extern int gcc_warning_extended_format_check; /* in format-c.c */
+ 
  /* Specifies name of the output file.  */
  static const char *output_file_name;
  
*************** static const struct option long_options[
*** 155,160 ****
--- 158,166 ----
    { "check-format", no_argument, NULL, CHAR_MAX + 3 },
    { "check-header", no_argument, NULL, CHAR_MAX + 4 },
    { "directory", required_argument, NULL, 'D' },
+ 
+   { "gcc", no_argument, NULL, 'G' },
+ 
    { "help", no_argument, NULL, 'h' },
    { "java", no_argument, NULL, 'j' },
    { "java2", no_argument, NULL, CHAR_MAX + 5 },
*************** main (int argc, char *argv[])
*** 214,220 ****
    bindtextdomain (PACKAGE, relocate (LOCALEDIR));
    textdomain (PACKAGE);
  
!   while ((opt = getopt_long (argc, argv, "a:cCd:D:fhjl:o:Pr:vV", long_options,
                             NULL))
         != EOF)
      switch (opt)
--- 220,226 ----
    bindtextdomain (PACKAGE, relocate (LOCALEDIR));
    textdomain (PACKAGE);
  
!   while ((opt = getopt_long (argc, argv, "a:cCd:D:fGhjl:o:Pr:vV", 
long_options,
                             NULL))
         != EOF)
      switch (opt)
*************** main (int argc, char *argv[])
*** 248,253 ****
--- 254,264 ----
        case 'f':
        include_all = true;
        break;
+ 
+       case 'G':
+       gcc_warning_extended_format_check = true;
+       break;
+ 
        case 'h':
        do_help = true;
        break;
*************** Input file interpretation:\n"));
*** 586,591 ****
--- 597,604 ----
                                  menu items\n"));
        printf (_("\
    -f, --use-fuzzy             use fuzzy entries in output\n"));
+       printf (_("\
+   -G, --gcc                   check with GCC internal format extension.\n"));
        printf ("\n");
        printf (_("\
  Output details:\n"));
*************** check_pair (const char *msgid,
*** 1167,1173 ****
                                         msgid_plural == NULL,
                                         true, pretty_msgstr))
                        exit_status = EXIT_FAILURE;
! 
                      parser->free (msgstr_descr);
                    }
                  else
--- 1180,1186 ----
                                         msgid_plural == NULL,
                                         true, pretty_msgstr))
                        exit_status = EXIT_FAILURE;
!  
                      parser->free (msgstr_descr);
                    }
                  else
*************** check_pair (const char *msgid,
*** 1188,1195 ****
              parser->free (msgid_descr);
            }
          else
!           free (invalid_reason);
!       }
  
    if (check_accelerators && msgid_plural == NULL)
      /* Test 4: Check that if msgid is a menu item with a keyboard accelerator,
--- 1201,1221 ----
              parser->free (msgid_descr);
            }
          else
!           {
!             /* we have forgotten to output error here?! */
!             error_with_progname = false;
!             error_at_line (0, 0, msgid_pos->file_name,
!                            msgid_pos->line_number,
!                            _("\
! '%s' is not a valid %s format string. Reason: %s"),
!                            "msgid", format_language_pretty[i],
!                            invalid_reason);
!             error_with_progname = true;
!             exit_status = EXIT_FAILURE;
!  
!             free (invalid_reason);
!           }
!     }
  
    if (check_accelerators && msgid_plural == NULL)
      /* Test 4: Check that if msgid is a menu item with a keyboard accelerator,
Index: xgettext.c
===================================================================
RCS file: /cvs/gettext/gettext/gettext-tools/src/xgettext.c,v
retrieving revision 1.16
diff -c -3 -p -r1.16 xgettext.c
*** xgettext.c  27 Jun 2003 12:35:05 -0000      1.16
--- xgettext.c  31 Jul 2003 17:05:32 -0000
***************
*** 80,85 ****
--- 80,88 ----
  #include "x-glade.h"
  
  
+ /* If true, do gcc diagnostic routine() extended format character check */
+ extern int gcc_warning_extended_format_check; /* in format-c.c */
+ 
  /* If nonzero add all comments immediately preceding one of the keywords. */
  static bool add_all_comments = false;
  
*************** static const struct option long_options[
*** 159,164 ****
--- 162,168 ----
    { "force-po", no_argument, &force_po, 1 },
    { "foreign-user", no_argument, NULL, CHAR_MAX + 2 },
    { "from-code", required_argument, NULL, CHAR_MAX + 3 },
+   { "gcc", no_argument, NULL, 'G'},
    { "help", no_argument, NULL, 'h' },
    { "indent", no_argument, NULL, 'i' },
    { "join-existing", no_argument, NULL, 'j' },
*************** main (int argc, char *argv[])
*** 244,250 ****
    xgettext_global_source_encoding = po_charset_ascii;
  
    while ((optchar = getopt_long (argc, argv,
!                                "ac::Cd:D:eEf:Fhijk::l:L:m::M::no:p:sTVw:x:",
                                 long_options, NULL)) != EOF)
      switch (optchar)
        {
--- 248,254 ----
    xgettext_global_source_encoding = po_charset_ascii;
  
    while ((optchar = getopt_long (argc, argv,
!                                "ac::Cd:D:eEf:FGhijk::l:L:m::M::no:p:sTVw:x:",
                                 long_options, NULL)) != EOF)
      switch (optchar)
        {
*************** main (int argc, char *argv[])
*** 296,301 ****
--- 300,308 ----
        case 'f':
        files_from = optarg;
        break;
+       case 'G':
+       gcc_warning_extended_format_check = true;
+       break;
        case 'F':
        sort_by_filepos = true;
          break;
*************** Choice of input file language:\n"));
*** 663,668 ****
--- 670,678 ----
    -C, --c++                   shorthand for --language=C++\n"));
        printf (_("\
  By default the language is guessed depending on the input file name 
extension.\n"));
+       printf (_("\
+   -G, --gcc                   support internally used GCC diagnostic 
format\n"));
+ 
        printf ("\n");
        printf (_("\
  Input file interpretation:\n"));




reply via email to

[Prev in Thread] Current Thread [Next in Thread]