bug-gnu-utils
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Request for modification of msgfmt.c and format-c.c


From: ishiawa
Subject: Request for modification of msgfmt.c and format-c.c
Date: Fri, 1 Aug 2003 00:37:51 +0900 (JST)

Request for modification of msgfmt.c and format-c.c

Hi, 

Currently, I am trying to get the positional format specifier support
in GCC's internal diagnostic routines.
These routines are invoked to emit warning/error messages
when GCC compilers encouter problems in source code they compile.
These routines don't use C's printf, sprintf, snprintf, etc.
directly, but implements their own C-sprintf like formating and
the messages are printed subsequently.

In the proces, I found out the mismatch problem of
msgfmt and use of extended format character sets by
the internal diagnostic routines of GCC.

msgfmt only understands the standard C printf format characters
whereas internal diagnostic routines of GCC use a few additional
chracters for formating.  
There are many warning/error messages that use such extended
format characters in GCC source files.
Since msgfmt doesn't understand these
characters, checking of *.po files will fail if
we let such lines get checked as c-format message.  
I suspect that as such extended format characters were introduced into
GCC, messages in GCC *.po lines that contain such extended format
characters were no longer marked as c-format and ignored by msgfmt
checking.

This is wrong and undesirable.
(There are many untranslated/fuzzy messages in various *.po files
and I suspect one reason is this lack of msgfmt checking of 
the GCC's internally extended format character set.)

I created a patch incorporate the support for extended format
character set used by the internal GCC diagnostic routines so that
msgfmt would check the c-format-like messages assuming
the extented format character set.

This added feature is invoked as follows

msgfmt --gcc --check

or

msgfmt -G --check

now checks for the format string consistency and understands such
extended character set.

It may not be perfect, but I believe this patch
is in the right direction.

One other thing.

msgfmt didn't print out warning/error if msgid
contains unrecognized format character, and skips
the checking altogether. This is undesirable IMHO.
So I added the error output for this error case.

---
The following is a check against modified ja.po of GCC.
I modified a few lines in ja.po so that
the extended format character set is used in
messages marked as c-format string.

Without "--gcc" flag, the modified msgfmt signals error.

LC_ALL=C
address@hidden export LC_ALL
address@hidden /usr/local/bin/msgfmt --check ja.po
ja.po:14525: 'msgid' is not a valid C format string. Reason: In the directive 
number 1, the character 'T' is not a valid conversion specifier.
ja.po:14539: 'msgid' is not a valid C format string. Reason: In the directive 
number 1, the character 'D' is not a valid conversion specifier.
ja.po:14548: 'msgid' is not a valid C format string. Reason: In the directive 
number 1, the character 'T' is not a valid conversion specifier.
ja.po:14557: 'msgid' is not a valid C format string. Reason: In the directive 
number 1, the character 'T' is not a valid conversion specifier.
ja.po:14761: 'msgid' is not a valid C format string. Reason: In the directive 
number 1, the character 'D' is not a valid conversion specifier.
ja.po:14766: 'msgid' is not a valid C format string. Reason: In the directive 
number 1, the character ' ' is not a valid conversion specifier.
/usr/local/bin/msgfmt: found 6 fatal errors

The last error message above is a very subtle case where "%L " is a
valid internal GCC diagnostic format extension. However, XPG printf
spec seems to think "%L" is a prefix to "d" and other numerical
specifier and so when a ' ' is found, msgfmt prints error 
mentioning ' '.

With "--gcc" flag, the modified msgfmt checks it
assuming the extended format character set, and
it produced no errors. Good.


address@hidden /usr/local/bin/msgfmt --check --gcc ja.po


I checked various error combinations and
modified msgfmt seems to detect them
very well.

PATCH follows.

Index: format-c.c
===================================================================
RCS file: /cvs/gettext/gettext/gettext-tools/src/format-c.c,v
retrieving revision 1.2
diff -c -3 -p -r1.2 format-c.c
*** format-c.c  24 Feb 2003 10:54:10 -0000      1.2
--- format-c.c  31 Jul 2003 15:22:45 -0000
*************** enum format_arg_type
*** 132,138 ****
                           | FAT_SIZE_FAST8_T | FAT_SIZE_FAST16_T
                           | FAT_SIZE_FAST32_T | FAT_SIZE_FAST64_T
                           | FAT_SIZE_INTMAX_T | FAT_SIZE_INTPTR_T
!                          | FAT_SIZE_SIZE_T | FAT_SIZE_PTRDIFF_T)
  };
  
  struct numbered_arg
--- 132,174 ----
                           | FAT_SIZE_FAST8_T | FAT_SIZE_FAST16_T
                           | FAT_SIZE_FAST32_T | FAT_SIZE_FAST64_T
                           | FAT_SIZE_INTMAX_T | FAT_SIZE_INTPTR_T
!                          | FAT_SIZE_SIZE_T | FAT_SIZE_PTRDIFF_T),
! 
! 
!   /* extension for GCC internal diagnostic format routine. */
!   /* We dont care what they are. Just be sure to treat these
!      as characters to stand for opaque types.
!    cp_printer() in gcc/gcc/cp/error.c
!      recognize these as individual/different specifiers. 
!       case 'A': result = args_to_string (x_next_tree, verbose);       break;
!       case 'C': result = code_to_string (x_next_tcode);               break;
!       case 'D': result = decl_to_string (x_next_tree, verbose);       break;
!       case 'E': result = expr_to_string (x_next_tree);        break;
!       case 'F': result = fndecl_to_string (x_next_tree, verbose);     break;
!       case 'L': result = language_to_string (x_next_lang);          break;
!       case 'O': result = op_to_string (x_next_tcode);         break;
!       case 'P': result = parm_to_string (x_next_int);         break;
!       case 'Q': result = assop_to_string (x_next_tcode);              break;
!       case 'T': result = type_to_string (next_tree, verbose); break;
!       case 'V': result = cv_to_string (next_tree, verbose);   break;
! 
!     c-objc-common.c uses 'D', 'F', 'T', and 'E'. They are covered by above.
!     toplev.c uses 'D', 'F', and 'T'.
! 
!    Because of the  size mask, the following may not work well. 
!    'L' needs special processing.  
! */
! 
!   FAT_GCC_EXT_A = FAT_SIZE_PTRDIFF_T + 1,
!   FAT_GCC_EXT_C,
!   FAT_GCC_EXT_D,
!   FAT_GCC_EXT_F,
!   FAT_GCC_EXT_L,
!   FAT_GCC_EXT_O,
!   FAT_GCC_EXT_P,
!   FAT_GCC_EXT_Q,
!   FAT_GCC_EXT_T,
!   FAT_GCC_EXT_V
  };
  
  struct numbered_arg
*************** struct spec
*** 163,168 ****
--- 199,207 ----
  #define isdigit(c) ((unsigned int) ((c) - '0') < 10)
  
  
+ int gcc_warning_extended_format_check;
+ 
+ 
  static int
  numbered_arg_compare (const void *p1, const void *p2)
  {
*************** format_parse (const char *format, char *
*** 630,635 ****
--- 669,713 ----
                type |= (size & FAT_SIZE_MASK);
                break;
              default:
+               if(gcc_warning_extended_format_check)
+                 {
+                   if(format[-1] == 'L')
+                     {
+                       /* L is understood by XPG C format spec, too. 
+                          So if we ever want to use this msgfmt.c to
+                          check for format extension
+                          used within GCC's diagnostic extension,
+                          'L' should be followed by characters
+                          not understood by above case statements.
+                          For example, ' ' (space) would do. 
+                          I checked the ja.po for GCC and found that %L 
+                          is indeed followed by ' '.
+                       */
+                       type = FAT_GCC_EXT_L; break;
+                     }
+ #define SET_GCC_EXTENDED(label,c) case label : type = FAT_GCC_EXT_##c; break;
+                   switch(*format)
+                     {
+                       SET_GCC_EXTENDED('A',A);
+                       SET_GCC_EXTENDED('C',C);
+                       SET_GCC_EXTENDED('D',D);
+                       SET_GCC_EXTENDED('F',F);
+                       SET_GCC_EXTENDED('L',L);
+                       SET_GCC_EXTENDED('O',O);
+                       SET_GCC_EXTENDED('P',P);
+                       SET_GCC_EXTENDED('Q',Q);
+                       SET_GCC_EXTENDED('T',T);
+                       SET_GCC_EXTENDED('V',V);
+ 
+                     default:
+                       goto bad;       /* unknown  */
+                     }
+ #undef SET_GCC_EXTENDED
+                   break;              /* get out of the outer switch. */
+ 
+                 }
+             bad:;
+ 
                *invalid_reason =
                  (*format == '\0'
                   ? INVALID_UNTERMINATED_DIRECTIVE ()
Index: msgfmt.c
===================================================================
RCS file: /cvs/gettext/gettext/gettext-tools/src/msgfmt.c,v
retrieving revision 1.9
diff -c -3 -p -r1.9 msgfmt.c
*** msgfmt.c    29 Apr 2003 10:12:15 -0000      1.9
--- msgfmt.c    31 Jul 2003 15:22:47 -0000
*************** static int exit_status;
*** 78,83 ****
--- 78,86 ----
  /* If true include even fuzzy translations in output file.  */
  static bool include_all = false;
  
+ /* If true, do gcc diagnostic routine() extended format character check */
+ extern int gcc_warning_extended_format_check; /* in format-c.c */
+ 
  /* Specifies name of the output file.  */
  static const char *output_file_name;
  
*************** static const struct option long_options[
*** 155,160 ****
--- 158,166 ----
    { "check-format", no_argument, NULL, CHAR_MAX + 3 },
    { "check-header", no_argument, NULL, CHAR_MAX + 4 },
    { "directory", required_argument, NULL, 'D' },
+ 
+   { "gcc", no_argument, NULL, 'G' },
+ 
    { "help", no_argument, NULL, 'h' },
    { "java", no_argument, NULL, 'j' },
    { "java2", no_argument, NULL, CHAR_MAX + 5 },
*************** main (int argc, char *argv[])
*** 214,220 ****
    bindtextdomain (PACKAGE, relocate (LOCALEDIR));
    textdomain (PACKAGE);
  
!   while ((opt = getopt_long (argc, argv, "a:cCd:D:fhjl:o:Pr:vV", long_options,
                             NULL))
         != EOF)
      switch (opt)
--- 220,226 ----
    bindtextdomain (PACKAGE, relocate (LOCALEDIR));
    textdomain (PACKAGE);
  
!   while ((opt = getopt_long (argc, argv, "a:cCd:D:fGhjl:o:Pr:vV", 
long_options,
                             NULL))
         != EOF)
      switch (opt)
*************** main (int argc, char *argv[])
*** 248,253 ****
--- 254,264 ----
        case 'f':
        include_all = true;
        break;
+ 
+       case 'G':
+       gcc_warning_extended_format_check = true;
+       break;
+ 
        case 'h':
        do_help = true;
        break;
*************** Input file interpretation:\n"));
*** 586,591 ****
--- 597,604 ----
                                  menu items\n"));
        printf (_("\
    -f, --use-fuzzy             use fuzzy entries in output\n"));
+       printf (_("\
+   -G, --gcc                   check with GCC internal format extension.\n"));
        printf ("\n");
        printf (_("\
  Output details:\n"));
*************** check_pair (const char *msgid,
*** 1167,1173 ****
                                         msgid_plural == NULL,
                                         true, pretty_msgstr))
                        exit_status = EXIT_FAILURE;
! 
                      parser->free (msgstr_descr);
                    }
                  else
--- 1180,1186 ----
                                         msgid_plural == NULL,
                                         true, pretty_msgstr))
                        exit_status = EXIT_FAILURE;
!  
                      parser->free (msgstr_descr);
                    }
                  else
*************** check_pair (const char *msgid,
*** 1188,1195 ****
              parser->free (msgid_descr);
            }
          else
!           free (invalid_reason);
!       }
  
    if (check_accelerators && msgid_plural == NULL)
      /* Test 4: Check that if msgid is a menu item with a keyboard accelerator,
--- 1201,1221 ----
              parser->free (msgid_descr);
            }
          else
!           {
!             /* we have forgotten to output error here?! */
!             error_with_progname = false;
!             error_at_line (0, 0, msgid_pos->file_name,
!                            msgid_pos->line_number,
!                            _("\
! '%s' is not a valid %s format string. Reason: %s"),
!                            "msgid", format_language_pretty[i],
!                            invalid_reason);
!             error_with_progname = true;
!             exit_status = EXIT_FAILURE;
!  
!             free (invalid_reason);
!           }
!     }
  
    if (check_accelerators && msgid_plural == NULL)
      /* Test 4: Check that if msgid is a menu item with a keyboard accelerator,




reply via email to

[Prev in Thread] Current Thread [Next in Thread]