[Top][All Lists]
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Request for modification of msgfmt.c and format-c.c
From: |
ishiawa |
Subject: |
Request for modification of msgfmt.c and format-c.c |
Date: |
Fri, 1 Aug 2003 00:37:51 +0900 (JST) |
Request for modification of msgfmt.c and format-c.c
Hi,
Currently, I am trying to get the positional format specifier support
in GCC's internal diagnostic routines.
These routines are invoked to emit warning/error messages
when GCC compilers encouter problems in source code they compile.
These routines don't use C's printf, sprintf, snprintf, etc.
directly, but implements their own C-sprintf like formating and
the messages are printed subsequently.
In the proces, I found out the mismatch problem of
msgfmt and use of extended format character sets by
the internal diagnostic routines of GCC.
msgfmt only understands the standard C printf format characters
whereas internal diagnostic routines of GCC use a few additional
chracters for formating.
There are many warning/error messages that use such extended
format characters in GCC source files.
Since msgfmt doesn't understand these
characters, checking of *.po files will fail if
we let such lines get checked as c-format message.
I suspect that as such extended format characters were introduced into
GCC, messages in GCC *.po lines that contain such extended format
characters were no longer marked as c-format and ignored by msgfmt
checking.
This is wrong and undesirable.
(There are many untranslated/fuzzy messages in various *.po files
and I suspect one reason is this lack of msgfmt checking of
the GCC's internally extended format character set.)
I created a patch incorporate the support for extended format
character set used by the internal GCC diagnostic routines so that
msgfmt would check the c-format-like messages assuming
the extented format character set.
This added feature is invoked as follows
msgfmt --gcc --check
or
msgfmt -G --check
now checks for the format string consistency and understands such
extended character set.
It may not be perfect, but I believe this patch
is in the right direction.
One other thing.
msgfmt didn't print out warning/error if msgid
contains unrecognized format character, and skips
the checking altogether. This is undesirable IMHO.
So I added the error output for this error case.
---
The following is a check against modified ja.po of GCC.
I modified a few lines in ja.po so that
the extended format character set is used in
messages marked as c-format string.
Without "--gcc" flag, the modified msgfmt signals error.
LC_ALL=C
address@hidden export LC_ALL
address@hidden /usr/local/bin/msgfmt --check ja.po
ja.po:14525: 'msgid' is not a valid C format string. Reason: In the directive
number 1, the character 'T' is not a valid conversion specifier.
ja.po:14539: 'msgid' is not a valid C format string. Reason: In the directive
number 1, the character 'D' is not a valid conversion specifier.
ja.po:14548: 'msgid' is not a valid C format string. Reason: In the directive
number 1, the character 'T' is not a valid conversion specifier.
ja.po:14557: 'msgid' is not a valid C format string. Reason: In the directive
number 1, the character 'T' is not a valid conversion specifier.
ja.po:14761: 'msgid' is not a valid C format string. Reason: In the directive
number 1, the character 'D' is not a valid conversion specifier.
ja.po:14766: 'msgid' is not a valid C format string. Reason: In the directive
number 1, the character ' ' is not a valid conversion specifier.
/usr/local/bin/msgfmt: found 6 fatal errors
The last error message above is a very subtle case where "%L " is a
valid internal GCC diagnostic format extension. However, XPG printf
spec seems to think "%L" is a prefix to "d" and other numerical
specifier and so when a ' ' is found, msgfmt prints error
mentioning ' '.
With "--gcc" flag, the modified msgfmt checks it
assuming the extended format character set, and
it produced no errors. Good.
address@hidden /usr/local/bin/msgfmt --check --gcc ja.po
I checked various error combinations and
modified msgfmt seems to detect them
very well.
PATCH follows.
Index: format-c.c
===================================================================
RCS file: /cvs/gettext/gettext/gettext-tools/src/format-c.c,v
retrieving revision 1.2
diff -c -3 -p -r1.2 format-c.c
*** format-c.c 24 Feb 2003 10:54:10 -0000 1.2
--- format-c.c 31 Jul 2003 15:22:45 -0000
*************** enum format_arg_type
*** 132,138 ****
| FAT_SIZE_FAST8_T | FAT_SIZE_FAST16_T
| FAT_SIZE_FAST32_T | FAT_SIZE_FAST64_T
| FAT_SIZE_INTMAX_T | FAT_SIZE_INTPTR_T
! | FAT_SIZE_SIZE_T | FAT_SIZE_PTRDIFF_T)
};
struct numbered_arg
--- 132,174 ----
| FAT_SIZE_FAST8_T | FAT_SIZE_FAST16_T
| FAT_SIZE_FAST32_T | FAT_SIZE_FAST64_T
| FAT_SIZE_INTMAX_T | FAT_SIZE_INTPTR_T
! | FAT_SIZE_SIZE_T | FAT_SIZE_PTRDIFF_T),
!
!
! /* extension for GCC internal diagnostic format routine. */
! /* We dont care what they are. Just be sure to treat these
! as characters to stand for opaque types.
! cp_printer() in gcc/gcc/cp/error.c
! recognize these as individual/different specifiers.
! case 'A': result = args_to_string (x_next_tree, verbose); break;
! case 'C': result = code_to_string (x_next_tcode); break;
! case 'D': result = decl_to_string (x_next_tree, verbose); break;
! case 'E': result = expr_to_string (x_next_tree); break;
! case 'F': result = fndecl_to_string (x_next_tree, verbose); break;
! case 'L': result = language_to_string (x_next_lang); break;
! case 'O': result = op_to_string (x_next_tcode); break;
! case 'P': result = parm_to_string (x_next_int); break;
! case 'Q': result = assop_to_string (x_next_tcode); break;
! case 'T': result = type_to_string (next_tree, verbose); break;
! case 'V': result = cv_to_string (next_tree, verbose); break;
!
! c-objc-common.c uses 'D', 'F', 'T', and 'E'. They are covered by above.
! toplev.c uses 'D', 'F', and 'T'.
!
! Because of the size mask, the following may not work well.
! 'L' needs special processing.
! */
!
! FAT_GCC_EXT_A = FAT_SIZE_PTRDIFF_T + 1,
! FAT_GCC_EXT_C,
! FAT_GCC_EXT_D,
! FAT_GCC_EXT_F,
! FAT_GCC_EXT_L,
! FAT_GCC_EXT_O,
! FAT_GCC_EXT_P,
! FAT_GCC_EXT_Q,
! FAT_GCC_EXT_T,
! FAT_GCC_EXT_V
};
struct numbered_arg
*************** struct spec
*** 163,168 ****
--- 199,207 ----
#define isdigit(c) ((unsigned int) ((c) - '0') < 10)
+ int gcc_warning_extended_format_check;
+
+
static int
numbered_arg_compare (const void *p1, const void *p2)
{
*************** format_parse (const char *format, char *
*** 630,635 ****
--- 669,713 ----
type |= (size & FAT_SIZE_MASK);
break;
default:
+ if(gcc_warning_extended_format_check)
+ {
+ if(format[-1] == 'L')
+ {
+ /* L is understood by XPG C format spec, too.
+ So if we ever want to use this msgfmt.c to
+ check for format extension
+ used within GCC's diagnostic extension,
+ 'L' should be followed by characters
+ not understood by above case statements.
+ For example, ' ' (space) would do.
+ I checked the ja.po for GCC and found that %L
+ is indeed followed by ' '.
+ */
+ type = FAT_GCC_EXT_L; break;
+ }
+ #define SET_GCC_EXTENDED(label,c) case label : type = FAT_GCC_EXT_##c; break;
+ switch(*format)
+ {
+ SET_GCC_EXTENDED('A',A);
+ SET_GCC_EXTENDED('C',C);
+ SET_GCC_EXTENDED('D',D);
+ SET_GCC_EXTENDED('F',F);
+ SET_GCC_EXTENDED('L',L);
+ SET_GCC_EXTENDED('O',O);
+ SET_GCC_EXTENDED('P',P);
+ SET_GCC_EXTENDED('Q',Q);
+ SET_GCC_EXTENDED('T',T);
+ SET_GCC_EXTENDED('V',V);
+
+ default:
+ goto bad; /* unknown */
+ }
+ #undef SET_GCC_EXTENDED
+ break; /* get out of the outer switch. */
+
+ }
+ bad:;
+
*invalid_reason =
(*format == '\0'
? INVALID_UNTERMINATED_DIRECTIVE ()
Index: msgfmt.c
===================================================================
RCS file: /cvs/gettext/gettext/gettext-tools/src/msgfmt.c,v
retrieving revision 1.9
diff -c -3 -p -r1.9 msgfmt.c
*** msgfmt.c 29 Apr 2003 10:12:15 -0000 1.9
--- msgfmt.c 31 Jul 2003 15:22:47 -0000
*************** static int exit_status;
*** 78,83 ****
--- 78,86 ----
/* If true include even fuzzy translations in output file. */
static bool include_all = false;
+ /* If true, do gcc diagnostic routine() extended format character check */
+ extern int gcc_warning_extended_format_check; /* in format-c.c */
+
/* Specifies name of the output file. */
static const char *output_file_name;
*************** static const struct option long_options[
*** 155,160 ****
--- 158,166 ----
{ "check-format", no_argument, NULL, CHAR_MAX + 3 },
{ "check-header", no_argument, NULL, CHAR_MAX + 4 },
{ "directory", required_argument, NULL, 'D' },
+
+ { "gcc", no_argument, NULL, 'G' },
+
{ "help", no_argument, NULL, 'h' },
{ "java", no_argument, NULL, 'j' },
{ "java2", no_argument, NULL, CHAR_MAX + 5 },
*************** main (int argc, char *argv[])
*** 214,220 ****
bindtextdomain (PACKAGE, relocate (LOCALEDIR));
textdomain (PACKAGE);
! while ((opt = getopt_long (argc, argv, "a:cCd:D:fhjl:o:Pr:vV", long_options,
NULL))
!= EOF)
switch (opt)
--- 220,226 ----
bindtextdomain (PACKAGE, relocate (LOCALEDIR));
textdomain (PACKAGE);
! while ((opt = getopt_long (argc, argv, "a:cCd:D:fGhjl:o:Pr:vV",
long_options,
NULL))
!= EOF)
switch (opt)
*************** main (int argc, char *argv[])
*** 248,253 ****
--- 254,264 ----
case 'f':
include_all = true;
break;
+
+ case 'G':
+ gcc_warning_extended_format_check = true;
+ break;
+
case 'h':
do_help = true;
break;
*************** Input file interpretation:\n"));
*** 586,591 ****
--- 597,604 ----
menu items\n"));
printf (_("\
-f, --use-fuzzy use fuzzy entries in output\n"));
+ printf (_("\
+ -G, --gcc check with GCC internal format extension.\n"));
printf ("\n");
printf (_("\
Output details:\n"));
*************** check_pair (const char *msgid,
*** 1167,1173 ****
msgid_plural == NULL,
true, pretty_msgstr))
exit_status = EXIT_FAILURE;
!
parser->free (msgstr_descr);
}
else
--- 1180,1186 ----
msgid_plural == NULL,
true, pretty_msgstr))
exit_status = EXIT_FAILURE;
!
parser->free (msgstr_descr);
}
else
*************** check_pair (const char *msgid,
*** 1188,1195 ****
parser->free (msgid_descr);
}
else
! free (invalid_reason);
! }
if (check_accelerators && msgid_plural == NULL)
/* Test 4: Check that if msgid is a menu item with a keyboard accelerator,
--- 1201,1221 ----
parser->free (msgid_descr);
}
else
! {
! /* we have forgotten to output error here?! */
! error_with_progname = false;
! error_at_line (0, 0, msgid_pos->file_name,
! msgid_pos->line_number,
! _("\
! '%s' is not a valid %s format string. Reason: %s"),
! "msgid", format_language_pretty[i],
! invalid_reason);
! error_with_progname = true;
! exit_status = EXIT_FAILURE;
!
! free (invalid_reason);
! }
! }
if (check_accelerators && msgid_plural == NULL)
/* Test 4: Check that if msgid is a menu item with a keyboard accelerator,
- Request for modification of msgfmt.c and format-c.c,
ishiawa <=