Re: msgunfmt | msgfmt produces different .mo
From: Bruno Haible
Subject: Re: msgunfmt | msgfmt produces different .mo
Date: Thu, 17 Aug 2006 14:11:38 +0200
User-agent: KMail/1.9.1

Egmont Koblinger wrote:
> Is an msgunfmt followed by an msgfmt supposed to create a functionally
> equivalent .mo file?
Yes, it is. This is how users are supposed to tweak a few translations
if they don't have the entire source package with the complete po/ directory.
> However, I found a case where it is not true.
>
> Try "dd" (coreutils-5.97) with "hu_HU" (or probably any other non-English)
> locale. It works perfectly. Then do this:
> cd /usr/share/locale/hu/LC_MESSAGES
> msgunfmt coreutils.mo | msgfmt -o coreutils2.mo -
> mv coreutils2.mo coreutils.mo
>
> Try "dd" again in Hungarian. The "1+0 records in/out" are printed in
> English, and then it segfaults.
>
> The bug is somehow caused by the %<PRIuMAX> magic which I don't yet
> completely understand. However, I noticed that the "c-format" specifier is
> required in the .po files for these strings to work correctly. Originally it
> is there in coreutils' source, but msgunfmt doesn't put these in the newly
> re-created .po file. As a result, when it is formatted to a .mo again, it
> will be different.
>
> Looking at the raw .mo files, the correct version contains a single % sign
> at the translation of %<PRIuMAX>, while in the result of unformatting and
> formatting again (i.e. dropping the c-format keyword) the resulting .mo file
> contains %<PRIuMAX> in the translated messages.
>
> I think msgunfmt should put that "#, c-format" in the generated .po file
> where it is necessary, so that an msgunfmt followed by an msgfmt always
> produces a .mo file that behaves the same way.
>
> Tested with gettext 0.14.5 and 0.15.
Your analysis is completely right. A fix is attached.
The "magic" behind %<PRIuMAX> is conceptually quite simple: Since PRIuMAX
is system dependent, the PO file contains "<PRIuMAX>" as a placeholder.
In the .mo file, strings with such a placeholder are transformed into
segmented strings that are resolved at runtime, when the .mo file is mmapped
into memory. When you do
$ strings coreutils.mo | grep 'truncating at'
truncating at % bytes in output file %s
you see two segments, "truncating at %" and " bytes in output file %s",
that happen to lie contiguously in memory; the placeholder between them is
encoded as an index into a table that "strings" does not show.
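To make the idea concrete, here is a minimal sketch of how such a segmented
string could be put back together at load time. It is a simplified model and
not the actual .mo layout or the libintl code: the names "struct segment",
"sysdep_table" and "resolve_segments" are hypothetical and exist only for
this illustration.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified model of one segment: a literal piece of text,
   optionally followed by a system-dependent string that is referenced by
   index.  The real .mo format differs in detail.  */
struct segment
{
  const char *literal;  /* text stored verbatim in the file */
  int sysdep_index;     /* index into sysdep_table, or -1 for none */
};

/* System-dependent pieces, known only on the machine that loads the file.  */
static const char *const sysdep_table[] = { PRIuMAX };

/* Concatenate the segments into BUF, substituting the table entries.
   Return 0 on success, -1 if BUF is too small.  */
static int
resolve_segments (const struct segment *segs, size_t nsegs,
                  char *buf, size_t bufsize)
{
  size_t used = 0;
  size_t i;

  assert (bufsize > 0);
  buf[0] = '\0';
  for (i = 0; i < nsegs; i++)
    {
      const char *tail = (segs[i].sysdep_index >= 0
                          ? sysdep_table[segs[i].sysdep_index]
                          : "");
      size_t need = strlen (segs[i].literal) + strlen (tail);

      if (used + need >= bufsize)
        return -1;
      strcpy (buf + used, segs[i].literal);
      strcat (buf + used, tail);
      used += need;
    }
  return 0;
}
```

Called with the two segments { "truncating at %", 0 } and
{ " bytes in output file %s", -1 }, this yields a format string in which the
lone "%" is followed by whatever PRIuMAX expands to on the running system.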
Bruno
*** gettext-0.15/gettext-tools/src/read-mo.c.bak 2005-10-02 03:42:41.000000000 +0200
--- gettext-0.15/gettext-tools/src/read-mo.c 2006-08-17 02:09:51.000000000 +0200
***************
*** 1,5 ****
/* Reading binary .mo files.
! Copyright (C) 1995-1998, 2000-2005 Free Software Foundation, Inc.
Written by Ulrich Drepper <address@hidden>, April 1995.
This program is free software; you can redistribute it and/or modify
--- 1,5 ----
/* Reading binary .mo files.
! Copyright (C) 1995-1998, 2000-2006 Free Software Foundation, Inc.
Written by Ulrich Drepper <address@hidden>, April 1995.
This program is free software; you can redistribute it and/or modify
***************
*** 24,31 ****
--- 24,33 ----
#include "read-mo.h"
#include <errno.h>
+ #include <stdbool.h>
#include <stdio.h>
#include <stddef.h>
+ #include <stdlib.h>
#include <string.h>
/* This include file describes the main part of binary .mo format. */
***************
*** 36,41 ****
--- 38,44 ----
#include "binary-io.h"
#include "exit.h"
#include "message.h"
+ #include "format.h"
#include "gettext.h"
#define _(str) gettext (str)
***************
*** 349,354 ****
--- 352,358 ----
char *msgstr;
size_t msgstr_len;
nls_uint32 offset;
+ size_t f;
/* Read the msgctxt and msgid. */
offset = get_uint32 (&bf, header.orig_sysdep_tab_offset + i * 4);
***************
*** 377,382 ****
--- 381,446 ----
: NULL),
msgstr, msgstr_len,
&pos);
+
+ /* Only messages with c-format or objc-format annotation are
+ recognized as having system-dependent strings by msgfmt.
+ Which one of the two, we don't know. We have to guess,
+ assuming that c-format is more probable than objc-format and
+ that the .mo was likely produced by "msgfmt -c". */
+ for (f = format_c; ; f = format_objc)
+ {
+ bool valid = true;
+ struct formatstring_parser *parser = formatstring_parsers[f];
+ const char *str_end;
+ const char *str;
+
+ str_end = msgid + msgid_len;
+ for (str = msgid; str < str_end; str += strlen (str) + 1)
+ {
+ char *invalid_reason = NULL;
+ void *descr = parser->parse (str, false, &invalid_reason);
+
+ if (descr != NULL)
+ parser->free (descr);
+ else
+ {
+ free (invalid_reason);
+ valid = false;
+ break;
+ }
+ }
+ if (valid)
+ {
+ str_end = msgstr + msgstr_len;
+ for (str = msgstr; str < str_end; str += strlen (str) + 1)
+ {
+ char *invalid_reason = NULL;
+ void *descr =
+ parser->parse (str, true, &invalid_reason);
+
+ if (descr != NULL)
+ parser->free (descr);
+ else
+ {
+ free (invalid_reason);
+ valid = false;
+ break;
+ }
+ }
+ }
+
+ if (valid)
+ {
+ /* Found the most likely among c-format, objc-format. */
+ mp->is_format[f] = yes;
+ break;
+ }
+
+ /* Try next f. */
+ if (f == format_objc)
+ break;
+ }
+
message_list_append (mlp, mp);
}
break;
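
The two inner loops of the patch walk a buffer that holds several
NUL-terminated strings back to back (singular and plural forms) and reject the
candidate format as soon as one string fails to parse. The following sketch
isolates that iteration pattern. It assumes the length passed in covers the
final NUL byte. "check_fn", "all_strings_valid" and "toy_check" are
hypothetical names, and "toy_check" is only a stand-in for illustration, not
the C format string parser that the patch calls.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a format string parser's validity check.  */
typedef bool (*check_fn) (const char *str);

/* Return true if every NUL-terminated string packed in BUF[0..LEN) passes
   CHECK.  LEN is assumed to include the terminating NUL of the last string.  */
static bool
all_strings_valid (const char *buf, size_t len, check_fn check)
{
  const char *end = buf + len;
  const char *str;

  for (str = buf; str < end; str += strlen (str) + 1)
    if (!check (str))
      return false;
  return true;
}

/* Toy check, NOT a real C format parser: it only rejects a string that
   ends in a lone '%'.  */
static bool
toy_check (const char *str)
{
  for (; *str != '\0'; str++)
    if (*str == '%')
      {
        str++;
        if (*str == '\0')
          return false;
      }
  return true;
}
```

In the patch the same walk is done once over the msgid strings and once over
the msgstr strings, first with the c-format parser and then, if that fails,
with the objc-format parser.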