bug-gnu-utils

Re: msgunfmt | msgfmt produces different .mo


From: Bruno Haible
Subject: Re: msgunfmt | msgfmt produces different .mo
Date: Thu, 17 Aug 2006 14:11:38 +0200
User-agent: KMail/1.9.1

Egmont Koblinger wrote:
> Is an msgunfmt followed by an msgfmt supposed to create a functionally
> equivalent .mo file?

Yes, it is. This is the way users are supposed to tweak a few translations
if they don't have the entire source package with the complete po/ directory.

> However, I found a case where it is not true.
> 
> Try "dd" (coreutils-5.97) with "hu_HU" (or probably any other non-English)
> locale. It works perfectly. Then do this:
> cd /usr/share/locale/hu/LC_MESSAGES
> msgunfmt coreutils.mo | msgfmt -o coreutils2.mo -
> mv coreutils2.mo coreutils.mo
> 
> Try "dd" again in Hungarian. The "1+0 records in/out" are printed in
> English, and then it segfaults.
> 
> The bug is somehow caused by the %<PRIuMAX> magic which I don't yet
> completely understand. However, I noticed that the "c-format" specifier is
> required in the .po files for these strings to work correctly. Originally it
> is there in coreutils' source, but msgunfmt doesn't put these in the newly
> re-created .po file. As a result, when it is formatted to a .mo again, it
> will be different.
> 
> Looking at the raw .mo files, the correct version contains a single % sign
> at the translation of %<PRIuMAX>, while in the result of unformatting and
> formatting again (i.e. dropping the c-format keyword) the resulting .mo file
> contains %<PRIuMAX> in the translated messages.
> 
> I think msgunfmt should put that "#, c-format" in the generated .po file
> where it is necessary, so that an msgunfmt followed by an msgfmt always
> produces a .mo file that behaves the same way.
> 
> Tested with gettext 0.14.5 and 0.15.

You are completely right with your analysis. Find attached a fix.

The "magic" behind %<PRIuMAX> is conceptually quite simple: Since PRIuMAX
is system dependent, the PO file contains "<PRIuMAX>" as a placeholder.
In the .mo file, strings with such a placeholder are transformed into
segmented strings that are resolved at runtime, when the .mo file is mmapped
into memory. When you do

  $ strings coreutils.mo | grep 'truncating at'
  truncating at % bytes in output file %s

you happen to see two segments "truncating at %" and
" bytes in output file %s" that happen to lie contiguously in memory; the
placeholder is encoded as an index in a table that you don't see here.
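
To make the idea concrete, here is a minimal sketch of how such a
segmented string could be put back together at runtime. It is not the
code libintl actually uses: the struct segment type, the sysdep_table
array and the resolve_segments function are hypothetical names invented
for this illustration, and the real .mo layout is more involved. Only
PRIuMAX from <inttypes.h> is real.

```c
#include <assert.h>
#include <inttypes.h>   /* PRIuMAX */
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified model of one segment: a piece of literal text,
   optionally followed by a reference to a system-dependent string.  */
struct segment
{
  const char *literal;  /* static text of this segment */
  int sysdep_index;     /* index into sysdep_table, or -1 for none */
};

/* Strings whose value only the running system knows.  PRIuMAX expands
   to something like "ju", "lu" or "llu", depending on the platform.  */
static const char *const sysdep_table[] = { PRIuMAX };
#define SYSDEP_COUNT (sizeof sysdep_table / sizeof sysdep_table[0])

/* Concatenate all segments, substituting each placeholder by its value
   on this system.  Returns a malloc'ed string, or NULL if out of memory.  */
static char *
resolve_segments (const struct segment *segs, size_t nsegs)
{
  size_t len = 0;
  size_t i;
  char *result;

  /* First pass: compute the total length.  */
  for (i = 0; i < nsegs; i++)
    {
      len += strlen (segs[i].literal);
      if (segs[i].sysdep_index >= 0)
        {
          assert ((size_t) segs[i].sysdep_index < SYSDEP_COUNT);
          len += strlen (sysdep_table[segs[i].sysdep_index]);
        }
    }

  result = malloc (len + 1);
  if (result == NULL)
    return NULL;

  /* Second pass: copy the pieces in order.  */
  result[0] = '\0';
  for (i = 0; i < nsegs; i++)
    {
      strcat (result, segs[i].literal);
      if (segs[i].sysdep_index >= 0)
        strcat (result, sysdep_table[segs[i].sysdep_index]);
    }
  return result;
}
```

Called with the two segments { "truncating at %", 0 } and
{ " bytes in output file %s", -1 }, this yields the same string as the
C expression "truncating at %" PRIuMAX " bytes in output file %s" on
the system where it runs. That is why the raw .mo file shows only a
bare % at that position.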

Bruno


*** gettext-0.15/gettext-tools/src/read-mo.c.bak        2005-10-02 03:42:41.000000000 +0200
--- gettext-0.15/gettext-tools/src/read-mo.c    2006-08-17 02:09:51.000000000 +0200
***************
*** 1,5 ****
  /* Reading binary .mo files.
!    Copyright (C) 1995-1998, 2000-2005 Free Software Foundation, Inc.
     Written by Ulrich Drepper <address@hidden>, April 1995.
  
     This program is free software; you can redistribute it and/or modify
--- 1,5 ----
  /* Reading binary .mo files.
!    Copyright (C) 1995-1998, 2000-2006 Free Software Foundation, Inc.
     Written by Ulrich Drepper <address@hidden>, April 1995.
  
     This program is free software; you can redistribute it and/or modify
***************
*** 24,31 ****
--- 24,33 ----
  #include "read-mo.h"
  
  #include <errno.h>
+ #include <stdbool.h>
  #include <stdio.h>
  #include <stddef.h>
+ #include <stdlib.h>
  #include <string.h>
  
  /* This include file describes the main part of binary .mo format.  */
***************
*** 36,41 ****
--- 38,44 ----
  #include "binary-io.h"
  #include "exit.h"
  #include "message.h"
+ #include "format.h"
  #include "gettext.h"
  
  #define _(str) gettext (str)
***************
*** 349,354 ****
--- 352,358 ----
              char *msgstr;
              size_t msgstr_len;
              nls_uint32 offset;
+             size_t f;
  
              /* Read the msgctxt and msgid.  */
              offset = get_uint32 (&bf, header.orig_sysdep_tab_offset + i * 4);
***************
*** 377,382 ****
--- 381,446 ----
                                   : NULL),
                                  msgstr, msgstr_len,
                                  &pos);
+ 
+             /* Only messages with c-format or objc-format annotation are
+                recognized as having system-dependent strings by msgfmt.
+                Which one of the two, we don't know.  We have to guess,
+                assuming that c-format is more probable than objc-format and
+                that the .mo was likely produced by "msgfmt -c".  */
+             for (f = format_c; ; f = format_objc)
+               {
+                 bool valid = true;
+                 struct formatstring_parser *parser = formatstring_parsers[f];
+                 const char *str_end;
+                 const char *str;
+ 
+                 str_end = msgid + msgid_len;
+                 for (str = msgid; str < str_end; str += strlen (str) + 1)
+                   {
+                     char *invalid_reason = NULL;
+                     void *descr = parser->parse (str, false, &invalid_reason);
+ 
+                     if (descr != NULL)
+                       parser->free (descr);
+                     else
+                       {
+                         free (invalid_reason);
+                         valid = false;
+                         break;
+                       }
+                   }
+                 if (valid)
+                   {
+                     str_end = msgstr + msgstr_len;
+                     for (str = msgstr; str < str_end; str += strlen (str) + 1)
+                       {
+                         char *invalid_reason = NULL;
+                         void *descr =
+                           parser->parse (str, true, &invalid_reason);
+ 
+                         if (descr != NULL)
+                           parser->free (descr);
+                         else
+                           {
+                             free (invalid_reason);
+                             valid = false;
+                             break;
+                           }
+                       }
+                   }
+ 
+                 if (valid)
+                   {
+                     /* Found the most likely among c-format, objc-format.  */
+                     mp->is_format[f] = yes;
+                     break;
+                   }
+ 
+                 /* Try next f.  */
+                 if (f == format_objc)
+                   break;
+               }
+ 
              message_list_append (mlp, mp);
            }
          break;



