POSIX msgfmt and universal-character-name escape sequences

bug-gettext

From:	Bruno Haible
Subject:	POSIX msgfmt and universal-character-name escape sequences
Date:	Thu, 23 Jun 2022 08:01:27 +0200

https://posix.rhansen.org/p/gettext_draft
Line 1031

"except that universal-character-name escape sequences need not be supported."

Neither GNU msgfmt nor Solaris msgfmt treats universal-character-name
escape sequences specially. If an msgstr contains e.g. "\\u20AC", the
resulting string in the .mo file is
{ '\\', 'u', '2', '0', 'A', 'C', '\0' }.
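
To make the byte sequence above concrete, here is a minimal sketch in Python. It is a toy decoder, not msgfmt's actual code: the function name unescape_po, the small set of escapes it handles, and the choice to raise an error on any other escape are all assumptions made for illustration. The point it shows is that when \u is given no special meaning, the characters u20AC pass through unchanged.

```python
# Toy decoder for the body of a PO string literal (hypothetical helper,
# not taken from any msgfmt implementation). It knows a few C-style
# escapes and deliberately gives the letter 'u' no special meaning.
SIMPLE_ESCAPES = {"n": "\n", "t": "\t", "\\": "\\", '"': '"'}


def unescape_po(literal: str) -> bytes:
    out = []
    i = 0
    while i < len(literal):
        ch = literal[i]
        if ch == "\\" and i + 1 < len(literal):
            nxt = literal[i + 1]
            if nxt in SIMPLE_ESCAPES:
                out.append(SIMPLE_ESCAPES[nxt])
                i += 2
                continue
            # Assumption of this sketch: anything else is rejected.
            raise ValueError("unsupported escape: \\" + nxt)
        out.append(ch)
        i += 1
    # Strings in a .mo file are NUL-terminated.
    return "".join(out).encode("utf-8") + b"\0"


# The raw string holds the seven characters  \ \ u 2 0 A C
print(list(unescape_po(r"\\u20AC")))
# prints [92, 117, 50, 48, 65, 67, 0]
```

The printed values are the codes of '\\', 'u', '2', '0', 'A', 'C' and '\0', matching the sequence quoted above.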

Issue: Leaving it undefined whether \u escape sequences are recognized can
lead to mutual incompatibility of msgfmt implementations: Implementations
would differ in their interpretation of the dot-po file.

There is no good reason for leaving it undefined: There is already a
mechanism for specifying an encoding (charset=... in the header), and the
UTF-8 encoding has been in widespread use for more than 10 years.
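
As an illustration of the existing mechanism, the following Python sketch reads the charset from a PO header entry and encodes a msgstr that contains the euro sign written directly rather than as an escape. The helper names charset_from_header and encode_msgstr, and the regular expression used to find the charset, are assumptions of this sketch and not part of any gettext tool.

```python
import re


def charset_from_header(header: str) -> str:
    # Look for "charset=NAME" in the Content-Type line of the header entry.
    match = re.search(r"charset=([\w.-]+)", header)
    if match is None:
        raise ValueError("no charset declared in header")
    return match.group(1)


def encode_msgstr(text: str, header: str) -> bytes:
    # Encode the translated string in the charset the header declares.
    return text.encode(charset_from_header(header))


utf8_header = "Content-Type: text/plain; charset=UTF-8\n"
latin9_header = "Content-Type: text/plain; charset=ISO-8859-15\n"

print(encode_msgstr("€", utf8_header))    # prints b'\xe2\x82\xac'
print(encode_msgstr("€", latin9_header))  # prints b'\xa4'
```

With the character written directly and the encoding declared in the header, no escape sequence is needed to represent U+20AC.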

Suggestion: Change
"except that universal-character-name escape sequences need not be supported."
to
"except that universal-character-name escape sequences are not supported."
