
Re: [bug-gettext] GetText escapes


From: Bruno Haible
Subject: Re: [bug-gettext] GetText escapes
Date: Sat, 17 Jun 2017 11:01:06 +0200
User-agent: KMail/5.1.3 (Linux/4.4.0-79-generic; KDE/5.18.0; x86_64; ; )

Hi,

CCing the bug-gettext mailing list. For gettext-related matters, please
write to the mailing list, not to me personally.

> Hello Sir,
> 
> I'm learning to code and I am trying to write my own tool similar to
> gettext and I have some trouble understanding how gettext works with
> escaped characters.
> 
> My tool can already generate a complete .pot file, but when I open it in
> PoEdit it says "translations should not contains the escape sequence \r".
> 
> Now I looked into the problem and I figured that I should also escape the
>  \ to become \\.
> 
> So just to be clear my original strings I find in my program source files
> all have \r\n as the line break. And the \r characters are needed: if they
> were missing in the translation, Windows wouldn't break the lines correctly
> (the line breaks get ignored).
> 
> I thought that when a "real" string in my program to be translated contains
> the sequence '\r', '\n' (so the actual invisible characters, not the text
> representation), I would have to write "\r\n" (as text) into the .pot file.
> 
> But then the PoEdit tool tells me that those escape sequences shouldn't be
> there.
> 
> Do I have to double-escape those? So literally "\\\r\\\n"?
> But wouldn't I have to double-unescape that as well while I load a
> translated .po file in my final application?
> 
> I am really confused and I hope you can shed some light on this.
> 
> Kind Regards

You need to understand two things.

1) About the escaping of <newline> vs. '\n' vs. '\\\n'.
A "tool similar to" xgettext extracts strings from a source file
(in a source language, with escaping rules) to a PO file (with
escaping rules specified in [1]: "using " delimiters and \ escapes").
Thus you have two conversions:
  - unescape, from the source language to the internal representation,
  - escape, from the internal representation to the PO format.
The internal representation of a newline character is typically
a memory word or byte with value 10.
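To make the two conversions concrete, here is a minimal Python sketch. It
assumes a C-like source language and handles only a few escapes (\n, \t,
\r, \\, \"); the names unescape_c_literal and escape_po are hypothetical
and are not part of gettext. Each direction escapes or unescapes exactly
once, so no double escaping is involved.

```python
# Hypothetical escape tables: only a few common escapes are handled.
SOURCE_ESCAPES = {"n": "\n", "t": "\t", "r": "\r", "\\": "\\", '"': '"'}
PO_ESCAPES = {"\\": "\\\\", '"': '\\"', "\n": "\\n", "\t": "\\t", "\r": "\\r"}


def unescape_c_literal(body: str) -> str:
    """Step 1: source language -> internal representation.

    `body` is the text between the quotes of a C-like string literal;
    e.g. the two characters backslash and 'n' become one newline (code 10).
    """
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch != "\\":
            out.append(ch)
            i += 1
            continue
        if i + 1 >= len(body):
            raise ValueError("dangling backslash at end of literal")
        nxt = body[i + 1]
        if nxt not in SOURCE_ESCAPES:
            raise ValueError(f"unsupported escape sequence \\{nxt}")
        out.append(SOURCE_ESCAPES[nxt])
        i += 2
    return "".join(out)


def escape_po(text: str) -> str:
    """Step 2: internal representation -> PO string (" delimiters, backslash escapes)."""
    return '"' + "".join(PO_ESCAPES.get(ch, ch) for ch in text) + '"'


internal = unescape_c_literal(r"Hello\nWorld")  # 11 characters, one is a newline
assert ord(internal[5]) == 10
print(escape_po(internal))  # prints "Hello\nWorld"
```

A loader that reads the PO file back needs just one matching unescape step
per string. Note that if the internal string still contains a carriage
return, escape_po faithfully writes \r into the PO file, which is exactly
what PoEdit warns about; point 2 below addresses where that \r comes from.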

2) About the sequence '\r\n'.
For compatibility between text editors on Windows and text editors on
Unix, most programming languages declare '\r\n' to be a newline, just
like '\n'. Usually, '\r' doesn't appear alone in a text file.
Is this also the case with your source language? If not, it's an unwise
choice, because it will produce different results on Windows than on Unix.
Consider what happens when a text file is sent by mail from a Windows
user to a Unix user or vice versa...
If yes, then your "tool similar to" xgettext needs to treat '\r\n' like
'\n'.
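A minimal Python sketch of treating '\r\n' like '\n' while reading the
source text. It assumes the source file is UTF-8 (an assumption; a real
extractor would take the encoding as a parameter), and the helper names
are hypothetical. The second variant uses Python's universal-newlines
mode, which additionally turns a lone '\r' into '\n'.

```python
import io


def read_source_text(raw: bytes, encoding: str = "utf-8") -> str:
    """Decode source bytes and collapse each CR LF pair into a single LF.

    A lone CR is left untouched: only the CR LF pair counts as a newline.
    """
    return raw.decode(encoding).replace("\r\n", "\n")


def read_source_text_universal(raw: bytes, encoding: str = "utf-8") -> str:
    """Same idea via universal-newlines mode (newline=None), which maps
    CR LF and also a lone CR to LF."""
    return io.TextIOWrapper(io.BytesIO(raw), encoding=encoding, newline=None).read()


# The same two lines saved on Windows (CR LF) and on Unix (LF).
windows_bytes = b"line one\r\nline two\r\n"
unix_bytes = b"line one\nline two\n"
assert read_source_text(windows_bytes) == read_source_text(unix_bytes)
assert read_source_text_universal(windows_bytes) == "line one\nline two\n"
```

With either variant, a string extracted from a file saved on Windows
contains only LF characters, so its PO form shows \n and no \r escape.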

Bruno

[1] https://www.gnu.org/software/gettext/manual/html_node/PO-Files.html
