Re: [bug-gettext] Hardcoded escaping


From: Serj Lavrin
Subject: Re: [bug-gettext] Hardcoded escaping
Date: Thu, 29 Oct 2015 12:09:24 +0200

Hi
 
I'm not sure this is a bug by nature; it may even have been intended.
 
The issue relates solely to how the gettext lexer handles escaping here: http://git.savannah.gnu.org/cgit/gettext.git/tree/gettext-tools/src/po-lex.c?id=b75a50bc5f30666fb36469a90245e1d7856d672c#n750
 
For example, this literal string:
 
```
"Hi, I\'m string with escaping"
```
 
will fail during parsing with the error `invalid sequence`.
 
That's because all accepted escape sequences are hardcoded in the lexer, and the single quote is not among them.
 
In most other languages this string would be valid, because they return the escaped character itself unless it's a special case like `\n` or `\t`.
 
So, for example, this would be technically valid too, but it is not accepted by gettext:
 
```
"This is just rando\m escaped string"
```
 
`\m` here should return just `m`, and we shouldn't care much why the developer decided to escape that particular character.
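
To make the difference concrete, here is a minimal JavaScript sketch of the two behaviours. It is not the code from `po-lex.c`: the `decodeEscapes` function, its `lenient` flag and the table of accepted escapes are made up for illustration, and numeric escapes are left out.

```javascript
// Simplified model of escape handling in a PO-style string lexer.
// ASSUMPTION: this table is illustrative, not a copy of the one in po-lex.c.
const KNOWN_ESCAPES = new Map([
  ['n', '\n'], ['t', '\t'], ['r', '\r'], ['b', '\b'],
  ['f', '\f'], ['v', '\v'], ['a', '\x07'],
  ['\\', '\\'], ['"', '"'],
]);

// Decodes the body of a string literal (the text between the quotes).
// strict (default): an unknown escape is an error, like the current lexer.
// lenient: an unknown escape yields the escaped character itself.
function decodeEscapes(body, { lenient = false } = {}) {
  let out = '';
  for (let i = 0; i < body.length; i++) {
    const ch = body[i];
    if (ch !== '\\') {
      out += ch;
      continue;
    }
    i += 1;
    const next = body[i];
    if (next === undefined) {
      throw new Error('backslash at end of string');
    }
    if (KNOWN_ESCAPES.has(next)) {
      out += KNOWN_ESCAPES.get(next);
    } else if (lenient) {
      out += next; // "\m" becomes "m", "\'" becomes "'"
    } else {
      throw new Error('invalid sequence: \\' + next);
    }
  }
  return out;
}
```

With this sketch, `decodeEscapes("rando\\m")` throws, while `decodeEscapes("rando\\m", { lenient: true })` returns the string with a plain `m`.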
 
 
So why is that important at all? We could just strip every escape sequence in `.pot` files except the ones accepted by the gettext lexer, and that should fix the issue, right?
 
Unfortunately, it isn't that simple.
 
Gettext is used with many languages, and some of them may need escaping in quite unexpected places.
 
For example, take our Node and Grunt based project, Kotsu: https://github.com/LotusTM/Kotsu
 
Sometimes we need to pass something with single quotes into gettext. Since JavaScript (and in our case Nunjucks, a templating language) usually uses single quotes for strings, that can become an issue:
 
```
gettext('My magic ain't working string') // for obvious reasons won't work
```
 
We need to escape that quote:
 
```
gettext('My magic ain\'t working string')
```
 
No big deal; now it works.
 
But then `xgettext` comes into play. During extraction `xgettext` takes the string as it is, with the escaped single quote. That's fine, and exactly what we need. But as soon as the file goes through the lexer (for example, OneSky validates uploaded `.pot` files with the native gettext lexer), it throws an error about an invalid sequence.
 
Yes, we can certainly strip the escaping from single quotes and from every other character not listed in the gettext lexer, and it will work, because technically for JavaScript and most other languages there is no difference between
 
```
"String hasn't been that good"
```
 
and
 
```
'String hasn\'t been that good'
```
 
Thus the actual `gettext` function will work. But we'll have to maintain slightly different strings in the source code (escaped) and in the `.pot` files (unescaped), which feels kind of wrong.
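
For reference, the stripping workaround could look roughly like the sketch below. The `stripUnneededEscapes` helper is hypothetical, and the set of escapes it preserves (simple C-style escapes plus octal and `\x` prefixes) is my assumption about what the lexer accepts, so it should be checked against `po-lex.c` before relying on it.

```javascript
// Hypothetical post-processing step for extracted strings: drop the
// backslash in front of any character the PO lexer is not expected to accept.
// ASSUMPTION: the accepted set below is illustrative; verify it against po-lex.c.
const PO_ESCAPE_CHARS = /^[ntrbfva\\"0-7x]$/;

function stripUnneededEscapes(body) {
  // Backslash pairs are consumed left to right, so an escaped backslash
  // followed by a quote (\\') keeps both of its backslashes.
  return body.replace(/\\([\s\S])/g, (match, ch) =>
    PO_ESCAPE_CHARS.test(ch) ? match : ch
  );
}

// Both spellings denote the same JavaScript string value,
// which is why the runtime lookup still works after stripping.
const sameValue = "String hasn't been that good" === 'String hasn\'t been that good';
```

Here `stripUnneededEscapes("ain\\'t")` gives `ain't`, while `\n` and `\"` are left untouched. The cost is the one described above: the msgid in the `.pot` file no longer matches the source text character for character.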
 
 
So, for now I see three possible options:
 
1. Instead of raising `invalid sequence`, always return the escaped character unless it's one of the specified cases
2. Change the behavior of `xgettext` so that it extracts strings without unneeded escaping. But in that case we will have different strings in the source code and in the l10n files, which goes against consistency
3. Tell me that I'm so stupid that I shouldn't even spend time describing this non-existent issue
 
I probably understand that this escaping comes from C, where, I guess, things work differently from popular web languages. For example, as far as I know, in C single quotes delimit a single character, which probably explains why only the double quote was made a valid escape sequence: C strings are never delimited by single quotes, so a single quote inside one never needs escaping. But that doesn't hold for other languages.
 
Thanks in advance for your time. I hope I didn't waste it.
 
And thank you so much for your contribution to open source. Life without gettext would be much harder :)
 
---
Best wishes,
Serj
 
 
 
29.10.2015, 09:40, "Daiki Ueno" <address@hidden>:

Serj Lavrin <address@hidden> writes:
 

 There seems to be an issue with hardcoded escaping in
 http://git.savannah.gnu.org/cgit/gettext.git/tree/gettext-tools/src/po-lex.c?id=b75a50bc5f30666fb36469a90245e1d7856d672c#n750

 If I explain the issue, is there any chance that it will be fixed?


Please go ahead and describe the issue. We will try to fix it if it is
a real bug.

Regards,

--
Daiki Ueno
