Re: [bug-gettext] picking strings to translate from a program's output

From: Bruno Haible
Subject: Re: [bug-gettext] picking strings to translate from a program's output
Date: Thu, 02 May 2019 14:00:59 +0200
User-agent: KMail/5.1.3 (Linux/4.4.0-141-generic; KDE/5.18.0; x86_64; ; )

Hi Egmont,

> > 1. The gettext() function is overridden/modified to produce a string with
> >    an escape sequence that contains a URL that specifies the PO file and
> >    msgid.
> A. By doing this, the length of the string changes (significantly).
> This means that if a utility does a wcswidth() (or even worse:
> strlen()) to measure the width occupied in the terminal and perform
> alignment/indentation (e.g. table layout; padding to the terminal's
> right edge...) based on that, it won't work properly.
> Maybe wcswidth() can also be overridden to skip such escape sequences

You're right. It would be necessary to override wcswidth. This can be
done in the same way as overriding gettext(), dgettext() etc. Namely,
on glibc systems, through preloadable_libintl.so, and on other systems
directly in libintl.so.
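
To make the width problem concrete, here is a minimal sketch in C. It
assumes the escape sequence is an OSC 8 hyperlink of the form
ESC ] 8 ; params ; URI ST, terminated by either BEL or ESC \. The names
wrap_hyperlink and visible_strlen are hypothetical, and the URL format
that would identify the PO file and msgid is not specified in this
thread, so the caller passes an arbitrary placeholder. The sketch counts
bytes, as strlen() does; a real wcswidth() override would have to count
terminal columns instead.

```c
#include <stddef.h>
#include <stdio.h>

/* Wrap `text` in an OSC 8 hyperlink that points at `url`.
   Returns what snprintf returns: the number of bytes needed,
   excluding the terminating NUL.  (Hypothetical helper.)  */
int wrap_hyperlink(char *out, size_t outsize,
                   const char *url, const char *text)
{
    return snprintf(out, outsize,
                    "\033]8;;%s\033\\%s\033]8;;\033\\", url, text);
}

/* strlen() analogue that skips OSC sequences, i.e. everything from
   ESC ] up to and including the terminator (BEL or ESC \).
   (Hypothetical helper; counts bytes, not columns.)  */
size_t visible_strlen(const char *s)
{
    size_t n = 0;
    while (*s != '\0') {
        if (s[0] == '\033' && s[1] == ']') {
            s += 2;
            while (*s != '\0') {
                if (*s == '\a') {                      /* BEL terminator */
                    s++;
                    break;
                }
                if (s[0] == '\033' && s[1] == '\\') {  /* ESC \ terminator */
                    s += 2;
                    break;
                }
                s++;
            }
        } else {
            n++;
            s++;
        }
    }
    return n;
}
```

A program that pads a table column with strlen() would see the full
wrapped length, whereas visible_strlen() reports only the bytes that
reach the screen.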

> B. Care has to be taken that input sanitization (e.g. removal of
> escape sequences) happens before applying the hyperlink. E.g. a
> pseudo-code like
>     printf(_("Cannot remove file %s\n"), filename);
> might mess up the terminal if filename contains escape sequences.

This case can be ignored. File names with escape sequences don't occur
_that_ frequently. Therefore they only need to be considered when
thinking about security and malicious attackers. But here, only
translators will set the environment variable(s) that activate the

> C. By the strings becoming noticeably larger, there's a risk that you
> trigger a buffer overflow at some temporary fixed-size buffer.

Yes. Teaching developers not to use fixed-size buffers is a process
that has been ongoing for more than 30 years. (This has been mentioned
in the GNU Coding Standards for that long.) Such bugs will fall back
onto the developers, and they will fix them.
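
For illustration, one common way to avoid a fixed-size temporary buffer
is to ask vsnprintf() for the required length first and then allocate
exactly that much. This is only a sketch, not something gettext
provides; the name format_alloc is hypothetical.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Format into a freshly allocated buffer of exactly the needed size.
   Returns NULL on failure; the caller frees the result.
   (Hypothetical helper.)  */
char *format_alloc(const char *fmt, ...)
{
    va_list ap, ap2;
    char *buf = NULL;
    int needed;

    va_start(ap, fmt);
    va_copy(ap2, ap);                      /* the arguments are read twice */
    needed = vsnprintf(NULL, 0, fmt, ap);  /* first pass: measure only */
    va_end(ap);

    if (needed >= 0) {
        buf = malloc((size_t)needed + 1);
        if (buf != NULL)
            vsnprintf(buf, (size_t)needed + 1, fmt, ap2);  /* second pass */
    }
    va_end(ap2);
    return buf;
}
```

With this pattern the message can grow by the length of an embedded
escape sequence without overflowing anything, because the buffer size
is derived from the formatted result rather than guessed in advance.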

> D. If a gettext'ed string is embedded in another gettext'ed string
> (which is probably a bad practice, but sure happens sometimes), the
> hyperlink isn't restored for the trailing segment of the string. This
> is because of a difference between HTML's DOM tree model vs. the
> terminal emulator's state machine: a terminating OSC 8 doesn't restore
> the previous value but switches to non-hyperlink. Example:
>     printf(_("Give me a %s please"), foo ? _("apple") : _("banana"));

To cope with such examples, I would give translators the advice to click
on the first word of a message they want to translate.
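
The reason the first word is the safe place to click can be seen by
modelling the terminal's state machine. The sketch below assumes the
OSC 8 form ESC ] 8 ; params ; URI ST and keeps a single "current
hyperlink" value with no stack, which is the behaviour Egmont describes.
The function name link_at and the URIs used with it are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Find the hyperlink that is active at the visible byte number `index`
   of `s` and copy its URI into `url` ("" means no hyperlink).
   Returns 1 if that byte exists, 0 otherwise.
   There is no push/pop: every OSC 8 simply replaces the current value.
   (Hypothetical helper.)  */
int link_at(const char *s, size_t index, char *url, size_t urlsize)
{
    const char *cur = "";   /* start of the active URI */
    size_t curlen = 0;      /* its length; 0 means "no hyperlink" */
    size_t seen = 0;        /* visible bytes passed so far */

    if (urlsize == 0)
        return 0;
    while (*s != '\0') {
        if (strncmp(s, "\033]8;", 4) == 0) {
            s += 4;
            while (*s != '\0' && *s != ';')   /* skip the params field */
                s++;
            if (*s == ';')
                s++;
            cur = s;
            while (*s != '\0' && *s != '\a'
                   && !(s[0] == '\033' && s[1] == '\\'))
                s++;
            curlen = (size_t)(s - cur);       /* empty URI closes the link */
            if (*s == '\a')
                s++;
            else if (*s == '\033')
                s += 2;
        } else {
            if (seen == index) {
                size_t n = curlen < urlsize - 1 ? curlen : urlsize - 1;
                memcpy(url, cur, n);
                url[n] = '\0';
                return 1;
            }
            seen++;
            s++;
        }
    }
    return 0;
}
```

Applied to the output of the "Give me a %s please" example, "Give me a "
resolves to the outer message's link and "apple" to the inner one, but
" please" resolves to no link at all, because the inner message's
closing OSC 8 has already reset the state.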

There still is the case of

      printf (_("%s: %s"), program_name, _("invalid argument"));

but the effect will be that the "%s: %s" string is not translated until the
translator comes to the "linear" translation pass. This is not dramatic.

> We might think about extending the protocol to have push/pop

Yes, push/pop semantics may be desirable also for other applications.
For the gettext / translation use-case, push/pop would be a plus, but
is not strictly needed.

> E. The entire approach is unsuitable if you have any existing screen
> handling library in place, such as ncurses, slang, newt... or some
> manual screen handling code implemented by the app.

Yes. Such programs would be treated like programs which have a GUI.

> Use of this idea
> is pretty much limited to apps that produce output on their
> stdout/stderr

Yes, but this is a large class of programs, whereas fewer programs
use ncurses.

> Do you have concrete terminal-based apps in your mind that you'd
> prefer to be translatable this way?

bash, binutils, bison, clisp, coreutils, cpio, diffutils, findutils,
flex, gawk, gcal, gcc, gettext, grep, guix, hello, m4, make, recutils,
sed, tar, util-linux, wdiff, wget - to name just the most important

> I'm wondering if it's really worth
> it to build up the said infrastructure (with webservers etc.) to
> provide an alternate workflow compared to the "linear" approach plus
> testing.

The infrastructure is not that big. The web server is pretty trivial.

> Is there a sufficiently large set of tools to be translated +
> translators willing to give this new workflow a try? I really don't
> know.

I imagine that this approach to translation is motivating for users
who are not part of a translation team so far. But we'll see. Without
trying it, we'll never know.

