bug-gnulib

Re: OpenBSD locale system


From: Ingo Schwarze
Subject: Re: OpenBSD locale system
Date: Mon, 17 Dec 2018 00:16:16 +0100
User-agent: Mutt/1.8.0 (2017-02-23)

Hi Bruno,

Bruno Haible wrote on Sun, Dec 16, 2018 at 08:01:04PM +0100:
> Ingo Schwarze wrote:

>> The OpenBSD C library intentionally doesn't implement any other
>> locale(1) categories except LC_CTYPE because many here regard the
>> other categories as overengineering and as detrimental to system
>> security

> I partially agree with this, regarding specific categories, such as
> 
>   - LC_MONETARY: The main API function for this category, strfmon(),
>     is defined in such a way that, if implemented correctly, it
>     produces misleading results.
>     <http://austingroupbugs.net/view.php?id=1199>
> 
>   - LC_PAPER: Any software which wants to print something should
>     better ask the attached printer, rather than make assumptions
>     about the printer device based on the locale.
> 
> However, locale categories such as LC_NUMERIC and LC_MESSAGES
> are useful when you assume that your software does have end-users
> that are not sysadmins.

Probably you are right that LC_MESSAGES is not dangerous, as long
as the C library doesn't actually attempt to translate system
error strings.  But LC_NUMERIC is certainly dangerous: it can
break parsers in subtle and surprising ways, whereas it doesn't
really matter all that much for end users in the first place.
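To make the parser hazard concrete: on a libc that does implement
LC_NUMERIC (glibc, for instance), once a program has called
setlocale(LC_NUMERIC, "de_DE.UTF-8"), strtod("3.5", &end) stops at
the '.', because the decimal separator in that locale is ','.
OpenBSD is not affected, since it only implements LC_CTYPE.  Below
is a minimal sketch of one defensive pattern, assuming POSIX.1-2008
newlocale(3) and uselocale(3); the name parse_c_double() is
hypothetical:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <locale.h>
#include <stdlib.h>

/* Hypothetical helper: parse a number that always uses '.' as the
   decimal separator, whatever LC_NUMERIC the caller has set.
   Returns 0 on success, -1 on failure. */
int parse_c_double(const char *text, double *out)
{
    assert(text != NULL && out != NULL);

    /* A locale object with every category taken from "C". */
    locale_t c_loc = newlocale(LC_ALL_MASK, "C", (locale_t)0);
    if (c_loc == (locale_t)0)
        return -1;

    /* Switch only the calling thread, parse, then restore. */
    locale_t saved = uselocale(c_loc);
    char *end;
    double value = strtod(text, &end);
    uselocale(saved);
    freelocale(c_loc);

    if (end == text || *end != '\0')
        return -1;              /* no digits, or trailing garbage */
    *out = value;
    return 0;
}
```

Creating the locale object on every call keeps the sketch
self-contained; a real parser would more likely create it once and
reuse it.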

But i guess discussing such considerations in detail would be
off-topic on this mailing list; i merely mentioned them to provide
minimal context for why certain decisions were made.  So let's
focus on the consequences of those decisions, on how gnulib should
best deal with them, and possibly on identifying parts that might
need revisiting; see below.

[...]
> Regarding OpenBSD, the uselocale support is useful for adding a checkmark
> to the checkbox "We support POSIX locale_t API", but is not useful, for
> example, to have a multithreaded web server honor the Accept-Language
> settings given by a browser user, other than by reimplementing all
> needed locale-dependent behaviour.

The "all needed" in this sentence makes it sound as if it were a big
deal, but all that is needed here is storing one language code per
user, right?  Why would any programmer call a library API for that
rather than simply storing the selected language in a variable?
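For illustration, here is what "simply storing the selected language
in a variable" could look like in such a server.  This is a minimal
sketch: struct request, the message table and not_found_text() are
all hypothetical names, and the language code is assumed to have
been negotiated from the Accept-Language header elsewhere:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical per-request state: the negotiated language is just
   remembered here, nothing is stored in the C library. */
struct request {
    char lang[8];               /* e.g. "de", "fr"; "" means default */
};

/* Hypothetical message table; a real server might load catalogues. */
struct message {
    const char *lang;
    const char *text;
};

static const struct message not_found_msgs[] = {
    { "de", "Seite nicht gefunden" },
    { "fr", "Page introuvable" },
    { "en", "Page not found" }, /* keep the default entry last */
};

const char *not_found_text(const struct request *req)
{
    assert(req != NULL);
    size_t n = sizeof not_found_msgs / sizeof not_found_msgs[0];
    for (size_t i = 0; i < n; i++)
        if (strcmp(req->lang, not_found_msgs[i].lang) == 0)
            return not_found_msgs[i].text;
    return not_found_msgs[n - 1].text;   /* fall back to English */
}
```

No locale API is involved, and the only shared data is the
read-only table, so each thread can serve a different language.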

For comparison, the point of using {set,new,use}locale(3) with
LC_CTYPE is not merely remembering which character set the user
asked for, but also changing the behaviour of many *wc*(3) and
*mb*(3) library functions.  LC_MESSAGES, on the other hand, will
never have any effect on the behaviour of any library function
in the OpenBSD libc.
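As an example of such a change in behaviour: the same byte string
yields a different character count depending on the LC_CTYPE of the
locale in effect, because mbrtowc(3) decodes according to it.  A
sketch, assuming POSIX.1-2008 uselocale(3); count_chars() is a
hypothetical helper, and a locale named "C.UTF-8" is assumed to
exist, which is not guaranteed on every system:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <locale.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: count the characters (not bytes) in s,
   decoding it according to the LC_CTYPE of loc.
   Returns (size_t)-1 if s is not valid in that encoding. */
size_t count_chars(const char *s, locale_t loc)
{
    assert(s != NULL && loc != (locale_t)0);

    /* uselocale() affects only the calling thread. */
    locale_t saved = uselocale(loc);
    mbstate_t state;
    memset(&state, 0, sizeof state);

    size_t remaining = strlen(s);
    size_t count = 0;
    while (remaining > 0) {
        size_t r = mbrtowc(NULL, s, remaining, &state);
        if (r == (size_t)-1 || r == (size_t)-2) {  /* invalid or truncated */
            count = (size_t)-1;
            break;
        }
        s += r;
        remaining -= r;
        count++;
    }
    uselocale(saved);
    return count;
}
```

With a "C.UTF-8" locale object, the 5-byte string "caf\xc3\xa9"
counts as 4 characters.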

Also, in your web server example, you certainly don't want syslog
messages in languages requested by clients, so calling uselocale(3)
would merely be asking for trouble...  (Of course it's still possible
to write correct code, but harder.)
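One way to keep the two apart is not to switch the thread's locale
at all, and to pass the client's locale object only to the calls
that produce client-visible text.  A minimal sketch, assuming
POSIX.1-2008 strerror_l(3); client_error_text() and report_failure()
are hypothetical.  Where LC_MESSAGES has no effect on libc, as in
OpenBSD, both strings come out the same anyway:

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <locale.h>
#include <stdio.h>
#include <string.h>
#include <syslog.h>

/* Hypothetical: describe err in the client's language, without
   touching the calling thread's current locale. */
void client_error_text(int err, locale_t client, char *buf, size_t len)
{
    assert(buf != NULL && len > 0 && client != (locale_t)0);
    snprintf(buf, len, "%s", strerror_l(err, client));
}

/* Hypothetical failure handler, called right after a failing system
   call: the log line uses the process-wide locale, the reply uses
   the client's locale object. */
void report_failure(locale_t client, char *reply, size_t len)
{
    int err = errno;            /* save it before syslog() can change it */
    syslog(LOG_ERR, "request failed: %s", strerror(err));
    client_error_text(err, client, reply, len);
}
```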

>> POSIX does not require that "de_DE.UTF-8" and
>> "fr_FR.UTF-8" must be different locales, or that they behave
>> differently from each other in any way.

> Here you need to distinguish
>   - locale-dependent behaviour defined by POSIX functions and
>   - locale-dependent behaviour defined by the application.
> 
> In setlocale.c you made this distinction, as witnessed by the
> comment in
> https://cvsweb.openbsd.org/cgi-bin/cvsweb/src/lib/libc/locale/setlocale.c?annotate=1.29
> lines 72..75.

Actually, i originally proposed deleting that behaviour for
consistency with {new,use}locale(3), but no consensus was reached
on that point.  Some argued: given that it is already implemented,
why not simply keep it in setlocale(3)?  It may be useful in some
situations.  So it was kept.

But i consider setlocale(3) the odd one out here rather than
{new,use}locale(3), because setlocale(3) supports storing a string
in the library that the application program could just as easily,
or arguably even more easily, store itself.
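To illustrate the two options: setlocale(category, NULL) retrieves
the name the library stored, while the application can just as well
keep its own copy.  A sketch in plain ISO C; app_set_locale() and
app_locale_name are hypothetical:

```c
#include <assert.h>
#include <locale.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical application-side record of the requested locale name. */
static char app_locale_name[64];

/* Set the global locale for all categories and remember the name in
   the application itself.  Returns 0 on success, -1 on failure. */
int app_set_locale(const char *name)
{
    assert(name != NULL);
    if (strlen(name) >= sizeof app_locale_name)
        return -1;              /* refuse rather than truncate */
    if (setlocale(LC_ALL, name) == NULL)
        return -1;
    snprintf(app_locale_name, sizeof app_locale_name, "%s", name);
    return 0;
}
```

Note that app_set_locale("") records the empty string rather than
the name chosen from the environment; setlocale(LC_ALL, NULL) is
still the way to learn that name.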

> Why not also for the per-thread locales? By implementing the FreeBSD
> querylocale API (the equivalent of setlocale(category,NULL) for locale_t
> objects), you would make it possible for applications to pull out
> German versus French messages, depending whether the per-thread locale
> is "de_DE.UTF-8" or "fr_FR.UTF-8".

So you suggest storing this string in the library (where it has
no effect), even though POSIX defines no method to retrieve it
once it is stored?  I don't quite see yet how that might be
useful, not even for your webserver example, because the webserver
couldn't portably retrieve the string, or could it?
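For per-thread locales, the portable way to get the name back is for
the application to keep it next to the locale_t itself.  A minimal
sketch, assuming POSIX.1-2008 newlocale(3) and strdup(3); struct
named_locale and its two helpers are hypothetical:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical pairing of a locale object with the name it was
   created from, so the name can be retrieved without querylocale(3). */
struct named_locale {
    locale_t loc;
    char *name;
};

/* Returns 0 on success, -1 on failure. */
int named_locale_open(struct named_locale *nl, int mask, const char *name)
{
    assert(nl != NULL && name != NULL);
    nl->loc = newlocale(mask, name, (locale_t)0);
    if (nl->loc == (locale_t)0)
        return -1;
    nl->name = strdup(name);
    if (nl->name == NULL) {
        freelocale(nl->loc);
        return -1;
    }
    return 0;
}

void named_locale_close(struct named_locale *nl)
{
    freelocale(nl->loc);
    free(nl->name);
}
```

A thread that calls uselocale(nl.loc) can keep a pointer to nl in
its per-request data and consult nl.name to pick German or French
messages.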


I hoped to better understand your point by looking at the HEAD of
the master branch of the GNU grep git repository, because you
mentioned a test failure there.  But grepping the grep repo, i can't
even seem to find any usage of newlocale(3) or setlocale(3) in
there, so i'm not quite sure what you are actually trying to
achieve.  Also, you mentioned "a test failure of test-localename",
but "grep -RF localename *" returns nothing for me in the grep repo
either...

I also tried running the build myself in order to reproduce your
issue on OpenBSD-current.  Here are the findings:
 1. ./bootstrap appears to run wget(1) unconditionally, but wget(1)
    is not installed on my system.  On OpenBSD, the program for that
    purpose is called ftp(1), even for https:// URIs.
 2. make check yields only two failures, and both are expected
    (XFAIL):

    XFAIL: equiv-classes
    XFAIL: triple-backref
    ============================================================================
    Testsuite summary for GNU grep 3.1.51-e767
    ============================================================================
    # TOTAL: 109
    # PASS:  80
    # SKIP:  27
    # XFAIL: 2
    # FAIL:  0
    # XPASS: 0
    # ERROR: 0
    ============================================================================
    ============================================================================
    Testsuite summary for GNU grep 3.1.51-e767
    ============================================================================
    # TOTAL: 173
    # PASS:  157
    # SKIP:  16
    # XFAIL: 0
    # FAIL:  0
    # XPASS: 0
    # ERROR: 0
    ============================================================================

In particular, i see:

    PASS: test-localename

Do you need more info?  If so, what exactly?  Better on or off list?


Of course, asking for querylocale(3) support, as opposed to
questioning the implementation of uselocale(3), would be a rather
different matter.  While i did hear from porters that the lack of
{new,use}locale(3) and the related interfaces caused porting
trouble in the past, so far i haven't heard of any trouble that
implementing querylocale(3) would make go away, and given that it
isn't standardized, that doesn't seem very surprising.  Of course,
i may simply have missed such trouble.

Anyway, in case what you really ask for is implementing querylocale(3),
then i no longer understand what is broken about {new,use}locale(3)
in the absence of querylocale(3), or why exactly they need to be
marked as non-working...

Yours,
  Ingo
