bug-datamash
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [BUG] fractional bin sizes do not work in some locales (e.g., de_DE.


From: Erik Auerswald
Subject: Re: [BUG] fractional bin sizes do not work in some locales (e.g., de_DE.UTF-8)
Date: Sun, 3 Jul 2022 12:38:38 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Thunderbird/91.9.1

Hi Tim,

On 26.06.22 13:34, Erik Auerswald wrote:
On 25.06.22 13:22, Erik Auerswald wrote:
On 25.06.22 06:00, Tim Rice wrote:
[Erik Auerswald wrote:]

I suspect the operation parsing code, but have not yet looked at it.

I've been having a look. It comes down to the function `scanner_get_token()` in op-scanner.c.

This function returns TOK_FLOAT only when it sees a digit followed by a period. TOK_COMMA is handled separately and is never considered for being part of a float.

I'd say that the current scanner code is not locale aware, but
it does not disable locale specific processing, e.g., floating
point number parsing.

IMHO this is the bug, i.e., operation parsing is neither locale
aware nor locale agnostic.  My preferred fix would be to make
operation parsing locale aware, for consistency.

As a workaround we could set the locale to "C" before calling
strtold().  This would allow "datamash bin:0.1 1" independent
of locale setting.  Since any locale where the decimal separator
is not the period ('.') does not work right now for specifying
floating point parameters to operations, this should not break
any existing GNU Datamash use cases.

I have a local implementation of this.  It does not break any
of the tests and enables specifying fractional binning bucket
sizes when using a German locale:

     $ LC_ALL=de_DE.UTF-8 ./datamash bin:0.1 1 <<<1,15
     1,1

What do you think?

I intend not to follow this approach, because it would make
actually fixing the localization bug an incompatible change.

IMHO disabling localization for the operation parser would
need to be gated via option, with the intent to keep this
option even after fixing the localization bug.

Br,
Erik



reply via email to

[Prev in Thread] Current Thread [Next in Thread]