From: Erik Auerswald
Subject: Re: [BUG] fractional bin sizes do not work in some locales (e.g., de_DE.UTF-8)
Date: Sun, 26 Jun 2022 13:34:26 +0200

Hey Tim,

On 25.06.22 13:22, Erik Auerswald wrote:
> On 25.06.22 06:00, Tim Rice wrote:
>> [Erik Auerswald wrote:]
>>>
>>> I suspect the operation parsing code, but have not yet looked at it.
>>
>> I've been having a look. It comes down to the function `scanner_get_token()` in op-scanner.c.
>>
>> This function returns TOK_FLOAT only when it sees a digit followed by a period. TOK_COMMA is handled separately and is never considered for being part of a float.
>
> I'd say that the current scanner code is not locale aware, but
> it does not disable locale specific processing, e.g., floating
> point number parsing.
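
To illustrate what "locale specific processing" means for number
parsing, here is a small sketch.  The helper names chars_accepted()
and decimal_point_is() are made up for this example and do not exist
in GNU Datamash; only strtold() and localeconv() are standard C.

```c
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not part of datamash): report how many
   characters of TEXT strtold() accepts under the LC_NUMERIC
   setting that is active when the function is called.  */
size_t
chars_accepted (const char *text)
{
  char *end;
  (void) strtold (text, &end);
  return (size_t) (end - text);
}

/* Hypothetical helper: return 1 if the decimal separator of the
   active locale equals SEP, 0 otherwise.  */
int
decimal_point_is (const char *sep)
{
  return strcmp (localeconv ()->decimal_point, sep) == 0;
}
```

A C program starts in the "C" locale, so there chars_accepted ("0.1")
is 3 while chars_accepted ("0,1") is 1, because parsing stops at the
comma.  After setlocale (LC_ALL, "") with LC_ALL=de_DE.UTF-8 I would
expect the two results to swap, assuming that locale is installed and
uses ',' as its decimal separator.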

As a workaround we could set the locale to "C" before calling
strtold().  This would allow "datamash bin:0.1 1" independent
of locale setting.  Since any locale where the decimal separator
is not the period ('.') does not work right now for specifying
floating point parameters to operations, this should not break
any existing GNU Datamash use cases.

I have a local implementation of this.  It does not break any
of the tests and enables specifying fractional binning bucket
sizes when using a German locale:

    $ LC_ALL=de_DE.UTF-8 ./datamash bin:0.1 1 <<<1,15
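
For illustration only, the idea of parsing in the "C" locale could be
sketched roughly as below.  This is not the local implementation
mentioned above; the function name strtold_c_locale() is an assumption
made up for this example, and switching only LC_NUMERIC (instead of
the whole locale) is one possible choice among several.

```c
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical wrapper (not from the datamash sources): parse TEXT
   with strtold() while LC_NUMERIC is temporarily set to "C", so that
   '.' is the decimal separator regardless of the user's locale.  */
long double
strtold_c_locale (const char *text, char **end)
{
  /* setlocale() may overwrite the string it returned earlier,
     so keep a private copy of the current setting.  */
  const char *current = setlocale (LC_NUMERIC, NULL);
  char *saved = NULL;
  if (current != NULL)
    {
      saved = malloc (strlen (current) + 1);
      if (saved != NULL)
        strcpy (saved, current);
    }

  setlocale (LC_NUMERIC, "C");
  long double value = strtold (text, end);

  /* Restore the previous setting.  If the copy could not be made,
     LC_NUMERIC stays "C"; real code would need to handle that.  */
  if (saved != NULL)
    {
      setlocale (LC_NUMERIC, saved);
      free (saved);
    }
  return value;
}
```

Called as strtold_c_locale ("0.1", &end), this consumes the whole
string even when the program runs under a locale with ',' as decimal
separator.  Note that setlocale() changes process-wide state, so the
sketch is only suitable for single-threaded code.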

What do you think?



