bug-datamash



From: Tim Rice
Subject: Re: [BUG] fractional bin sizes do not work in some locales (e.g., de_DE.UTF-8)
Date: Fri, 24 Jun 2022 21:36:41 +0000

Hey Erik,

> while looking at the binning issues reported by Andreas Schamanek[0] I
> noticed that providing floating point numbers as bin sizes does not work
> when using a locale where comma (',') is used as decimal separator:
>
>    $ echo $LC_NUMERIC
>    de_DE.UTF-8
>
>    ...
>
>    $ echo 1,15 | datamash bin:0,1 1
>    datamash: missing field for operation ‘bin’

I had a play around with this and (plot twist!) things work as
expected when using LC_ALL instead of LC_NUMERIC:

```
$ datamash sum 1 <<< 1,1
datamash: invalid numeric value in line 1 field 1: '1,1'

$ LC_ALL=de_DE.utf8 datamash sum 1 <<< 1,1
1,1
```

I agree it should also work with LC_NUMERIC. So far, it is a mystery to me
why it doesn't. I tried explicitly calling `setlocale(LC_NUMERIC, "")` in the
main function (where LC_ALL is set), but nothing seems to "stick".
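
One thing that may be worth ruling out first is the environment itself. POSIX
resolves each locale category by checking LC_ALL, then the category variable
(here LC_NUMERIC), then LANG, so a non-empty LC_ALL overrides LC_NUMERIC. The
sketch below only mirrors that lookup order on environment variables; the
function name `effective_numeric_locale` is made up for illustration, and it
does not check whether the named locale is actually installed or what
datamash does internally.

```shell
# Hypothetical helper (not part of datamash): print the locale name that the
# POSIX lookup order would select for LC_NUMERIC, using only the environment.
effective_numeric_locale() {
    if [ -n "${LC_ALL:-}" ]; then
        # A non-empty LC_ALL overrides every LC_* category.
        printf '%s\n' "$LC_ALL"
    elif [ -n "${LC_NUMERIC:-}" ]; then
        printf '%s\n' "$LC_NUMERIC"
    elif [ -n "${LANG:-}" ]; then
        printf '%s\n' "$LANG"
    else
        # Assumption: the implementation default is the C/POSIX locale.
        printf 'C\n'
    fi
}
```

Note that `echo $LC_NUMERIC` also prints a shell variable that was never
exported; `env | grep -E '^(LC_|LANG)'` shows what a child process would
actually inherit.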

Do you have any insight about what the problem might be?

I tried checking what other GNU projects do. I thought GNU Awk or GNU bc might
point me in the right direction. In fact, it seems like they don't even respect
LC_ALL when parsing the decimal separator:

```
$ awk '{printf "%f %f\n", $1, $2}' <<< "1,1 1.1"
1.000000 1.100000

$ LC_ALL=de_DE.utf8 awk '{printf "%f %f\n", $1, $2}' <<< "1,1 1.1"
1.000000 1.100000

$ LC_ALL=de_DE.utf8 bc <<< '1,1+1,1'
(standard_in) 1: syntax error
(standard_in) 1: syntax error

$ LC_ALL=de_DE.utf8 bc <<< '1.1+1.1'
2.2
```

So if we can figure this out for GNU Datamash, we may need to raise some bugs 
and submit some patches to other GNU projects too :)

~ Tim


