bug-datamash

Re: 2 issues with binning


From: Andreas Schamanek
Subject: Re: 2 issues with binning
Date: Tue, 26 Jul 2022 20:34:44 +0200 (CEST)


Hi all,

My sincere apologies for not replying sooner.

On Wed, 22 Jun 2022, at 22:14, Tim Rice wrote:

> Regarding the floating point issue, it became easy to see the source of the problem when I added some printf statements. As you suspected, 4.2/0.1 ends up looking like 41.999 plus change, so the result gets gated into the 4.1 bin.
>
> This brought to mind our discussion last month about incorporating arbitrary precision into GNU Datamash. Whereas it would be difficult to remediate such issues in time for v1.8, the plan to make AP available in v2.0 will hopefully do the job for us.

As I said, I am not a programmer, so it is perhaps not surprising that I still do not know how binning with fractional bin widths is done reliably, i.e. without running into floating point issues.

Increasing the precision seems to be an option, of course, but not a trivial one. GNU Awk, which lets you set the precision bit by bit, served me to illustrate this:

$ for p in {65..75} ; do echo -n "PREC=$p "; gawk -M -v PREC=$p \
  'BEGIN{print int(4.2/0.1)-41}' ; done

(65..75 just as an example to keep the output short.)

In Awk I tried to work around it by adding a very small value before cutting off the decimals with int(), but sure enough it did not take long until I was processing data that required surprisingly more precision than my workaround was prepared to handle.
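
One possible way to avoid both the extra precision and the small fudge value, sketched here only as an illustration and not as what Datamash does: treat the value and the bin width as decimal text, scale both to integers by the same power of ten, and take the floor with integer arithmetic. The helper name bin_index is hypothetical, and the sketch assumes plain decimal literals (no exponents), a positive bin width, and scaled integers that stay below 2^53.

```shell
# bin_index VALUE WIDTH: print floor(VALUE / WIDTH) using integer arithmetic.
# Hypothetical helper for illustration; not part of datamash or awk.
# Assumes plain decimal literals such as 4.2 or -0.35 and WIDTH > 0.
bin_index() {
  awk -v v="$1" -v w="$2" '
    # Number of digits after the decimal point in the literal s.
    function frac_digits(s,   i) {
      i = index(s, ".")
      return i ? length(s) - i : 0
    }
    # Turn the literal s into the integer s * 10^d by editing its text,
    # so no binary fraction is ever formed.
    function scaled(s, d,   k) {
      k = d - frac_digits(s)
      sub(/\./, "", s)
      while (k-- > 0) s = s "0"
      return s + 0
    }
    BEGIN {
      d = frac_digits(v)
      if (frac_digits(w) > d) d = frac_digits(w)
      a = scaled(v, d)
      b = scaled(w, d)
      r = a % b            # remainder of two integers, exact below 2^53
      if (r < 0) r += b    # shift to a non-negative remainder (floor, not truncation)
      printf "%d\n", (a - r) / b
    }'
}
```

With this, `bin_index 4.2 0.1` prints 42 and `bin_index 4.19 0.1` prints 41, because the division is carried out on 42/1 and 419/10 rather than on binary approximations of 4.2 and 0.1. Multiplying the result by the bin width to get the bin's lower bound would need the same text-based care.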

--
-- Andreas

     :-)



