[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[gnuastrocommits] master e91a4677: Library (type.h): integer strings fo
From: 
Mohammad Akhlaghi 
Subject: 
[gnuastrocommits] master e91a4677: Library (type.h): integer strings forceparsed as base 10 
Date: 
Tue, 5 Jul 2022 12:54:05 0400 (EDT) 
branch: master
commit e91a46777007abde80a85698eff95875b6480373
Author: Mohammad Akhlaghi <mohammad@akhlaghi.org>
Commit: Mohammad Akhlaghi <mohammad@akhlaghi.org>
Library (type.h): integer strings forceparsed as base 10
Until now, when parsing integer strings as integers with 'strtol', the base
was set to '0' (which means that integers starting with '0' should be
interpreted as base 8 and those starting with '0x' should be parsed as
base16). While bases, are very common in programming and computer science,
they are rarely used in data analysis. In fact, some pipelines do '0'
padding to make sure that the number has a certain number of digits (for
example '0015').
We had already accounted for this when parsing text tables (in 'lib/txt.c')
by setting the base of 'strtol' to 10. But in 'gal_type_string_to_number'
the base was '0'. As a result, when reading FITS keyword values with
integers starting with '0', the program would either crash (because '8' or
'9' are not defined in base8), or return unreasonable values since the
base was different.
With this commit, 'strtol' now uses base 10 and this fixed the problem.
I also noticed that this function was causing a crash when the integer
number couldn't be read from the string. In such cases, it should return
NULL. So that has also been corrected.
Furthermore, the documenation of 'gal_type_string_to_number' in the book
now fully describes the problem of implicit type conversion for binary
operators and the shorter description in the comments above
'arithmetic_binary_int_sanity_check' have been removed and point the reader
to that part of the book.
This bug was found by Manuel SánchezBenavente.
This fixes bug #62710.

NEWS  3 +++
doc/announceacknowledge.txt  1 +
doc/gnuastro.texi  40 +++++++++++++++++++++++++++++++++++
lib/arithmetic.c  23 ++++
lib/type.c  16 +++++++++
5 files changed, 52 insertions(+), 31 deletions()
diff git a/NEWS b/NEWS
index ae27629c..bf925fe9 100644
 a/NEWS
+++ b/NEWS
@@ 214,6 +214,9 @@ See the end of the file for license conditions.
pixels. Found by Raul InfanteSainz.
bug #62694: Radial profile script overestimates the surface brightness
error. Found and fixed with the help of Raul InfanteSainz.
+ bug #62710: Plaintext integers starting with 0 are read in the octal
+ base (which is common in software engineering, but not in
+ data analysis). Found by Manuel SánchezBenavente.
diff git a/doc/announceacknowledge.txt b/doc/announceacknowledge.txt
index c07f0795..fc2e42b2 100644
 a/doc/announceacknowledge.txt
+++ b/doc/announceacknowledge.txt
@@ 9,6 +9,7 @@ Jeremy Lim
Juan Miro
Irene Pintos Castro
Ignacio Ruiz Cejudo
+Manuel SánchezBenavente
Elham Saremi
diff git a/doc/gnuastro.texi b/doc/gnuastro.texi
index 2b25463c..5330d8a2 100644
 a/doc/gnuastro.texi
+++ b/doc/gnuastro.texi
@@ 27903,13 +27903,43 @@ When you need to read many numbers into an array,
@code{out} would be an array,
Read @code{string} into smallest type that can host the number, the allocated
space for the number will be returned and the type of the number will be put
into the memory that @code{type} points to.
If @code{string} couldn't be read as a number, this function will return
@code{NULL}.
This function calls the C library's @code{strtod} function to read
@code{string} as a doubleprecision floating point number.
When successful, it will check the value to put it in the smallest numerical
data type that can handle it.
However, if @code{string} is successfully parsed as a number @emph{and} there
is @code{.} in @code{string}, it will force the number into floating point
types.
+This function first calls the C library's @code{strtod} function to read
@code{string} as a doubleprecision floating point number.
+When successful, it will check the value to put it in the smallest numerical
data type that can handle it: for example @code{120} and @code{50000} will be
read as a signed 8bit integer and unsigned 16bit integer types.
+When reading as an integer, the C library's @code{strtol} function is used (in
base10) to parse the string again.
+This reparsing as an integer is necessary because integers with many digits
(for example the unix epoch seconds) will not be accurately stored as a
floating point and we can't use the result of @code{strtod}.
+
+When @code{string} is successfully parsed as a number @emph{and} there is
@code{.} in @code{string}, it will force the number into floating point types.
For example @code{"5"} is read as an integer, while @code{"5."} or
@code{"5.0"}, or @code{"5.00"} will be read as a floating point
(singleprecision).
For the ranges acceptable by each type see @ref{Numeric data types}.
For integers, the range is clear, but for floating point types, this function
will count the number of significant digits and determine if the given string
is single or double precision as described in that section.
+For floating point types, this function will count the number of significant
digits and determine if the given string is single or double precision as
described in @ref{Numeric data types}.
+
+For integers, negative numbers will always be placed in signed types (as
expected).
+If a positive integer falls below the maximum of a signed type of a certain
width, it will be signed (for example @code{10} and @code{150} will be defined
as a signed and unsigned 8bit integer respectively).
+In other words, even though @code{10} can be unsigned, it will be read as a
signed 8bit integer.
+This is done to respect the C implicit type conversion in binary operators,
where signed integers will be interpreted as unsigned, when the otehr operand
is an unsigned integer of the same width.
+
+For example, see the short program below.
+It will print @code{50 is larger than 100000} (which is wrong!).
+This happens because when a negative number is parsed as an unsigned, the
value is effectively subtracted from the maximum and @mymath{429496729550} is
indeed larger than 100000 (recall that @mymath{4294967295} is the largest
unsigned 32bit integer, see @ref{Numeric data types}).
+
+@example
+#include <stdio.h>
+#include <stdlib.h>
+#include <stdint.h>
+
+int
+main(void)
+@{
+ int32_t a=50;
+ uint32_t b=100000;
+ printf("%d is %s than %d\n", a,
+ a>b ? "larger" : "less or equal", b);
+ return 0;
+@}
+@end example
+
+However, if we read 100000 as a signed 32bit integer, there won't be any
problem and the printed sentence will be logically correct (for someone who
doesn't know anything about numeric data types: users of your programs).
+For the advantages of integers, see @ref{Integer benefits and pitfalls}.
@end deftypefun
@node Pointers, Library blank values, Library data types, Gnuastro library
diff git a/lib/arithmetic.c b/lib/arithmetic.c
index 7f7e17ac..4500e5d9 100644
 a/lib/arithmetic.c
+++ b/lib/arithmetic.c
@@ 2112,25 +2112,10 @@ arithmetic_binary_out_type(int operator, gal_data_t *l,
gal_data_t *r)
/* Binary arithmetic's type checks: According to C's automatic type
 conversion in binary operators, the unsigned types have higher
 precedence for the same width. As a result, something like the following
 will prove correct (the value after 'check:' will be '1').

 #include <stdio.h>
 #include <stdlib.h>
 #include <stdint.h>

 int
 main(void)
 {
 int32_t a=50;
 uint32_t b=10000;

 uint8_t o=a>b;
 printf("check: %u\n", o);
 return 0;
 }
+/* Binary arithmetic's type checks: According to C's implicity/automatic
+ type conversion in binary operators, the unsigned types have higher
+ precedence for the same width. See the description of
+ 'gal_type_string_to_number' in the Gnuastro book for an example.
To avoid this situation, it is therefore necessary to print a message
and let the user know that strange situations like above may occur. Just
diff git a/lib/type.c b/lib/type.c
index 25563a7e..8f1d5086 100644
 a/lib/type.c
+++ b/lib/type.c
@@ 595,8 +595,14 @@ gal_type_string_to_number(char *string, uint8_t *type)
can only preserve a certain number of decimals precisely after a
certain number of decimals, they loose precision. 2) Integer
comparisons (that are done below) are faster, but this is
 secondary because the parsing itself takes more time! */
 l=strtol(string, &tailptr, 0);
+ secondary because the parsing itself takes more time!
+
+ The string is being parsed as an integer in base10 (third
+ argument of 'strtol'). This should not be '0', otherwise 'strtol'
+ will parse strings starting with '0' in octal radix and strings
+ starting in '0x' in hexagesimal radix (these aren't used in data
+ analysis, only in computer science). */
+ l=strtol(string, &tailptr, 10);
if(*tailptr!='\0')
{
/* If 'tailptr' is simply an 'e', the input string was in
@@ 606,11 +612,7 @@ gal_type_string_to_number(char *string, uint8_t *type)
'ceil(d)==d' we have confirmed that the number is actually an
integer (and not a float). */
if(*tailptr=='e') l=d;
 else
 error(EXIT_FAILURE, 0, "%s: a bug! Please contact us at "
 "%s to fix this problem. The string '%s' couldn't "
 "be parsed with 'strtol', but was parsed by 'strtod'",
 __func__, PACKAGE_BUGREPORT, string);
+ else return NULL;
}
/* If the number is negative, put it in the signed types (based on
[Prev in Thread] 
Current Thread 
[Next in Thread] 
 [gnuastrocommits] master e91a4677: Library (type.h): integer strings forceparsed as base 10,
Mohammad Akhlaghi <=