gnuastro-commits
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[gnuastro-commits] master e91a4677: Library (type.h): integer strings fo


From: Mohammad Akhlaghi
Subject: [gnuastro-commits] master e91a4677: Library (type.h): integer strings force-parsed as base 10
Date: Tue, 5 Jul 2022 12:54:05 -0400 (EDT)

branch: master
commit e91a46777007abde80a85698eff95875b6480373
Author: Mohammad Akhlaghi <mohammad@akhlaghi.org>
Commit: Mohammad Akhlaghi <mohammad@akhlaghi.org>

    Library (type.h): integer strings force-parsed as base 10
    
    Until now, when parsing integer strings as integers with 'strtol', the base
    was set to '0' (which means that integers starting with '0' should be
    interpreted as base 8 and those starting with '0x' should be parsed as
    base-16). While bases, are very common in programming and computer science,
    they are rarely used in data analysis. In fact, some pipelines do '0'
    padding to make sure that the number has a certain number of digits (for
    example '0015').
    
    We had already accounted for this when parsing text tables (in 'lib/txt.c')
    by setting the base of 'strtol' to 10. But in 'gal_type_string_to_number'
    the base was '0'. As a result, when reading FITS keyword values with
    integers starting with '0', the program would either crash (because '8' or
    '9' are not defined in base-8), or return unreasonable values since the
    base was different.
    
    With this commit, 'strtol' now uses base 10 and this fixed the problem.
    
    I also noticed that this function was causing a crash when the integer
    number couldn't be read from the string. In such cases, it should return
    NULL. So that has also been corrected.
    
    Furthermore, the documenation of 'gal_type_string_to_number' in the book
    now fully describes the problem of implicit type conversion for binary
    operators and the shorter description in the comments above
    'arithmetic_binary_int_sanity_check' have been removed and point the reader
    to that part of the book.
    
    This bug was found by Manuel Sánchez-Benavente.
    
    This fixes bug #62710.
---
 NEWS                         |  3 +++
 doc/announce-acknowledge.txt |  1 +
 doc/gnuastro.texi            | 40 +++++++++++++++++++++++++++++++++++-----
 lib/arithmetic.c             | 23 ++++-------------------
 lib/type.c                   | 16 +++++++++-------
 5 files changed, 52 insertions(+), 31 deletions(-)

diff --git a/NEWS b/NEWS
index ae27629c..bf925fe9 100644
--- a/NEWS
+++ b/NEWS
@@ -214,6 +214,9 @@ See the end of the file for license conditions.
               pixels. Found by Raul Infante-Sainz.
   bug #62694: Radial profile script overestimates the surface brightness
               error. Found and fixed with the help of Raul Infante-Sainz.
+  bug #62710: Plain-text integers starting with 0 are read in the octal
+              base (which is common in software engineering, but not in
+              data analysis). Found by Manuel Sánchez-Benavente.
 
 
 
diff --git a/doc/announce-acknowledge.txt b/doc/announce-acknowledge.txt
index c07f0795..fc2e42b2 100644
--- a/doc/announce-acknowledge.txt
+++ b/doc/announce-acknowledge.txt
@@ -9,6 +9,7 @@ Jeremy Lim
 Juan Miro
 Irene Pintos Castro
 Ignacio Ruiz Cejudo
+Manuel Sánchez-Benavente
 Elham Saremi
 
 
diff --git a/doc/gnuastro.texi b/doc/gnuastro.texi
index 2b25463c..5330d8a2 100644
--- a/doc/gnuastro.texi
+++ b/doc/gnuastro.texi
@@ -27903,13 +27903,43 @@ When you need to read many numbers into an array, 
@code{out} would be an array,
 Read @code{string} into smallest type that can host the number, the allocated 
space for the number will be returned and the type of the number will be put 
into the memory that @code{type} points to.
 If @code{string} couldn't be read as a number, this function will return 
@code{NULL}.
 
-This function calls the C library's @code{strtod} function to read 
@code{string} as a double-precision floating point number.
-When successful, it will check the value to put it in the smallest numerical 
data type that can handle it.
-However, if @code{string} is successfully parsed as a number @emph{and} there 
is @code{.} in @code{string}, it will force the number into floating point 
types.
+This function first calls the C library's @code{strtod} function to read 
@code{string} as a double-precision floating point number.
+When successful, it will check the value to put it in the smallest numerical 
data type that can handle it: for example @code{120} and @code{50000} will be 
read as a signed 8-bit integer and unsigned 16-bit integer types.
+When reading as an integer, the C library's @code{strtol} function is used (in 
base-10) to parse the string again.
+This re-parsing as an integer is necessary because integers with many digits 
(for example the unix epoch seconds) will not be accurately stored as a 
floating point and we can't use the result of @code{strtod}.
+
+When @code{string} is successfully parsed as a number @emph{and} there is 
@code{.} in @code{string}, it will force the number into floating point types.
 For example @code{"5"} is read as an integer, while @code{"5."} or 
@code{"5.0"}, or @code{"5.00"} will be read as a floating point 
(single-precision).
 
-For the ranges acceptable by each type see @ref{Numeric data types}.
-For integers, the range is clear, but for floating point types, this function 
will count the number of significant digits and determine if the given string 
is single or double precision as described in that section.
+For floating point types, this function will count the number of significant 
digits and determine if the given string is single or double precision as 
described in @ref{Numeric data types}.
+
+For integers, negative numbers will always be placed in signed types (as 
expected).
+If a positive integer falls below the maximum of a signed type of a certain 
width, it will be signed (for example @code{10} and @code{150} will be defined 
as a signed and unsigned 8-bit integer respectively).
+In other words, even though @code{10} can be unsigned, it will be read as a 
signed 8-bit integer.
+This is done to respect the C implicit type conversion in binary operators, 
where signed integers will be interpreted as unsigned, when the otehr operand 
is an unsigned integer of the same width.
+
+For example, see the short program below.
+It will print @code{-50 is larger than 100000} (which is wrong!).
+This happens because when a negative number is parsed as an unsigned, the 
value is effectively subtracted from the maximum and @mymath{4294967295-50} is 
indeed larger than 100000 (recall that @mymath{4294967295} is the largest 
unsigned 32-bit integer, see @ref{Numeric data types}).
+
+@example
+#include <stdio.h>
+#include <stdlib.h>
+#include <stdint.h>
+
+int
+main(void)
+@{
+  int32_t  a=-50;
+  uint32_t b=100000;
+  printf("%d is %s than %d\n", a,
+         a>b ? "larger" : "less or equal", b);
+  return 0;
+@}
+@end example
+
+However, if we read 100000 as a signed 32-bit integer, there won't be any 
problem and the printed sentence will be logically correct (for someone who 
doesn't know anything about numeric data types: users of your programs).
+For the advantages of integers, see @ref{Integer benefits and pitfalls}.
 @end deftypefun
 
 @node Pointers, Library blank values, Library data types, Gnuastro library
diff --git a/lib/arithmetic.c b/lib/arithmetic.c
index 7f7e17ac..4500e5d9 100644
--- a/lib/arithmetic.c
+++ b/lib/arithmetic.c
@@ -2112,25 +2112,10 @@ arithmetic_binary_out_type(int operator, gal_data_t *l, 
gal_data_t *r)
 
 
 
-/* Binary arithmetic's type checks: According to C's automatic type
-   conversion in binary operators, the unsigned types have higher
-   precedence for the same width. As a result, something like the following
-   will prove correct (the value after 'check:' will be '1').
-
-        #include <stdio.h>
-        #include <stdlib.h>
-        #include <stdint.h>
-
-        int
-        main(void)
-        {
-          int32_t a=-50;
-          uint32_t b=10000;
-
-          uint8_t o=a>b;
-          printf("check: %u\n", o);
-          return 0;
-        }
+/* Binary arithmetic's type checks: According to C's implicity/automatic
+   type conversion in binary operators, the unsigned types have higher
+   precedence for the same width. See the description of
+   'gal_type_string_to_number' in the Gnuastro book for an example.
 
    To avoid this situation, it is therefore necessary to print a message
    and let the user know that strange situations like above may occur. Just
diff --git a/lib/type.c b/lib/type.c
index 25563a7e..8f1d5086 100644
--- a/lib/type.c
+++ b/lib/type.c
@@ -595,8 +595,14 @@ gal_type_string_to_number(char *string, uint8_t *type)
          can only preserve a certain number of decimals precisely after a
          certain number of decimals, they loose precision. 2) Integer
          comparisons (that are done below) are faster, but this is
-         secondary because the parsing itself takes more time! */
-      l=strtol(string, &tailptr, 0);
+         secondary because the parsing itself takes more time!
+
+         The string is being parsed as an integer in base-10 (third
+         argument of 'strtol'). This should not be '0', otherwise 'strtol'
+         will parse strings starting with '0' in octal radix and strings
+         starting in '0x' in hexagesimal radix (these aren't used in data
+         analysis, only in computer science). */
+      l=strtol(string, &tailptr, 10);
       if(*tailptr!='\0')
         {
           /* If 'tailptr' is simply an 'e', the input string was in
@@ -606,11 +612,7 @@ gal_type_string_to_number(char *string, uint8_t *type)
              'ceil(d)==d' we have confirmed that the number is actually an
              integer (and not a float). */
           if(*tailptr=='e') l=d;
-          else
-            error(EXIT_FAILURE, 0, "%s: a bug! Please contact us at "
-                  "%s to fix this problem. The string '%s' couldn't "
-                  "be parsed with 'strtol', but was parsed by 'strtod'",
-                  __func__, PACKAGE_BUGREPORT, string);
+          else return NULL;
         }
 
       /* If the number is negative, put it in the signed types (based on



reply via email to

[Prev in Thread] Current Thread [Next in Thread]