[gnuastro-commits] master 213a1976: Library (type.h): signed ints, same

gnuastro-commits
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[gnuastro-commits] master 213a1976: Library (type.h): signed ints, same

From:	Mohammad Akhlaghi
Subject:	[gnuastro-commits] master 213a1976: Library (type.h): signed ints, same width preferred in reading strings
Date:	Wed, 16 Feb 2022 19:54:52 -0500 (EST)
branch: master
commit 213a1976ccc27cda0e8bc648ae525800e7c1a937
Author: Mohammad Akhlaghi <mohammad@akhlaghi.org>
Commit: Mohammad Akhlaghi <mohammad@akhlaghi.org>

    Library (type.h): signed ints, same width preferred in reading strings
    
    Until now, when reading a number from a string if it was positive, we would
    only assign it to an un-signed integer type: the number of bits in the
    integer was determined based on its largest possible value. However, in C's
    internal automatic type conversion (when dealing with binary operators like
    '+' or '>'), when an un-signed and signed integer of same width is given to
    a binary operator, they are both converted to the unsigned type.
    
    This would produce unexpected results! Because a signed integer can have a
    value like '-1'. But when this value is converted to an unsigned integer of
    the same width, it becomes the maximum value! This issue can be
    demonstrated with the commands below:
    
      astarithmetic 200 200 2 makenew 5 mknoise-sigma int32 \
                    --output=int-noise.fits
      astarithmetic int-noise.fits 100000 gt --output=thresh.fits
    
    The standard deviation of 'int-noise.fits' is 100, so its maximum will be
    around 500 (5-sigma). Therefore, none of the pixels should be larger than
    100000 and 'thresh.fits' should only have a value of 0. However, until this
    commit 'thresh.fits' would have many '1'-valued pixels.
    
    With this commit, this problem has been addressed in three levels:
    
     - In 'gal_type_string_to_number' (where we read a number from the
       command-line string that the user has given), positive values aren't
       only given to unsigned types. If a positive value is smaller than the
       largest possible signed value, it will be given a signed type.
    
     - In 'arithmetic_binary', a new sanity check has been added: when the
       inputs have the same width and are integers, but with different signs, a
       warning will be printed to fully elaborate the problem to the user and
       propose solutions.
    
    This bug was reported by Zohreh Ghaffari and Pedram Ashofteh Ardakani.
    
    This fixes bug #62069.
---
 NEWS             |  4 +++
 lib/arithmetic.c | 75 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 lib/type.c       | 48 ++++++++++++++++++++++++++++--------
 3 files changed, 117 insertions(+), 10 deletions(-)

diff --git a/NEWS b/NEWS
index 5835f885..491a1514 100644
--- a/NEWS
+++ b/NEWS
@@ -199,6 +199,10 @@ See the end of the file for license conditions.
               Infante-Sainz.
   bug #62054: Crash in Table's '--catrowfile' when string column is present;
               reported by Manuel Sánchez-Benavente.
+  bug #62069: Wrong Arithmetic result in binary operators when both input
+              operands are integers of same width, but different sign (for
+              example 'int32' and 'uint32'); reported by Zohreh Ghaffari
+              and Pedram Ashofteh Ardakani.
 
 
 
diff --git a/lib/arithmetic.c b/lib/arithmetic.c
index a810bc0f..f3806ae9 100644
--- a/lib/arithmetic.c
+++ b/lib/arithmetic.c
@@ -1694,6 +1694,76 @@ arithmetic_binary_out_type(int operator, gal_data_t *l, 
gal_data_t *r)
 
 
 
+/* Binary arithmetic's type checks: According to C's automatic type
+   conversion in binary operators, the unsigned types have higher
+   precedence for the same width. As a result, something like the following
+   will prove correct (the value after 'check:' will be '1').
+
+        #include <stdio.h>
+        #include <stdlib.h>
+        #include <stdint.h>
+
+        int
+        main(void)
+        {
+          int32_t a=-50;
+          uint32_t b=10000;
+
+          uint8_t o=a>b;
+          printf("check: %u\n", o);
+          return 0;
+        }
+
+   To avoid this situation, it is therefore necessary to print a message
+   and let the user know that strange situations like above may occur. Just
+   note that this won't happen if 'a' and 'b' have different widths: such
+   that this will work fine: 'int8_t a=-1; uint16_t b=50000'. */
+static void
+arithmetic_binary_int_sanity_check(gal_data_t *l, gal_data_t *r,
+                                   int operator)
+{
+  /* Variables to simplify the checks. */
+  int l_is_signed=0, r_is_signed=0;
+
+  /* Checks are only necessary for same-width types. */
+  if( gal_type_sizeof(l->type)==gal_type_sizeof(r->type) )
+    {
+      /* No checks needed if atleast one of the inputs is a float. */
+      if(    l->type==GAL_TYPE_FLOAT32 || l->type==GAL_TYPE_FLOAT64
+          || r->type==GAL_TYPE_FLOAT32 || r->type==GAL_TYPE_FLOAT64 )
+        return;
+      else
+        {
+          if(    l->type==GAL_TYPE_INT8  || l->type==GAL_TYPE_INT16
+              || l->type==GAL_TYPE_INT32 || l->type==GAL_TYPE_INT64 )
+            l_is_signed=1;
+          if(    r->type==GAL_TYPE_INT8  || r->type==GAL_TYPE_INT16
+              || r->type==GAL_TYPE_INT32 || r->type==GAL_TYPE_INT64 )
+            r_is_signed=1;
+          if( l_is_signed!=r_is_signed )
+            error(EXIT_SUCCESS, 0, "the two integer operands given "
+                  "to '%s' have the same width, but a different sign: "
+                  "the first popped operand has type '%s' and the "
+                  "second has type '%s'. This may create unexpected "
+                  "results if the signed input contains negative "
+                  "values. To address this problem there are two "
+                  "options: 1) if you know that the signed input can "
+                  "only have positive values, use Arithmetic's type "
+                  "conversion operators to convert it to an un-signed "
+                  "type of the same width (e.g., 'uint8', 'uint16', "
+                  "'uint32' or 'uint64'). 2) Convert the unsigned input "
+                  "to a signed one of the next largest width with the "
+                  "type conversion operators (e.g., 'int16', 'int32' "
+                  "or 'int64')", gal_arithmetic_operator_string(operator),
+                  gal_type_name(r->type, 1), gal_type_name(l->type, 1));
+        }
+    }
+}
+
+
+
+
+
 static gal_data_t *
 arithmetic_binary(int operator, int flags, gal_data_t *l, gal_data_t *r)
 {
@@ -1713,6 +1783,11 @@ arithmetic_binary(int operator, int flags, gal_data_t 
*l, gal_data_t *r)
           "have the same dimension/size", __func__,
           gal_arithmetic_operator_string(operator));
 
+  /* Print a warning if the inputs are both integers, but have different
+     signs (the user needs to know that the output may not be what they
+     expect!).*/
+  arithmetic_binary_int_sanity_check(l, r, operator);
+
 
   /* Set the output type. For the comparison operators, the output type is
      either 0 or 1, so we will set the output type to 'unsigned char' for
diff --git a/lib/type.c b/lib/type.c
index 89e8e501..50d5a6b0 100644
--- a/lib/type.c
+++ b/lib/type.c
@@ -560,6 +560,7 @@ gal_type_from_string(void **out, char *string, uint8_t type)
 void *
 gal_type_string_to_number(char *string, uint8_t *type)
 {
+  long int l;
   void *ptr, *out;
   int fnz=-1, lnz=0;     /* 'F'irst (or 'L'ast) 'N'on-'Z'ero. */
   uint8_t forcedfloat=0;
@@ -584,15 +585,28 @@ gal_type_string_to_number(char *string, uint8_t *type)
   /* See if the number is actually an integer: */
   if( forcedfloat==0 && ceil(d) == d )
     {
+      /* We know the number is an integer, so we should re-read it again,
+         but this time, as an integer, because: 1) floating point numbers
+         can only preserve a certain number of decimals precisely after a
+         certain number of decimals, they loose precision. 2) Integer
+         comparisons (that are done below) are faster, but this is
+         secondary because the parsing itself takes more time! */
+      l=strtol(string, &tailptr, 0);
+      if(*tailptr!='\0')
+        error(EXIT_FAILURE, 0, "%s: a bug! Please contact us at %s "
+              "to fix this problem. The string '%s' couldn't be "
+              "parsed with 'strtol', but was parsed by 'strtod'",
+              __func__, PACKAGE_BUGREPORT, string);
+
       /* If the number is negative, put it in the signed types (based on
          its value). If its zero or positive, then put it in the unsigned
          types. */
-      if( d < 0 )
+      if( l < 0 )
         {
-          if     (d>INT8_MIN)    { i8=d;  ptr=&i8;  *type=GAL_TYPE_INT8;   }
-          else if(d>INT16_MIN)   { i16=d; ptr=&i16; *type=GAL_TYPE_INT16;  }
-          else if(d>INT32_MIN)   { i32=d; ptr=&i32; *type=GAL_TYPE_INT32;  }
-          else                   { i64=d; ptr=&i64; *type=GAL_TYPE_INT64;  }
+          if     (l>INT8_MIN)    { i8=l;  ptr=&i8;  *type=GAL_TYPE_INT8;  }
+          else if(l>INT16_MIN)   { i16=l; ptr=&i16; *type=GAL_TYPE_INT16; }
+          else if(l>INT32_MIN)   { i32=l; ptr=&i32; *type=GAL_TYPE_INT32; }
+          else                   { i64=l; ptr=&i64; *type=GAL_TYPE_INT64; }
         }
       else
         {
@@ -602,11 +616,25 @@ gal_type_string_to_number(char *string, uint8_t *type)
              confusing situations (for example when the user gives 255), if
              the value is equal to the given maximum of the given type,
              we'll assign it to a larger type. In other words, we won't be
-             using the '<=MAX', but '<MAX'. */
-          if     (d<UINT8_MAX)  { u8=d;  ptr=&u8;  *type=GAL_TYPE_UINT8;  }
-          else if(d<UINT16_MAX) { u16=d; ptr=&u16; *type=GAL_TYPE_UINT16; }
-          else if(d<UINT32_MAX) { u32=d; ptr=&u32; *type=GAL_TYPE_UINT32; }
-          else                  { u64=d; ptr=&u64; *type=GAL_TYPE_UINT64; }
+             using the '<=MAX', but '<MAX'.
+
+             Even though they are positive, we should give priority to the
+             signed types if the number fits in the range of signed type
+             for that width: this is the way that C's internal automatic
+             type conversion works (which is used by Arithmetic's binary
+             operators for example). */
+          if     (l<UINT8_MAX)
+            {  if(l>INT8_MAX)  { u8=l;  ptr=&u8;  *type=GAL_TYPE_UINT8; }
+               else            { i8=l;  ptr=&i8;  *type=GAL_TYPE_INT8;  } }
+          else if(l<UINT16_MAX)
+            {  if(l>INT16_MAX) { u16=l; ptr=&u16; *type=GAL_TYPE_UINT16; }
+               else            { i16=l; ptr=&i16; *type=GAL_TYPE_INT16;  } }
+          else if(l<UINT32_MAX)
+            {  if(l>INT32_MAX) { u32=l; ptr=&u32; *type=GAL_TYPE_UINT32; }
+               else            { i32=l; ptr=&i32; *type=GAL_TYPE_INT32;  } }
+          else
+            {  if(l>INT64_MAX) { u64=l; ptr=&u64; *type=GAL_TYPE_UINT64; }
+               else            { i64=l; ptr=&i64; *type=GAL_TYPE_INT64;  } }
         }
     }
   else
[Prev in Thread]
Current Thread
[Next in Thread]
[gnuastro-commits] master 213a1976: Library (type.h): signed ints, same width preferred in reading strings, Mohammad Akhlaghi <=
Prev by Date: [gnuastro-commits] master 95b8de26: NEWS: minor edit and typo corrections
Next by Date: [gnuastro-commits] master 1ae67c38: Table: properly freeing non-used string row contents with --tail
Previous by thread: [gnuastro-commits] master 95b8de26: NEWS: minor edit and typo corrections
Next by thread: [gnuastro-commits] master 1ae67c38: Table: properly freeing non-used string row contents with --tail
Index(es):
- Date
- Thread