Re: [PATCH] Support arbitrary-precision arithmetic in expr.

bug-coreutils
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [PATCH] Support arbitrary-precision arithmetic in expr.

From:	Jim Meyering
Subject:	Re: [PATCH] Support arbitrary-precision arithmetic in expr.
Date:	Fri, 08 Aug 2008 22:23:13 +0200
James Youngman <address@hidden> wrote:
> This code has more instances of #if HAVE_GMP than I would like, but I
> thought it was important to be able to transition smoothly from
> native arithmetic to arbitrary-precision arithmetic.
>
>
> 2008-08-02  James Youngman  <address@hidden>
>
>       Support arbitrary-precision arithmetic in expr.

Nice work.
Thanks for doing all that.

Here's your patch with a slightly adjusted log, and maybe one
or two insignificant changes (sorry, don't remember), followed
by three small patches:

    * tests/misc/expr: Add tests of the new GMP-based code.

    expr: avoid compiler warnings
    * src/expr.c (die): New "noreturn" function to wrap one-arg use of
    error
    (string_too_long): Use die rather than error.
    (toint): Remove definition of now-unused function.
    (eval6): Remove a little duplication.
    Use die rather than error.
    (dodivide): Remove declaration of now-unused variable.

    * coreutils.texi (factor invocation, expr invocation): Adjust wording.

I haven't had time to review the C code as closely
as I'd like, but since now we get at least some test
coverage via "make check", I'll go ahead and push it soon.
Feedback welcome.

>From f65cafd67b33009d23f940968bbe7f9a08d6fe13 Mon Sep 17 00:00:00 2001
From: James Youngman <address@hidden>
Date: Sat, 2 Aug 2008 21:49:46 +0100
Subject: [PATCH] expr: support arbitrary-precision arithmetic

* src/Makefile.am (expr_LDADD): Link expr against GNU MP.
* doc/coreutils.texi (expr invocation): Describe --bignum,
--no-bignum.   Explain the new arbitrary-precision functionality.
* NEWS: Indicate that arbitrary-precision arithmetic is now
supported in expr.
* src/expr.c (enum valtype): Added mp_integer, signifying a GNU MP
number.
(usage): Document the new options --bignum and --no-bignum which
force and prohibit the use of arbitrary-precision arithmetic,
respectively.
(long_options): data structure for getopt_long, which we need to
use to parse the options mentioned above.
(main): parse these options with getopt_long instead of
parse_long_options.
(valinfo): Downgrade the numeric member of the union from
intmax_t to signed long, since MP lacks functions for promoting an
intmax_t to an arbitrary-precision quantity.
(enum arithmetic_mode): Represents the current choice between
--bignum, --no-bignum and the default (automatically switch from
one to the other if needed).
(integer_overflow): issue a more explicit error message indicating
that MP is not available.
(string_too_long): new function, emits a fatal error message for
the case where an argument to the 'index' expression is too long
for a string offset to be represented.
(int_value): With --bignum, create the value as mp_integer rather
than plain integer.
(substr_value): factored out of eval6; implements "substr".
(freev): also destroy mp_integer values.  Check that no mp_integer
values exist if --no-bignum was specified.
(printv, null, tostring): support mp_integer.
(toint): new funtion for converting from string or mp_integer to
integer.
(getsize): extracts a size_t value from a VALUE object; used to
implement substr.
(promote): promotes a value from integer to mp_integer.
(domult, dodivide): functions for multiplication and division,
factored out of eval4.
(doadd): addition/subraction function, factpred out of eval3.
(eval3): support mp_integer types; call doadd.
(eval4): support mp_integer types; call domult, dodivide.
(eval6): support mp_integer offsets and lengths for "substr" and
"index".
* TODO: Mention that expr supports arbitrary-precision arithmetic,
and suggest that this might also be a good idea for seq.
* AUTHORS (expr): Add James Youngman.
---
 AUTHORS            |    2 +-
 NEWS               |    5 +-
 TODO               |    5 +
 doc/coreutils.texi |   17 ++-
 src/Makefile.am    |    3 +
 src/expr.c         |  655 +++++++++++++++++++++++++++++++++++++++++++++-------
 6 files changed, 594 insertions(+), 93 deletions(-)

diff --git a/AUTHORS b/AUTHORS
index eab9bec..9c3d990 100644
--- a/AUTHORS
+++ b/AUTHORS
@@ -25,7 +25,7 @@ du: Torbjörn Granlund, David MacKenzie, Paul Eggert, Jim 
Meyering
 echo: Brian Fox, Chet Ramey
 env: Richard Mlynarik, David MacKenzie
 expand: David MacKenzie
-expr: Mike Parker
+expr: Mike Parker, James Youngman
 factor: Paul Rubin
 false: Jim Meyering
 fmt: Ross Paterson
diff --git a/NEWS b/NEWS
index 8de6355..0c8cb8f 100644
--- a/NEWS
+++ b/NEWS
@@ -21,8 +21,9 @@ GNU coreutils NEWS                                    -*- 
outline -*-
   With this new option, after a short read, dd repeatedly calls read,
   until it fills the incomplete block, reaches EOF, or encounters an error.

-  factor accepts arbitrarily large numbers and factors them using
-  Pollard's rho algorithm.
+  If the GNU MP library is available at configure time, factor and
+  expr support arbitrarily large numbers.  Pollard's rho algorithm is
+  used to factor large numbers.

   ls now colorizes files with capabilities if libcap is available

diff --git a/TODO b/TODO
index 18371b3..c7b095c 100644
--- a/TODO
+++ b/TODO
@@ -155,6 +155,11 @@ remove all uses of the `register' keyword: Done.  add a 
maint.mk rule
 remove or adjust chown's --changes option, since it
   can't always do what it currently says it does.

+Support arbitrary-precision arithmetic in those tools for which it
+makes sense.  Factor and expr already support this via libgmp.
+The "test" program is covered via its string-based comparison of
+integers.  To be converted: seq.
+
 Adapt tools like wc, tr, fmt, etc. (most of the textutils) to be
   multibyte aware.  The problem is that I want to avoid duplicating
   significant blocks of logic, yet I also want to incur only minimal
diff --git a/doc/coreutils.texi b/doc/coreutils.texi
index fd1e79e..f5a2e41 100644
--- a/doc/coreutils.texi
+++ b/doc/coreutils.texi
@@ -11066,8 +11066,21 @@ expr invocation
 parentheses and many operators to avoid the shell evaluating them,
 however.

-The only options are @option{--help} and @option{--version}.  @xref{Common
-options}.  Options must precede operands.
+By default, @command{expr} performs operations using native arithmetic
+types, but if a numeric overflow occurs and @command{expr} was built
+with support for the GNU MP library, @command{expr}, it will switch to
+using arbitrary-precision arithmetic.
+
+Apart from @option{--help} and @option{--version} (@pxref{Common
+options}), the following options are supported:
+
address@hidden @samp
address@hidden --bignum
+Forces the use of the GNU MP library.
address@hidden --no-bignum
+Forces the use of native operations only.  In the event of a numeric
+overflow, @command{expr} will fail, even if GNU MP is available.
address@hidden table

 @cindex exit status of @command{expr}
 Exit status:
diff --git a/src/Makefile.am b/src/Makefile.am
index 7410653..359d18e 100644
--- a/src/Makefile.am
+++ b/src/Makefile.am
@@ -130,6 +130,9 @@ seq_LDADD = $(LDADD) $(POW_LIB)
 nanosec_libs = $(LDADD) $(POW_LIB) $(LIB_NANOSLEEP)

 # for various GMP functions
+expr_LDADD = $(LDADD) $(LIB_GMP)
+
+# for various GMP functions
 factor_LDADD = $(LDADD) $(LIB_GMP)

 sleep_LDADD = $(nanosec_libs)
diff --git a/src/expr.c b/src/expr.c
index c342291..bf55a40 100644
--- a/src/expr.c
+++ b/src/expr.c
@@ -15,6 +15,7 @@
    along with this program.  If not, see <http://www.gnu.org/licenses/>.  */

 /* Author: Mike Parker.
+   Modified for arbitrary-precision calculation by James Youngman.

    This program evaluates expressions.  Each token (operator, operand,
    parenthesis) of the expression must be a seperate argument.  The
@@ -32,8 +33,11 @@
 #include <sys/types.h>
 #include "system.h"

+#include <assert.h>
 #include <regex.h>
-#include "long-options.h"
+#if HAVE_GMP
+#include <gmp.h>
+#endif
 #include "error.h"
 #include "quotearg.h"
 #include "strnumcmp.h"
@@ -42,7 +46,7 @@
 /* The official name of this program (e.g., no `g' prefix).  */
 #define PROGRAM_NAME "expr"

-#define AUTHORS proper_name ("Mike Parker")
+#define AUTHORS proper_name ("Mike Parker"), proper_name ("James Youngman")

 /* Exit statuses.  */
 enum
@@ -57,10 +61,14 @@ enum
     EXPR_FAILURE
   };

-/* The kinds of value we can have.  */
+/* The kinds of value we can have.
+   In the comments below, a variable is described as "arithmetic" if
+   it is either integer or mp_integer.   Variables are of type mp_integer
+   only if GNU MP is available, but the type designator is always defined. */
 enum valtype
 {
   integer,
+  mp_integer,
   string
 };
 typedef enum valtype TYPE;
@@ -71,7 +79,12 @@ struct valinfo
   TYPE type;                   /* Which kind. */
   union
   {                            /* The value itself. */
-    intmax_t i;
+    /* We could use intmax_t but that would integrate less well with GMP,
+       since GMP has mpz_set_si but no intmax_t equivalent. */
+    signed long int i;
+#if HAVE_GMP
+    mpz_t z;
+#endif
     char *s;
   } u;
 };
@@ -85,6 +98,34 @@ static bool nomoreargs (void);
 static bool null (VALUE *v);
 static void printv (VALUE *v);

+/* Arithmetic is done in one of three modes.
+
+   The --bignum option forces all arithmetic to use bignums other than
+   string indexing (mode==MP_ALWAYS).  The --no-bignum option forces
+   all arithmetic to use native types rather than bignums
+   (mode==MP_NEVER).
+
+   The default mode is MP_AUTO if GMP is available and MP_NEVER if
+   not.  Most functions will process a bignum if one is found, but
+   will not convert a native integer to a string if the mode is
+   MP_NEVER. */
+enum arithmetic_mode
+  {
+    MP_NEVER,                  /* Never use bignums */
+#if HAVE_GMP
+    MP_ALWAYS,                 /* Always use bignums. */
+    MP_AUTO,                   /* Switch if result would otherwise overflow */
+#endif
+  };
+static enum arithmetic_mode mode =
+#if HAVE_GMP
+  MP_AUTO
+#else
+  MP_NEVER
+#endif
+  ;
+
+
 void
 usage (int status)
 {
@@ -99,6 +140,10 @@ Usage: %s EXPRESSION\n\
 "),
              program_name, program_name);
       putchar ('\n');
+      fputs (_("\
+      --bignum     always use arbitrary-precision arithmetic\n\
+      --no-bignum  always use single-precision arithmetic\n"),
+              stdout);
       fputs (HELP_OPTION_DESCRIPTION, stdout);
       fputs (VERSION_OPTION_DESCRIPTION, stdout);
       fputs (_("\
@@ -175,13 +220,37 @@ syntax_error (void)
 static void
 integer_overflow (char op)
 {
-  error (EXPR_FAILURE, ERANGE, "%c", op);
+  error (EXPR_FAILURE, 0,
+        _("arithmetic operation %c produced an out of range value, "
+          "but arbitrary-precision arithmetic is not available"), op);
+}
+
+static void
+string_too_long (void)
+{
+  error (EXPR_FAILURE, ERANGE, _("string too long"));
 }

+enum
+{
+  USE_BIGNUM = CHAR_MAX + 1,
+  NO_USE_BIGNUM
+};
+
+static struct option const long_options[] =
+{
+  {"bignum", no_argument, NULL, USE_BIGNUM},
+  {"no-bignum", no_argument, NULL, NO_USE_BIGNUM},
+  {GETOPT_HELP_OPTION_DECL},
+  {GETOPT_VERSION_OPTION_DECL},
+  {NULL, 0, NULL, 0}
+};
+
 int
 main (int argc, char **argv)
 {
   VALUE *v;
+  int c;

   initialize_main (&argc, &argv);
   set_program_name (argv[0]);
@@ -192,23 +261,48 @@ main (int argc, char **argv)
   initialize_exit_failure (EXPR_FAILURE);
   atexit (close_stdout);

-  parse_long_options (argc, argv, PROGRAM_NAME, PACKAGE_NAME, VERSION,
-                     usage, AUTHORS, (char const *) NULL);
-  /* The above handles --help and --version.
-     Since there is no other invocation of getopt, handle `--' here.  */
-  if (argc > 1 && STREQ (argv[1], "--"))
+  /* The argument -0 should not result in an error message. */
+  opterr = 0;
+
+  while ((c = getopt_long (argc, argv, "+", long_options, NULL)) != -1)
     {
-      --argc;
-      ++argv;
+      /* "expr -0" should interpret the -0 as an integer argument.
+        arguments like --foo should also be interpreted as a string
+        argument to be "evaluated".
+       */
+      if ('?' == c)
+       {
+         --optind;
+         break;
+       }
+      else
+       switch (c)
+         {
+         case USE_BIGNUM:
+#if HAVE_GMP
+           mode = MP_ALWAYS;
+#else
+           error (0, 0, _("arbitrary-precision support is not available"));
+#endif
+           break;
+
+         case NO_USE_BIGNUM:
+           mode = MP_NEVER;
+           break;
+
+           case_GETOPT_HELP_CHAR;
+
+           case_GETOPT_VERSION_CHAR (PROGRAM_NAME, AUTHORS);
+         }
     }

-  if (argc <= 1)
+  if (argc <= optind)
     {
       error (0, 0, _("missing operand"));
       usage (EXPR_INVALID);
     }

-  args = argv + 1;
+  args = argv + optind;

   v = eval (true);
   if (!nomoreargs ())
@@ -221,9 +315,19 @@ main (int argc, char **argv)
 /* Return a VALUE for I.  */

 static VALUE *
-int_value (intmax_t i)
+int_value (long int i)
 {
   VALUE *v = xmalloc (sizeof *v);
+#if HAVE_GMP
+  if (mode == MP_ALWAYS)
+    {
+      /* all integer values are handled as bignums. */
+      mpz_init_set_si (v->u.z, i);
+      v->type = mp_integer;
+      return v;
+    }
+#endif
+
   v->type = integer;
   v->u.i = i;
   return v;
@@ -240,13 +344,42 @@ str_value (char const *s)
   return v;
 }

+
+static VALUE *
+substr_value (char const *s, size_t len, size_t pos, size_t nchars_wanted)
+{
+  if (pos >= len)
+    return str_value ("");
+  else
+    {
+      VALUE *v = xmalloc (sizeof *v);
+      size_t vlen = MIN (nchars_wanted, len - pos + 1);
+      char *vlim;
+      v->type = string;
+      v->u.s = xmalloc (vlen + 1);
+      vlim = mempcpy (v->u.s, s + pos, vlen);
+      *vlim = '\0';
+      return v;
+    }
+}
+
+
 /* Free VALUE V, including structure components.  */

 static void
 freev (VALUE *v)
 {
   if (v->type == string)
-    free (v->u.s);
+    {
+      free (v->u.s);
+    }
+  else if (v->type == mp_integer)
+    {
+      assert (mode != MP_NEVER);
+#if HAVE_GMP
+      mpz_clear (v->u.z);
+#endif
+    }
   free (v);
 }

@@ -255,22 +388,24 @@ freev (VALUE *v)
 static void
 printv (VALUE *v)
 {
-  char *p;
-  char buf[INT_BUFSIZE_BOUND (intmax_t)];
-
   switch (v->type)
     {
     case integer:
-      p = imaxtostr (v->u.i, buf);
+      printf ("%ld\n", v->u.i);
       break;
     case string:
-      p = v->u.s;
+      puts (v->u.s);
       break;
+#if HAVE_GMP
+    case mp_integer:
+      mpz_out_str (stdout, 10, v->u.z);
+      putchar ('\n');
+      break;
+#endif
     default:
       abort ();
     }

-  puts (p);
 }

 /* Return true if V is a null-string or zero-number.  */
@@ -282,6 +417,10 @@ null (VALUE *v)
     {
     case integer:
       return v->u.i == 0;
+#if HAVE_GMP
+    case mp_integer:
+      return mpz_sgn (v->u.z) == 0;
+#endif
     case string:
       {
        char const *cp = v->u.s;
@@ -324,14 +463,29 @@ looks_like_integer (char const *cp)
 static void
 tostring (VALUE *v)
 {
-  char buf[INT_BUFSIZE_BOUND (intmax_t)];
+  char buf[INT_BUFSIZE_BOUND (long int)];

   switch (v->type)
     {
     case integer:
-      v->u.s = xstrdup (imaxtostr (v->u.i, buf));
+      snprintf (buf, sizeof buf, "%ld", v->u.i);
+      v->u.s = xstrdup (buf);
       v->type = string;
       break;
+#if HAVE_GMP
+    case mp_integer:
+      {
+       char *s = mpz_get_str (NULL, 10, v->u.z);
+       if (!s)
+         {
+           xalloc_die ();
+         }
+       mpz_clear (v->u.z);
+       v->u.s = s;
+       v->type = string;
+      }
+      break;
+#endif
     case string:
       break;
     default:
@@ -339,7 +493,8 @@ tostring (VALUE *v)
     }
 }

-/* Coerce V to an integer value.  Return true on success, false on failure.  */
+/* Coerce V to an arithmetic value.
+   Return true on success, false on failure.  */

 static bool
 toarith (VALUE *v)
@@ -347,18 +502,40 @@ toarith (VALUE *v)
   switch (v->type)
     {
     case integer:
+    case mp_integer:
       return true;
+
     case string:
       {
-       intmax_t value;
+       long int value;

        if (! looks_like_integer (v->u.s))
          return false;
-       if (xstrtoimax (v->u.s, NULL, 10, &value, NULL) != LONGINT_OK)
-         error (EXPR_FAILURE, ERANGE, "%s", v->u.s);
-       free (v->u.s);
-       v->u.i = value;
-       v->type = integer;
+       if (xstrtol (v->u.s, NULL, 10, &value, NULL) != LONGINT_OK)
+         {
+#if HAVE_GMP
+           if (mode != MP_NEVER)
+             {
+               char *s = v->u.s;
+               if (mpz_init_set_str (v->u.z, s, 10))
+                 abort ();  /* Bug in looks_like_integer, perhaps. */
+               v->type = mp_integer;
+               free (s);
+             }
+           else
+             {
+               error (EXPR_FAILURE, ERANGE, "%s", v->u.s);
+             }
+#else
+           error (EXPR_FAILURE, ERANGE, "%s", v->u.s);
+#endif
+         }
+       else
+         {
+           free (v->u.s);
+           v->u.i = value;
+           v->type = integer;
+         }
        return true;
       }
     default:
@@ -366,6 +543,85 @@ toarith (VALUE *v)
     }
 }

+/* Convert V into an integer (that is, a non-arbitrary-precision value)
+   Return true on success, false on failure.  */
+static bool
+toint (VALUE *v)
+{
+  if (!toarith (v))
+    return false;
+#if HAVE_GMP
+  if (v->type == mp_integer)
+    {
+      if (mpz_fits_slong_p (v->u.z))
+       {
+         long value = mpz_get_si (v->u.z);
+         mpz_clear (v->u.z);
+         v->u.i = value;
+         v->type = integer;
+       }
+      else
+       return false;           /* value was too big. */
+    }
+#else
+  if (v->type == mp_integer)
+    abort ();                  /* should not happen. */
+#endif
+  return true;
+}
+
+/* Extract a size_t value from a positive arithmetic value, V.
+   The extracted value is stored in *VAL. */
+static bool
+getsize (const VALUE *v, size_t *val, bool *negative)
+{
+  if (v->type == integer)
+    {
+      if (v->u.i < 0)
+       {
+         *negative = true;
+         return false;
+       }
+      else
+       {
+         *negative = false;
+         *val = v->u.i;
+         return true;
+       }
+    }
+  else if (v->type == mp_integer)
+    {
+#if HAVE_GMP
+      if (mpz_sgn (v->u.z) < 0)
+       {
+         *negative = true;
+         return false;
+       }
+      else if (mpz_fits_ulong_p (v->u.z))
+       {
+         unsigned long ul;
+         ul = mpz_get_ui (v->u.z);
+         *val = ul;
+         return true;
+       }
+      else
+       {
+         *negative = false;
+         return false;
+       }
+#else
+      abort ();
+#endif
+
+    }
+  else
+    {
+      abort ();                        /* should not pass a string. */
+    }
+}
+
+
+
 /* Return true and advance if the next token matches STR exactly.
    STR must not be NULL.  */

@@ -544,13 +800,41 @@ eval6 (bool evaluate)
     }
   else if (nextarg ("index"))
     {
+      size_t pos, len;
+
       l = eval6 (evaluate);
       r = eval6 (evaluate);
       tostring (l);
       tostring (r);
-      v = int_value (strcspn (l->u.s, r->u.s) + 1);
-      if (v->u.i == strlen (l->u.s) + 1)
-       v->u.i = 0;
+      pos = strcspn (l->u.s, r->u.s);
+      len = strlen (l->u.s);
+      if (pos == len)
+       {
+         v = int_value (0);
+       }
+      else
+       {
+         if (pos < LONG_MAX)
+           {
+             v = int_value (pos + 1);
+           }
+         else
+           {
+#if HAVE_GMP
+             if (mode != MP_NEVER
+                 && pos < ULONG_MAX)
+               {
+                 v = xmalloc (sizeof *v);
+                 mpz_init_set_ui (v->u.z, pos+1);
+                 v->type = mp_integer;
+               }
+             else
+               string_too_long ();
+#else
+             string_too_long ();
+#endif
+           }
+       }
       freev (l);
       freev (r);
       return v;
@@ -563,19 +847,32 @@ eval6 (bool evaluate)
       i2 = eval6 (evaluate);
       tostring (l);
       llen = strlen (l->u.s);
-      if (!toarith (i1) || !toarith (i2)
-         || llen < i1->u.i
-         || i1->u.i <= 0 || i2->u.i <= 0)
+
+      if (!toarith (i1) || !toarith (i2))
        v = str_value ("");
       else
        {
-         size_t vlen = MIN (i2->u.i, llen - i1->u.i + 1);
-         char *vlim;
-         v = xmalloc (sizeof *v);
-         v->type = string;
-         v->u.s = xmalloc (vlen + 1);
-         vlim = mempcpy (v->u.s, l->u.s + i1->u.i - 1, vlen);
-         *vlim = '\0';
+         size_t pos, len;
+         bool negative = false;
+
+         if (getsize (i1, &pos, &negative))
+           if (getsize (i2, &len, &negative))
+             if (pos == 0 || len == 0)
+               v = str_value ("");
+             else
+               v = substr_value (l->u.s, llen, pos-1, len);
+           else
+             if (negative)
+               v = str_value ("");
+             else
+               error (EXPR_FAILURE, ERANGE,
+                      _("string offset is too large"));
+         else
+           if (negative)
+             v = str_value ("");
+           else
+             error (EXPR_FAILURE, ERANGE,
+                    _("substring length too large"));
        }
       freev (l);
       freev (i1);
@@ -618,6 +915,171 @@ eval5 (bool evaluate)
     }
 }

+
+#if HAVE_GMP
+static void
+promote (VALUE *x)
+{
+  if (x->type == integer)
+    mpz_init_set_si (x->u.z, x->u.i);
+}
+#endif
+
+/* L = L * R.  Both L and R are arithmetic. */
+static void
+domult (VALUE *l, VALUE *r)
+{
+  if (l->type == integer && r->type == integer)
+    {
+      long int val = 0;
+      val = l->u.i * r->u.i;
+      if (! (l->u.i == 0 || r->u.i == 0
+            || ((val < 0) == ((l->u.i < 0) ^ (r->u.i < 0))
+                && val / l->u.i == r->u.i)))
+       {
+         /* Result would (did) overflow.  Handle with MP if available. */
+         if (mode != MP_NEVER)
+           {
+#if HAVE_GMP
+             mpz_init_set_si (l->u.z, l->u.i);
+             mpz_mul_si (l->u.z, l->u.z, r->u.i); /* L*=R */
+             l->type = mp_integer;
+#endif
+           }
+         else
+           {
+             integer_overflow ('*');
+           }
+       }
+      else
+       {
+         l->u.i = val;
+       }
+    }
+  else
+    {
+      /* At least one operand is already mp_integer, so promote the other. */
+#if HAVE_GMP
+      /* We could use mpz_mul_si here if R is not already mp_integer,
+        but for the moment we'll try to minimise code paths. */
+      if (l->type == integer)
+       mpz_init_set_si (l->u.z, l->u.i);
+      if (r->type == integer)
+       mpz_init_set_si (r->u.z, r->u.i);
+      l->type = r->type = mp_integer;
+      mpz_mul (l->u.z, l->u.z, r->u.z); /* L*=R */
+#else
+      abort ();
+#endif
+    }
+}
+
+/* L = L / R or (if WANT_MODULUS) L = L % R */
+static void
+dodivide (VALUE *l, VALUE *r, bool want_modulus)
+{
+  if (r->type == integer && r->u.i == 0)
+    error (EXPR_INVALID, 0, _("division by zero"));
+#if HAVE_GMP
+  if (r->type == mp_integer && mpz_sgn (r->u.z) == 0)
+    error (EXPR_INVALID, 0, _("division by zero"));
+#endif
+  if (l->type == integer && r->type == integer)
+    {
+      if (l->u.i < - INT_MAX && r->u.i == -1)
+       {
+         /* Some x86-style hosts raise an exception for
+            INT_MIN / -1 and INT_MIN % -1, so handle these
+            problematic cases specially.  */
+         if (want_modulus)
+           {
+             /* X mod -1 is zero for all negative X.
+                Although strictly this is implementation-defined,
+                we don't want to coredump, so we avoid the calculation. */
+             l->u.i = 0;
+             return;
+           }
+         else
+           {
+             if (mode != MP_NEVER)
+               {
+#if HAVE_GMP
+                 /* Handle the case by promoting. */
+                 mpz_init_set_si (l->u.z, l->u.i);
+                 l->type = mp_integer;
+#endif
+               }
+             else
+               {
+                 integer_overflow ('/');
+               }
+           }
+       }
+      else
+       {
+         l->u.i = want_modulus ? l->u.i % r->u.i : l->u.i / r->u.i;
+         return;
+       }
+    }
+  /* If we get to here, at least one operand is mp_integer
+     and R is not 0. */
+#if HAVE_GMP
+  {
+    bool round_up = false;     /* do we round up? */
+    int sign_l, sign_r;
+    promote (l);
+    promote (r);
+    sign_l = mpz_sgn (l->u.z);
+    sign_r = mpz_sgn (r->u.z);
+
+    if (!want_modulus)
+      {
+       if (!sign_l)
+         {
+           mpz_set_si (l->u.z, 0);
+         }
+       else if (sign_l < 0 || sign_r < 0)
+         {
+           /* At least one operand is negative.  For integer arithmetic,
+              it's platform-dependent if the operation rounds up or down.
+              We mirror what the implementation does. */
+           switch ((3*sign_l) / (2*sign_r))
+             {
+             case  2:          /* round toward +inf. */
+             case -1:          /* round toward +inf. */
+               mpz_cdiv_q (l->u.z, l->u.z, r->u.z);
+               break;
+             case -2:          /* round toward -inf. */
+             case  1:          /* round toward -inf */
+               mpz_fdiv_q (l->u.z, l->u.z, r->u.z);
+               break;
+             default:
+               abort ();
+             }
+         }
+       else
+         {
+           /* Both operands positive.  Round toward -inf. */
+           mpz_fdiv_q (l->u.z, l->u.z, r->u.z);
+         }
+      }
+    else
+      {
+       mpz_mod (l->u.z, l->u.z, r->u.z); /* L = L % R */
+
+       /* If either operand is negative, it's platform-dependent if
+          the remainer is positive or negative.  We mirror what the
+          implementation does. */
+       if (sign_l % sign_r < 0)
+         mpz_neg (l->u.z, l->u.z); /* L = (-L) */
+      }
+  }
+#else
+  abort ();
+#endif
+}
+
+
 /* Handle *, /, % operators.  */

 static VALUE *
@@ -626,7 +1088,6 @@ eval4 (bool evaluate)
   VALUE *l;
   VALUE *r;
   enum { multiply, divide, mod } fxn;
-  intmax_t val = 0;

 #ifdef EVAL_TRACE
   trace ("eval4");
@@ -647,37 +1108,71 @@ eval4 (bool evaluate)
        {
          if (!toarith (l) || !toarith (r))
            error (EXPR_INVALID, 0, _("non-numeric argument"));
-         if (fxn == multiply)
+         switch (fxn)
            {
-             val = l->u.i * r->u.i;
-             if (! (l->u.i == 0 || r->u.i == 0
-                    || ((val < 0) == ((l->u.i < 0) ^ (r->u.i < 0))
-                        && val / l->u.i == r->u.i)))
-               integer_overflow ('*');
+           case multiply:
+             domult (l, r);
+             break;
+           case divide:
+           case mod:
+             dodivide (l, r, fxn==mod);
+             break;
            }
-         else
+       }
+      freev (r);
+    }
+}
+
+/* L = L + R, or L = L - R */
+static void
+doadd (VALUE *l, VALUE *r, bool add)
+{
+  long int val = 0;
+
+  if (!toarith (l) || !toarith (r))
+    error (EXPR_INVALID, 0, _("non-numeric argument"));
+  if (l->type == integer && r->type == integer)
+    {
+      if (add)
+       {
+         val = l->u.i + r->u.i;
+         if ((val < l->u.i) == (r->u.i < 0))
            {
-             if (r->u.i == 0)
-               error (EXPR_INVALID, 0, _("division by zero"));
-             if (l->u.i < - INTMAX_MAX && r->u.i == -1)
-               {
-                 /* Some x86-style hosts raise an exception for
-                    INT_MIN / -1 and INT_MIN % -1, so handle these
-                    problematic cases specially.  */
-                 if (fxn == divide)
-                   integer_overflow ('/');
-                 val = 0;
-               }
-             else
-               val = fxn == divide ? l->u.i / r->u.i : l->u.i % r->u.i;
+             l->u.i = val;
+             return;
            }
        }
-      freev (l);
-      freev (r);
-      l = int_value (val);
+      else
+       {
+         val = l->u.i - r->u.i;
+         if ((l->u.i < val) == (r->u.i < 0))
+           {
+             l->u.i = val;
+             return;
+           }
+       }
+    }
+  /* If we get to here, either the operation overflowed or at least
+     one operand is an mp_integer. */
+  if (mode != MP_NEVER)
+    {
+#if HAVE_GMP
+      promote (l);
+      promote (r);
+      if (add)
+       mpz_add (l->u.z, l->u.z, r->u.z);
+      else
+       mpz_sub (l->u.z, l->u.z, r->u.z);
+#endif
+    }
+  else
+    {
+      integer_overflow ('-');
     }
 }

+
+
 /* Handle +, - operators.  */

 static VALUE *
@@ -685,8 +1180,7 @@ eval3 (bool evaluate)
 {
   VALUE *l;
   VALUE *r;
-  enum { plus, minus } fxn;
-  intmax_t val = 0;
+  bool add;

 #ifdef EVAL_TRACE
   trace ("eval3");
@@ -695,32 +1189,17 @@ eval3 (bool evaluate)
   while (1)
     {
       if (nextarg ("+"))
-       fxn = plus;
+       add = true;
       else if (nextarg ("-"))
-       fxn = minus;
+       add = false;
       else
        return l;
       r = eval4 (evaluate);
       if (evaluate)
        {
-         if (!toarith (l) || !toarith (r))
-           error (EXPR_INVALID, 0, _("non-numeric argument"));
-         if (fxn == plus)
-           {
-             val = l->u.i + r->u.i;
-             if ((val < l->u.i) != (r->u.i < 0))
-               integer_overflow ('+');
-           }
-         else
-           {
-             val = l->u.i - r->u.i;
-             if ((l->u.i < val) != (r->u.i < 0))
-               integer_overflow ('-');
-           }
+         doadd (l, r, add);
        }
-      freev (l);
       freev (r);
-      l = int_value (val);
     }
 }

--
1.6.0.rc2.2.g59bf


>From 37fdecdda2b9a99ae3849cce009c81b1dcea4211 Mon Sep 17 00:00:00 2001
From: Jim Meyering <address@hidden>
Date: Fri, 8 Aug 2008 10:02:34 +0200
Subject: [PATCH] * tests/misc/expr: Add tests of the new GMP-based code.

---
 tests/misc/expr |   16 ++++++++++++++++
 1 files changed, 16 insertions(+), 0 deletions(-)

diff --git a/tests/misc/expr b/tests/misc/expr
index ab330e2..dccbd58 100755
--- a/tests/misc/expr
+++ b/tests/misc/expr
@@ -24,6 +24,11 @@ my $prog = 'expr';
 # Turn off localization of executable's output.
 @ENV{qw(LANGUAGE LANG LC_ALL)} = ('C') x 3;

+my $big =      '98782897298723498732987928734';
+my $big_p1 =   '98782897298723498732987928735';
+my $big_sum = '197565794597446997465975857469';
+my $big_prod = '9758060798730154302876482828124348356960410232492450771490';
+
 my @Tests =
     (
      ['a', '5 + 6', {OUT => '11'}],
@@ -149,8 +154,19 @@ my @Tests =
      ['fail-c', {ERR => "$prog: missing operand\n"
                 . "Try `$prog --help' for more information.\n"},
       {EXIT => 2}],
+
+     ['bn-add', "$big + 1", {OUT => $big_p1}],
+     ['bn-add2', "$big + $big_p1", {OUT => $big_sum}],
+     ['bn-sub', "$big_p1 - 1", {OUT => $big}],
+     ['bn-sub2', "$big_sum - $big", {OUT => $big_p1}],
+     ['bn-mul', "$big_p1 '*' $big", {OUT => $big_prod}],
+     ['bn-div', "$big_prod / $big", {OUT => $big_p1}],
     );

+# If using --bignum fails, remove all /^bn-/ tests
+`expr --bignum 1`
+  or @Tests = grep {$_->[0] !~ /^bn-/} @Tests;
+
 # Append a newline to end of each expected `OUT' string.
 my $t;
 foreach $t (@Tests)
--
1.6.0.rc2.2.g59bf


>From 4cc5cbde7ac80add72f27753ef7791197390f624 Mon Sep 17 00:00:00 2001
From: Jim Meyering <address@hidden>
Date: Fri, 8 Aug 2008 10:06:54 +0200
Subject: [PATCH] expr: avoid compiler warnings

* src/expr.c (die): New "noreturn" function to wrap one-arg use of
error
(string_too_long): Use die rather than error.
(toint): Remove definition of now-unused function.
(eval6): Remove a little duplication.
Use die rather than error.
(dodivide): Remove declaration of now-unused variable.
---
 src/expr.c |   52 ++++++++++++++++------------------------------------
 1 files changed, 16 insertions(+), 36 deletions(-)

diff --git a/src/expr.c b/src/expr.c
index bf55a40..524ec93 100644
--- a/src/expr.c
+++ b/src/expr.c
@@ -225,10 +225,20 @@ integer_overflow (char op)
           "but arbitrary-precision arithmetic is not available"), op);
 }

+static void die (int exit_status, int errno_val, char const *msg)
+  ATTRIBUTE_NORETURN;
+static void
+die (int exit_status, int errno_val, char const *msg)
+{
+  assert (exit_status != 0);
+  error (exit_status, errno_val, "%s", msg);
+  abort (); /* notreached */
+}
+
 static void
 string_too_long (void)
 {
-  error (EXPR_FAILURE, ERANGE, _("string too long"));
+  die (EXPR_FAILURE, ERANGE, _("string too long"));
 }

 enum
@@ -543,33 +553,6 @@ toarith (VALUE *v)
     }
 }

-/* Convert V into an integer (that is, a non-arbitrary-precision value)
-   Return true on success, false on failure.  */
-static bool
-toint (VALUE *v)
-{
-  if (!toarith (v))
-    return false;
-#if HAVE_GMP
-  if (v->type == mp_integer)
-    {
-      if (mpz_fits_slong_p (v->u.z))
-       {
-         long value = mpz_get_si (v->u.z);
-         mpz_clear (v->u.z);
-         v->u.i = value;
-         v->type = integer;
-       }
-      else
-       return false;           /* value was too big. */
-    }
-#else
-  if (v->type == mp_integer)
-    abort ();                  /* should not happen. */
-#endif
-  return true;
-}
-
 /* Extract a size_t value from a positive arithmetic value, V.
    The extracted value is stored in *VAL. */
 static bool
@@ -829,10 +812,10 @@ eval6 (bool evaluate)
                  v->type = mp_integer;
                }
              else
-               string_too_long ();
-#else
-             string_too_long ();
 #endif
+               {
+                 string_too_long ();
+               }
            }
        }
       freev (l);
@@ -865,14 +848,12 @@ eval6 (bool evaluate)
              if (negative)
                v = str_value ("");
              else
-               error (EXPR_FAILURE, ERANGE,
-                      _("string offset is too large"));
+               die (EXPR_FAILURE, ERANGE, _("string offset is too large"));
          else
            if (negative)
              v = str_value ("");
            else
-             error (EXPR_FAILURE, ERANGE,
-                    _("substring length too large"));
+             die (EXPR_FAILURE, ERANGE, _("substring length too large"));
        }
       freev (l);
       freev (i1);
@@ -1025,7 +1006,6 @@ dodivide (VALUE *l, VALUE *r, bool want_modulus)
      and R is not 0. */
 #if HAVE_GMP
   {
-    bool round_up = false;     /* do we round up? */
     int sign_l, sign_r;
     promote (l);
     promote (r);
--
1.6.0.rc2.2.g59bf


>From 78940644a5219092f0a6572545795e69c7362acf Mon Sep 17 00:00:00 2001
From: Jim Meyering <address@hidden>
Date: Fri, 8 Aug 2008 12:48:31 +0200
Subject: [PATCH] * coreutils.texi (factor invocation, expr invocation): Adjust 
wording.

---
 doc/coreutils.texi |   21 +++++++++++----------
 1 files changed, 11 insertions(+), 10 deletions(-)

diff --git a/doc/coreutils.texi b/doc/coreutils.texi
index f5a2e41..9d9eeb0 100644
--- a/doc/coreutils.texi
+++ b/doc/coreutils.texi
@@ -11068,18 +11068,19 @@ expr invocation

 By default, @command{expr} performs operations using native arithmetic
 types, but if a numeric overflow occurs and @command{expr} was built
-with support for the GNU MP library, @command{expr}, it will switch to
-using arbitrary-precision arithmetic.
+with support for the GNU MP library, @command{expr}, it
+uses arbitrary-precision arithmetic.

 Apart from @option{--help} and @option{--version} (@pxref{Common
 options}), the following options are supported:

 @table @samp
 @item --bignum
-Forces the use of the GNU MP library.
+Perform arithmetic operations using unlimited precision via the GNU MP library.
 @item --no-bignum
-Forces the use of native operations only.  In the event of a numeric
-overflow, @command{expr} will fail, even if GNU MP is available.
+Use only limited-precision native operations.
+In the event of numeric overflow, @command{expr} fails,
+even if GNU MP is available.
 @end table

 @cindex exit status of @command{expr}
@@ -14412,13 +14413,13 @@ factor invocation
 processing.

 @item --bignum
-Forces the use of the GNU MP library.   By default, @command{factor}
-selects between using GNU MP and using native operations on the basis
-of the length of the number to be factored.
+Always use unlimited precision arithmetic via the GNU MP library.
+By default, @command{factor} selects between using GNU MP and using
+native operations on the basis of the length of the number to be factored.

 @item --no-bignum
-Forces the use of native operations instead of GNU MP.  This causes
address@hidden to fail for larger inputs.
+Always use limited-precision native operations, not GNU MP.
+This causes @command{factor} to fail for larger inputs.

 @item --version
 Print the program version on standard output, then exit without further
--
1.6.0.rc2.2.g59bf
[Prev in Thread]
Current Thread
[Next in Thread]
[PATCH] Support arbitrary-precision arithmetic in expr., James Youngman, 2008/08/02
- Re: [PATCH] Support arbitrary-precision arithmetic in expr., Jim Meyering <=
Prev by Date: [PATCH] dd.c: reduce duplication in new O_FULLBLOCK-defining code
Next by Date: whoami problem when two or more users have the same uid
Previous by thread: [PATCH] Support arbitrary-precision arithmetic in expr.
Next by thread: Re: [PATCH 1/2] Support arbitrarily-long numbers with GNU MP.
Index(es):
- Date
- Thread