bug-gnulib

Re: large integer truncation in regex module


From: Steven M. Schweda
Subject: Re: large integer truncation in regex module
Date: Sat, 26 May 2012 15:24:56 -0500 (CDT)

   Re: http://lists.gnu.org/archive/html/bug-gnulib/2012-03/msg00154.html

> [...] I think an ifdef may be used instead [...]

   I agree.  I, too, recently ran into this problem (in my case, on
VMS).  On Alpha (and IA64; both are 64-bit systems), the result is a set
of annoying informational messages:

ALP $ cc /include = [] /noobject regex.c

          dfa->word_char[0] = UINT64_C (0x03ff000000000000);
..............................^
%CC-I-INTCONSTTRUNC, In this statement, conversion of the constant "0X03FF000000000000" to unsigned long type will cause data loss.
at line number 961 in file ALP$DKC0:[UTILITY.SOURCE.ZIP.regex]regcomp.c;1

          dfa->word_char[1] = UINT64_C (0x07fffffe87fffffe);
..............................^
%CC-I-INTCONSTTRUNC, In this statement, conversion of the constant "0X07FFFFFE87FFFFFE" to unsigned long type will cause data loss.
at line number 962 in file ALP$DKC0:[UTILITY.SOURCE.ZIP.regex]regcomp.c;1


   On VAX (a 32-bit system), it's fatal.  Even with a lame work-around:
      #define UINT64_C(x) x##UL
to avoid warnings like the following, caused by trying to use "ULL" with
a compiler which doesn't understand "long long":

                  dfa->word_char[0] = UINT64_C (0x03ff000000000000);
        ..............................^
%CC-W-INVALTOKEN, Invalid token discarded.
                At line number 961 in ALP$DKC0:[UTILITY.SOURCE.ZIP.REGEX]REGCOMP.C;1.

one still gets these (fatal) complaints:

GIMP $ cc /include = [] /noobject regex.c
                  dfa->word_char[0] = UINT64_C (0x03ff000000000000);
        ..............................^
%CC-E-INTCONST, Ill-formed integer constant.
                At line number 961 in ALP$DKC0:[UTILITY.SOURCE.ZIP.REGEX]REGCOMP.C;1.

                  dfa->word_char[1] = UINT64_C (0x07fffffe87fffffe);
        ..............................^
%CC-E-INTCONST, Ill-formed integer constant.
                At line number 962 in ALP$DKC0:[UTILITY.SOURCE.ZIP.REGEX]REGCOMP.C;1.

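All of these are caused by deferring to a run-time test a choice which
the C preprocessor should have made at compile time.  With a run-time
"if", the compiler must still accept the 64-bit constants, even on a
system where that branch can never be executed.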
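
   As a rough illustration of the compile-time alternative (a sketch
only, not the actual gnulib code), the choice can be left to the
preprocessor, so that a compiler with no 64-bit type never sees the
64-bit constants.  The names word_t, WORD_BITS, SET_WORDS,
init_word_char and is_word_char are invented here, and the sketch
assumes that "unsigned long" is either 32 or 64 bits wide:

```c
#include <limits.h>

/* Hypothetical stand-in for the bitset word type; not the gnulib name. */
typedef unsigned long word_t;

/* Assumption: unsigned long is either 32 or 64 bits wide. */
#if ULONG_MAX == 0xffffffffUL
# define WORD_BITS 32
#else
# define WORD_BITS 64
#endif

/* Number of words needed for one bit per 8-bit character code. */
#define SET_WORDS (256 / WORD_BITS)

/* Mark the ASCII word characters [0-9A-Z_a-z] in a 256-bit set. */
static void init_word_char (word_t set[SET_WORDS])
{
  int i;

  for (i = 0; i < SET_WORDS; i++)
    set[i] = 0;

#if WORD_BITS == 64
  /* Only a compiler with a 64-bit unsigned long sees these constants. */
  set[0] = 0x03ff000000000000UL;   /* '0'..'9' */
  set[1] = 0x07fffffe87fffffeUL;   /* 'A'..'Z', '_', 'a'..'z' */
#else
  /* The same bits, split into 32-bit words; set[0] stays zero. */
  set[1] = 0x03ff0000UL;           /* '0'..'9' */
  set[2] = 0x87fffffeUL;           /* 'A'..'Z', '_' */
  set[3] = 0x07fffffeUL;           /* 'a'..'z' */
#endif
}

/* Return 1 if character c is marked in the set, 0 otherwise. */
static int is_word_char (const word_t set[SET_WORDS], unsigned char c)
{
  return (int) ((set[c / WORD_BITS] >> (c % WORD_BITS)) & 1UL);
}
```

With this arrangement, only the branch matching the word size is ever
compiled, so neither the truncation message nor the "Ill-formed integer
constant" error should have anything to complain about.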

> Generally speaking we prefer 'if (xxx)' to '#if xxx' where
> either will do, because the former is easier to read and
> reason about.

   Really?  I must be getting old.  Ever since I started programming
computers, the goal was always a program which worked correctly, and
which, subject to the usual engineering trade-offs, occupied minimal
storage, and ran maximally fast.  When targeted at multiple system
types, portability was highly valued, too.  Any (claimed) difference in
readability and reasonability between "if (xxx)" and "#if xxx" would not
be given priority over all other considerations, especially in a case
which hits the trifecta of bad code: bigger size, lower speed, and worse
portability.

   Which is better, a (small but questionable) advantage in code
readability, or a hideous pile of lame work-arounds needed to
accommodate that "advantage"?

>  If the only problem with 'if (xxx)' is a bogus
> warning by some random compiler then it's probably better to
> leave it alone (and get the compiler fixed....).

   If "a bogus warning by some random compiler" actually means "a
legitimate informational/warning/error diagnostic from any compiler
which I don't use", then this might make some sense.  But probably not.
No doubt, the advice to "get the compiler fixed" was well-meant, but it
would be difficult to apply to a closed-source product like DEC/Compaq C
on VMS VAX, which has probably not seen a fix since around
"16-JAN-2001", and probably never will see another (even to replace
"Compaq" with "HP").

   I realize that "GNU" and "portable" are spelled differently for a
reason, and I realize that I'm old and stupid, but I can see no valid
reason to retain this perverse, inefficient, unportable code, when a fix
would be so easy.

   "This program is distributed in the hope that it will be useful,
[...]."  But not too useful to too many people, apparently.  I'd sound
more appreciative of, and grateful for, this (would-be-)nice package if
some deliberate implementation decisions didn't make it so difficult to
use.

   For the curious, initially, I was considering the use of the GNU
regex package in the Info-ZIP Zip and UnZip programs, but our
portability requirements seem to be a little more stringent than those
for GNU regex, so I've dropped the idea for now.  Then I ran into the
same problems in the latest Wget kit, which now incorporates GNU regex
code, and which I still try to keep available on VMS.  (Imagine my
surprise when a Web search for the fix for this problem found instead a
discussion wherein the obvious fix was rejected on esthetic grounds.)

   For the record:

ALP $ cc /version
HP C V7.3-009 on OpenVMS Alpha V8.3

GIMP $ cc /version
Compaq C V6.4-005 on OpenVMS VAX V7.3

   Thanks for your attention.

------------------------------------------------------------------------

   Steven M. Schweda               address@hidden
   382 South Warwick Street        (+1) 651-699-9818
   Saint Paul  MN  55105-2547


