bug-grep
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: dfa - gawk matching problem on windows and suggested fix


From: Jim Meyering
Subject: Re: dfa - gawk matching problem on windows and suggested fix
Date: Mon, 03 Oct 2011 10:17:44 +0200

Eli Zaretskii wrote:

>> From: Jim Meyering <address@hidden>
>> Cc: address@hidden,  address@hidden
>> Date: Sun, 02 Oct 2011 21:45:01 +0200
>>
>> Eli, can you confirm that this also solves the problem
>
> No, it doesn't.  The branch of the code that calls wctob was where the
> trouble was happening to begin with.  The patch below, which still
> goes through a temporary `unsigned char' variable, does work.

Can you explain or demonstrate how wctob's "int" return
value was inappropriately sign-extended?

>> and that the log text is alright with you?
>
> The log text is fine.  However, I have a couple of minor comments on
> the patch.
>
>> +/* Convert a possibly-signed character to an unsigned character.  This is
>> +   a bit safer than casting to unsigned char, since it catches some type
>> +   errors that the cast doesn't.  */
>> +static inline unsigned char to_uchar (char ch) { return ch; }
>
> Suggest to explicitly mention sign extension in this comment.
>
> Also, is "inline" sufficiently portable for the range of systems
> supported by dfa.c?

Yes.  This use of "inline" has been portable for years.
Besides, autoconf's AC_C_INLINE has worked forever, and
if some application that does not use autoconf also
targets a compiler without inline support, they
can easily define 'inline' to empty.

> Here's the patch that works for me:
>
> --- dfa.c~0   2011-06-23 12:27:01.000000000 +0300
> +++ dfa.c     2011-10-03 08:37:13.807662600 +0200
> @@ -109,6 +109,11 @@ is_blank (int c)
>  /* Sets of unsigned characters are stored as bit vectors in arrays of ints. 
> */
>  typedef int charclass[CHARCLASS_INTS];
>
> +/* Convert a possibly-signed character to an unsigned character.  This is
> +   a bit safer than casting to unsigned char, since it catches some type
> +   errors that the cast doesn't.  */
> +static inline unsigned char to_uchar (char ch) { return ch; }
> +
>  /* Sometimes characters can only be matched depending on the surrounding
>     context.  Such context decisions depend on what the previous character
>     was, and the value of the current (lookahead) character.  Context
> @@ -696,14 +701,16 @@ static unsigned char const *buf_end;    /*
>            {                                  \
>              cur_mb_len = 1;                  \
>              --lexleft;                               \
> -            (wc) = (c) = (unsigned char) *lexptr++; \
> +            (wc) = (c) = to_uchar (*lexptr++);       \
>            }                                  \
>          else                                 \
>            {                                  \
> +            unsigned char uc;                        \
>              lexptr += cur_mb_len;            \
>              lexleft -= cur_mb_len;           \
>              (wc) = _wc;                              \
> -            (c) = wctob(wc);                 \
> +            uc = (unsigned) wctob(wc);               \
> +         (c) = uc;                           \

If that works for you, then you must not be
testing with anything that would set C to \xff.

Using that code would truncate wctob's "int" result to "char" width,
and thus make it impossible to distinguish between a result of 0xff and EOF.

I am unable to reproduce a failure by compiling with -fsigned-char,
so I guess I'll have to rely on your Windows-specific testing.



reply via email to

[Prev in Thread] Current Thread [Next in Thread]