[Top][All Lists]
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: dfa - gawk matching problem on windows and suggested fix
From: |
Jim Meyering |
Subject: |
Re: dfa - gawk matching problem on windows and suggested fix |
Date: |
Mon, 03 Oct 2011 10:17:44 +0200 |
Eli Zaretskii wrote:
>> From: Jim Meyering <address@hidden>
>> Cc: address@hidden, address@hidden
>> Date: Sun, 02 Oct 2011 21:45:01 +0200
>>
>> Eli, can you confirm that this also solves the problem
>
> No, it doesn't. The branch of the code that calls wctob was where the
> trouble was happening to begin with. The patch below, which still
> goes through a temporary `unsigned char' variable, does work.
Can you explain or demonstrate how wctob's "int" return
value was inappropriately sign-extended?
>> and that the log text is alright with you?
>
> The log text is fine. However, I have a couple of minor comments on
> the patch.
>
>> +/* Convert a possibly-signed character to an unsigned character. This is
>> + a bit safer than casting to unsigned char, since it catches some type
>> + errors that the cast doesn't. */
>> +static inline unsigned char to_uchar (char ch) { return ch; }
>
> Suggest to explicitly mention sign extension in this comment.
>
> Also, is "inline" sufficiently portable for the range of systems
> supported by dfa.c?
Yes. This use of "inline" has been portable for years.
Besides, autoconf's AC_C_INLINE has worked forever, and
if some application that does not use autoconf also
targets a compiler without inline support, they
can easily define 'inline' to empty.
> Here's the patch that works for me:
>
> --- dfa.c~0 2011-06-23 12:27:01.000000000 +0300
> +++ dfa.c 2011-10-03 08:37:13.807662600 +0200
> @@ -109,6 +109,11 @@ is_blank (int c)
> /* Sets of unsigned characters are stored as bit vectors in arrays of ints.
> */
> typedef int charclass[CHARCLASS_INTS];
>
> +/* Convert a possibly-signed character to an unsigned character. This is
> + a bit safer than casting to unsigned char, since it catches some type
> + errors that the cast doesn't. */
> +static inline unsigned char to_uchar (char ch) { return ch; }
> +
> /* Sometimes characters can only be matched depending on the surrounding
> context. Such context decisions depend on what the previous character
> was, and the value of the current (lookahead) character. Context
> @@ -696,14 +701,16 @@ static unsigned char const *buf_end; /*
> { \
> cur_mb_len = 1; \
> --lexleft; \
> - (wc) = (c) = (unsigned char) *lexptr++; \
> + (wc) = (c) = to_uchar (*lexptr++); \
> } \
> else \
> { \
> + unsigned char uc; \
> lexptr += cur_mb_len; \
> lexleft -= cur_mb_len; \
> (wc) = _wc; \
> - (c) = wctob(wc); \
> + uc = (unsigned) wctob(wc); \
> + (c) = uc; \
If that works for you, then you must not be
testing with anything that would set C to \xff.
Using that code would truncate wctob's "int" result to "char" width,
and thus make it impossible to distinguish between a result of 0xff and EOF.
I am unable to reproduce a failure by compiling with -fsigned-char,
so I guess I'll have to rely on your Windows-specific testing.
- dfa - gawk matching problem on windows and suggested fix, Aharon Robbins, 2011/10/02
- Re: dfa - gawk matching problem on windows and suggested fix, Jim Meyering, 2011/10/02
- Re: dfa - gawk matching problem on windows and suggested fix, Eli Zaretskii, 2011/10/04
- Re: dfa - gawk matching problem on windows and suggested fix,
Jim Meyering <=
- Re: dfa - gawk matching problem on windows and suggested fix, Eli Zaretskii, 2011/10/04
- Re: dfa - gawk matching problem on windows and suggested fix, Jim Meyering, 2011/10/03
- Re: dfa - gawk matching problem on windows and suggested fix, Eli Zaretskii, 2011/10/04
- Re: dfa - gawk matching problem on windows and suggested fix, Jim Meyering, 2011/10/03
- Re: dfa - gawk matching problem on windows and suggested fix, Eli Zaretskii, 2011/10/04
- Re: dfa - gawk matching problem on windows and suggested fix, Jim Meyering, 2011/10/04
- Re: dfa - gawk matching problem on windows and suggested fix, Aharon Robbins, 2011/10/04