bug-grep

Re: dfa - gawk matching problem on windows and suggested fix


From: Jim Meyering
Subject: Re: dfa - gawk matching problem on windows and suggested fix
Date: Mon, 03 Oct 2011 18:41:25 +0200

Eli Zaretskii wrote:
>> From: Jim Meyering <address@hidden>
>> Cc: address@hidden,  address@hidden
>> Date: Mon, 03 Oct 2011 13:27:12 +0200
>>
>> > I get a negative value for 0x95 from `lex'.  An explicit `fprintf'
>> > after this line:
>> >
>> >             (c) = wctob(wc);
>> >
>> > shows that the value of `c' is -107.  The value returned by wctob, if
>> > printed using %d is -107, and if printed with %x, shows as 0xffffff95.
>>
>> That shows the problem is with the Windows wctob implementation.
>
> What is the problem with it?

It returns a sign-extended result for high-bit-set bytes like 0x95,
and thus presumably returns -1 (EOF) for 0xff.
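
To make the arithmetic concrete, here is a minimal sketch of the two
widening behaviours under discussion. The helper names are hypothetical,
and the signed variant assumes a two's-complement platform where EOF is -1
(true of common C libraries, but not guaranteed by the standard).

```c
#include <stdio.h>

/* Hypothetical helper: widen a byte through signed char, which is what
   the reported wctob result (-107 for 0x95) looks like.  Converting a
   value above 0x7f to signed char is implementation-defined before C23;
   two's complement is assumed here. */
static int widen_signed(unsigned char byte)
{
    return (int)(signed char)byte;
}

/* Hypothetical helper: widen a byte as "unsigned char converted to int",
   which always yields a value in the range 0..255. */
static int widen_unsigned(unsigned char byte)
{
    return (int)byte;
}

/* Print both widenings of one byte in decimal and hexadecimal. */
static void show_widening(unsigned char byte)
{
    printf("0x%02x: signed %d (0x%x), unsigned %d (0x%x)\n",
           (unsigned)byte,
           widen_signed(byte), (unsigned)widen_signed(byte),
           widen_unsigned(byte), (unsigned)widen_unsigned(byte));
}
```

With 32-bit int, show_widening(0x95) reports -107 and 0xffffff95 for the
signed path and 149 for the unsigned one; for 0xff the signed path yields
-1, which a caller cannot tell apart from EOF.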

>> What if you include something like this just above?
>> (this is part of gnulib's wctob replacement, lib/wctob.c)
>
> This version of wctob solves the problem.

Good.  Thanks for confirming that.
Then I suggest that users of dfa.c like gawk arrange to use that.
grep and any users that (by use of gnulib) can be assured of a working
wctob do not need to change dfa.c to work around that bug.

However, while the current wctob configure-time tests in gnulib
do detect some wctob problems, I don't see a test for this one.
Hence, if you can confirm that this also causes a problem with grep,
I'll work with you to add a configure-time test in gnulib
so that gnulib-using projects also replace that system's wctob.
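
As a rough idea of what such a check could look like, here is a sketch
of a round-trip test. It is not gnulib's actual test; the function name is
hypothetical, and a real configure-time test would presumably first select
a single-byte locale in which bytes like 0x95 are valid characters.

```c
#include <stdio.h>
#include <wchar.h>

/* Hypothetical check: for every byte that btowc accepts as a complete
   character in the current locale, wctob must give back that same byte
   as a value in 0..255.  A sign-extending wctob fails for bytes above
   0x7f, because it returns a negative number instead.
   Returns 1 if every accepted byte round-trips, 0 otherwise. */
static int wctob_roundtrip_ok(void)
{
    for (int byte = 0; byte < 256; byte++) {
        wint_t wc = btowc(byte);
        if (wc == WEOF)
            continue;   /* not a single-byte character in this locale */
        if (wctob(wc) != byte)
            return 0;   /* e.g. -107 came back for 0x95 */
    }
    return 1;
}
```

In a locale where btowc rejects every high byte, the loop only exercises
0x00..0x7f and cannot expose the bug, hence the need to pick the locale
carefully.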

For gawk, you can put the #define in, e.g., config.h and compile
the replacement rpl_wctob function separately when needed (this is the
sort of thing that gnulib would do for you, if gawk were using it).
Changes like these (that work around buggy systems) do not belong in
shared sources like dfa.c.
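
For illustration, a replacement along these lines might look like the
sketch below. It is written in the spirit of gnulib's lib/wctob.c but is
not copied from it, and the exact form of the #define shown in the comment
is an assumption.

```c
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <wchar.h>

/* Assumed form of the config.h line that redirects callers such as dfa.c:
     #define wctob rpl_wctob
   The replacement itself is compiled separately, without that #define. */

/* Sketch of a wctob replacement built on wctomb.  It returns the byte as
   an unsigned char converted to int, so the result is never negative
   unless it is EOF. */
int rpl_wctob(wint_t wc)
{
    char buf[MB_LEN_MAX];

    /* WEOF may not fit in wchar_t, so only convert values that survive
       the round trip through wchar_t; wctomb rejects anything invalid. */
    if (wc == (wint_t)(wchar_t)wc && wctomb(buf, (wchar_t)wc) == 1)
        return (unsigned char)buf[0];
    return EOF;
}
```

The cast to unsigned char on the return path is the whole point: it keeps
0x95 as 149 rather than -107 and keeps 0xff distinct from EOF.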

> But I'd still like to
> understand what is wrong with stock wctob, as the references I
> consulted don't say the result must be positive.  E.g., this:
>
>   http://pubs.opengroup.org/onlinepubs/007904875/functions/wctob.html
>
> says
>
>   The wctob() function shall return EOF if c does not correspond to a
>   character with length one in the initial shift state. Otherwise, it
>   shall return the single-byte representation of that character as an
>   unsigned char converted to int.
>
> Does "unsigned char converted to int" necessarily say that the result
> must be positive?  Or am I missing something?

See above.
Any other interpretation (that sign-extending is ok)
leads to not being able to distinguish 0xff and EOF.


