Re: dfa - gawk matching problem on windows and suggested fix
From: Jim Meyering
Subject: Re: dfa - gawk matching problem on windows and suggested fix
Date: Tue, 04 Oct 2011 08:30:02 +0200

Eli Zaretskii wrote:
>> From: Jim Meyering <address@hidden>
>> Cc: address@hidden, address@hidden
>> Date: Mon, 03 Oct 2011 18:41:25 +0200
>>
>> > This version of wctob solves the problem.
>>
>> Good. Thanks for confirming that.
>> Then I suggest that users of dfa.c like gawk arrange to use that.
>> grep and any users that (by use of gnulib) can be assured of a working
>> wctob do not need to change dfa.c to work around that bug.
>>
>> However, while current wctob configure-time tests in gnulib
>> do detect some wctob problems, I don't see a test for this one.
>> Hence, if you can confirm that this also causes a problem with grep,
>> I'll work with you to add a configure-time test in gnulib
>> so that gnulib-using projects also replace that system's wctob.
>
> It will take time for me to look in grep, because I'd need to build my
> own binary from sources.
>
> For Gawk, the configure-time test is not going to solve the problem on
> Windows because the Windows port of Gawk does not use the configure
> script, it is built using a separately maintained Makefile. So for
> Gawk, I can simply put the replacement wctob on a Windows-specific
> file (which exists anyway, for other functions that need wrappers or
> replacements).

FYI, this is what I'm going to push.
The only piece lacking is the [...] note in NEWS where I
normally document in which version the bug was introduced.
Since I have been unable to reproduce it, I haven't bothered
to try to deduce when it was introduced.

From 7d20c09e3e7cf3af9060f395e884fca285ce3598 Mon Sep 17 00:00:00 2001
From: Eli Zaretskii <address@hidden>
Date: Sun, 2 Oct 2011 21:33:53 +0200
Subject: [PATCH] dfa: don't mishandle high-bit bytes in a regexp with
signed-char
This appears to arise only on systems for which "char" is signed.
* src/dfa.c (FETCH_WC, FETCH): Produce an unsigned value, rather
than a sign-extended one. Fixes a bug on MS-Windows with compiling
patterns that include characters with the 8-th bit set.
(to_uchar): Define. From coreutils.
Reported by David Millis <address@hidden>.
See http://thread.gmane.org/gmane.comp.gnu.grep.bugs/3893
* NEWS (Bug fixes): Mention it.
---
NEWS | 5 +++++
src/dfa.c | 9 +++++++--
2 files changed, 12 insertions(+), 2 deletions(-)
diff --git a/NEWS b/NEWS
index 8578e82..2b06af4 100644
--- a/NEWS
+++ b/NEWS
@@ -2,6 +2,11 @@ GNU grep NEWS -*- outline -*-
* Noteworthy changes in release ?.? (????-??-??) [?]
+** Bug fixes
+
+ grep no longer mishandles high-bit-set pattern bytes on systems
+ where "char" is a signed type. [bug appears to affect only MS-Windows]
+
grep now rejects a command like "grep -r pattern . > out",
in which the output file is also one of the inputs,
because it can result in an "infinite" disk-filling loop.
diff --git a/src/dfa.c b/src/dfa.c
index 8611435..dc87915 100644
--- a/src/dfa.c
+++ b/src/dfa.c
@@ -86,6 +86,11 @@
/* Sets of unsigned characters are stored as bit vectors in arrays of ints. */
typedef int charclass[CHARCLASS_INTS];
+/* Convert a possibly-signed character to an unsigned character. This is
+ a bit safer than casting to unsigned char, since it catches some type
+ errors that the cast doesn't. */
+static inline unsigned char to_uchar (char ch) { return ch; }
+
/* Sometimes characters can only be matched depending on the surrounding
context. Such context decisions depend on what the previous character
was, and the value of the current (lookahead) character. Context
@@ -686,7 +691,7 @@ static unsigned char const *buf_end; /* reference to end in dfaexec(). */
{ \
cur_mb_len = 1; \
--lexleft; \
- (wc) = (c) = (unsigned char) *lexptr++; \
+ (wc) = (c) = to_uchar (*lexptr++); \
} \
else \
{ \
@@ -715,7 +720,7 @@ static unsigned char const *buf_end; /* reference to end in dfaexec(). */
else \
return lasttok = END; \
} \
- (c) = (unsigned char) *lexptr++; \
+ (c) = to_uchar (*lexptr++); \
--lexleft; \
} while(0)
--
1.7.7.rc0.362.g5a14
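
For readers unfamiliar with the failure mode named in the commit message, here is a minimal sketch of how a high-bit byte changes value when it is widened to int as a signed char rather than as an unsigned one. Only to_uchar comes from the patch. The helpers fetch_sign_extended, fetch_unsigned and show_fetch are hypothetical names made up for this illustration, and the sketch assumes a two's-complement platform with 8-bit char.

```c
#include <assert.h>
#include <stdio.h>

/* Same definition as in the patch.  Because the parameter has type
   char, passing something that is not a character (a pointer, say)
   is a constraint violation the compiler must diagnose.  */
static inline unsigned char to_uchar (char ch) { return ch; }

/* Hypothetical helper: widen the byte to int as a signed char.  This
   imitates a platform where plain "char" is signed, regardless of
   what the build platform actually does.  */
static int fetch_sign_extended (const char *p) { return (signed char) *p; }

/* Hypothetical helper: widen the byte to int via to_uchar, which is
   the conversion the patched FETCH and FETCH_WC macros use.  */
static int fetch_unsigned (const char *p) { return to_uchar (*p); }

/* Print both results for a single high-bit byte (0xE9).  */
static void show_fetch (void)
{
  const char pattern[] = "\xE9";
  int unsigned_value = fetch_unsigned (pattern);

  /* The unsigned route always stays in the range 0..255.  */
  assert (0 <= unsigned_value && unsigned_value <= 255);

  printf ("sign-extended: %d\n", fetch_sign_extended (pattern));
  printf ("to_uchar:      %d\n", unsigned_value);
}
```

Calling show_fetch () from a main function should show a negative value on the first line and 233 on the second. A negative value is not a valid byte value for code that expects the range 0..255, which is the kind of mismatch the commit message describes.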