bug-grep

Re: dfa - gawk matching problem on windows and suggested fix


From: Eli Zaretskii
Subject: Re: dfa - gawk matching problem on windows and suggested fix
Date: Mon, 03 Oct 2011 02:53:35 -0400

> From: Jim Meyering <address@hidden>
> Cc: address@hidden,  address@hidden
> Date: Sun, 02 Oct 2011 21:45:01 +0200
> 
> Eli, can you confirm that this also solves the problem

No, it doesn't.  The branch of the code that calls wctob was where the
trouble was happening to begin with.  The patch below, which still
goes through a temporary `unsigned char' variable, does work.
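
As a rough illustration of why the temporary matters, here is a minimal sketch. It assumes the troublesome value is an int holding a sign-extended byte (for example -23 for byte 0xE9). The function sign_extended_byte is a hypothetical stand-in invented for this example, not the real wctob, and via_uchar is likewise only an illustrative name.

```c
#include <assert.h>

/* Hypothetical stand-in for a routine that hands back a byte value
   sign-extended to int.  This is NOT the real wctob; it only models
   the kind of negative result assumed in this sketch.  */
static int sign_extended_byte (int byte)
{
  return byte >= 0x80 ? byte - 0x100 : byte;
}

/* Mirror the patch: narrow the int through an unsigned char temporary,
   so the result always lies in 0..255 (with 8-bit chars).  */
static int via_uchar (int value)
{
  unsigned char uc = (unsigned) value;
  return uc;
}

/* Small self-check of the two helpers above.  */
void check_via_uchar (void)
{
  int raw = sign_extended_byte (0xE9);   /* -23 */
  assert (raw < 0);
  assert (via_uchar (raw) == 0xE9);      /* back to 233 */
  assert (via_uchar ('A') == 'A');       /* ASCII is unaffected */
}
```

Note that under this narrowing a value of -1 also comes out as 255, so any failure indication carried by a negative return has to be checked before the narrowing, if it matters to the caller.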

> and that the log text is alright with you?

The log text is fine.  However, I have a couple of minor comments on
the patch.

> +/* Convert a possibly-signed character to an unsigned character.  This is
> +   a bit safer than casting to unsigned char, since it catches some type
> +   errors that the cast doesn't.  */
> +static inline unsigned char to_uchar (char ch) { return ch; }

I suggest explicitly mentioning sign extension in this comment.
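
For readers unfamiliar with the issue, a small sketch of the sign extension in question follows. It assumes 8-bit chars; widen_plain and widen_unsigned are hypothetical names used only for this illustration, and to_uchar is written without "inline" here purely to keep the example neutral on that point.

```c
/* Same body as the helper in the patch, minus "inline".  */
static unsigned char to_uchar (char ch) { return ch; }

/* Widen a char to int directly.  Where plain char is signed, a byte
   such as 0xE9 is sign-extended and comes out negative (-23).  */
static int widen_plain (char ch) { return ch; }

/* Widen through to_uchar.  The result is in 0..255 whether plain
   char is signed or unsigned on the platform.  */
static int widen_unsigned (char ch) { return to_uchar (ch); }
```

With these, widen_unsigned ((char) 0xE9) yields 233 regardless of the signedness of plain char, while widen_plain ((char) 0xE9) yields 233 or -23 depending on the platform. A negative value of that kind is unsuitable as an index into a 256-entry table or as a bit position in a charclass.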

Also, is "inline" sufficiently portable for the range of systems
supported by dfa.c?

Here's the patch that works for me:

--- dfa.c~0     2011-06-23 12:27:01.000000000 +0300
+++ dfa.c       2011-10-03 08:37:13.807662600 +0200
@@ -109,6 +109,11 @@ is_blank (int c)
 /* Sets of unsigned characters are stored as bit vectors in arrays of ints. */
 typedef int charclass[CHARCLASS_INTS];
 
+/* Convert a possibly-signed character to an unsigned character.  This is
+   a bit safer than casting to unsigned char, since it catches some type
+   errors that the cast doesn't.  */
+static inline unsigned char to_uchar (char ch) { return ch; }
+
 /* Sometimes characters can only be matched depending on the surrounding
    context.  Such context decisions depend on what the previous character
    was, and the value of the current (lookahead) character.  Context
@@ -696,14 +701,16 @@ static unsigned char const *buf_end;      /* 
           {                                    \
             cur_mb_len = 1;                    \
             --lexleft;                         \
-            (wc) = (c) = (unsigned char) *lexptr++; \
+            (wc) = (c) = to_uchar (*lexptr++); \
           }                                    \
         else                                   \
           {                                    \
+            unsigned char uc;                  \
             lexptr += cur_mb_len;              \
             lexleft -= cur_mb_len;             \
             (wc) = _wc;                                \
-            (c) = wctob(wc);                   \
+            uc = (unsigned) wctob(wc);         \
+           (c) = uc;                           \
           }                                    \
       }                                                \
   } while(0)
@@ -725,7 +732,7 @@ static unsigned char const *buf_end;        /* 
         else                         \
           return lasttok = END;              \
       }                                      \
-    (c) = (unsigned char) *lexptr++;  \
+    (c) = to_uchar (*lexptr++);              \
     --lexleft;                       \
   } while(0)
 


