Re: grep-2.10 testing
From: Jim Meyering
Subject: Re: grep-2.10 testing
Date: Mon, 21 Nov 2011 20:33:22 +0100
Jim Meyering wrote:
> Bruno Haible wrote:
>
>> Hi Jim,
>>
>>> diff --git a/src/dfa.c b/src/dfa.c
>>> index e28726d..8f79508 100644
>>> --- a/src/dfa.c
>>> +++ b/src/dfa.c
>>> @@ -1071,8 +1071,18 @@ parse_bracket_exp (void)
>>> return CSET + charclass_index(ccl);
>>> }
>>>
>>> +/* Add this to the test for whether a byte is word-constituent, since on
>>> + BSD-based systems, many values in the 128..255 range are classified as
>>> + alphabetic, while on glibc-based systems, they are not. */
>>> +#ifdef __GLIBC__
>>> +# define octet_valid_as_wide_char(c) 1
>>> +#else
>>> +# define octet_valid_as_wide_char(c) (MBS_SUPPORT && btowc (c) != WEOF)
>>> +#endif
>>> +
>>> /* Return non-zero if C is a `word-constituent' byte; zero otherwise. */
>>> -#define IS_WORD_CONSTITUENT(C) (isalnum(C) || (C) == '_')
>>> +#define IS_WORD_CONSTITUENT(C) \
>>> + (octet_valid_as_wide_char(C) && (isalnum(C) || (C) == '_'))
>>>
>>
>> This code would do the job.
>>
>> Only, I find this macro name 'octet_valid_as_wide_char' confusing -
>> because values such as 0xC3 are valid octets and also valid wide characters.
>> I would call this macro 'is_valid_single_byte_character' or
>> 'is_valid_unibyte_character'. Then it's clear why it has to map 0xC3 to false
>> in UTF-8 encoding.
>
> Thanks. I prefer your names, too.
> I'll use is_valid_unibyte_character.
I pushed this:
From f3d95e96a371111f8b9b4941f1075933c904142a Mon Sep 17 00:00:00 2001
From: Jim Meyering <address@hidden>
Date: Mon, 21 Nov 2011 18:38:17 +0100
Subject: [PATCH] portability: work consistently on *BSD systems
* src/dfa.c (is_valid_unibyte_character): Define.
(IS_WORD_CONSTITUENT): Use it here, to make grep work consistently
even on *BSD systems, which use different tables for ctype macros
like isalpha. http://thread.gmane.org/gmane.comp.gnu.grep.bugs/4022
With help from Bruno Haible.
---
src/dfa.c | 12 +++++++++++-
1 files changed, 11 insertions(+), 1 deletions(-)
diff --git a/src/dfa.c b/src/dfa.c
index e28726d..9e7c136 100644
--- a/src/dfa.c
+++ b/src/dfa.c
@@ -1071,8 +1071,18 @@ parse_bracket_exp (void)
   return CSET + charclass_index(ccl);
 }

+/* Add this to the test for whether a byte is word-constituent, since on
+   BSD-based systems, many values in the 128..255 range are classified as
+   alphabetic, while on glibc-based systems, they are not. */
+#ifdef __GLIBC__
+# define is_valid_unibyte_character(c) 1
+#else
+# define is_valid_unibyte_character(c) (MBS_SUPPORT && btowc (c) != WEOF)
+#endif
+
 /* Return non-zero if C is a `word-constituent' byte; zero otherwise. */
-#define IS_WORD_CONSTITUENT(C) (isalnum(C) || (C) == '_')
+#define IS_WORD_CONSTITUENT(C) \
+  (is_valid_unibyte_character(C) && (isalnum(C) || (C) == '_'))

 static token
 lex (void)
--
1.7.8.rc2.3.g0911
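
For anyone who wants to see the effect outside of dfa.c, here is a minimal standalone sketch of the same check. It is not the code in grep: the function names sketch_is_valid_unibyte and sketch_is_word_constituent and the SKETCH_DEMO_MAIN guard are made up for illustration, and the sketch always calls btowc rather than short-circuiting on __GLIBC__ or testing MBS_SUPPORT as the patch does. The idea is only that a byte counts as word-constituent when it is a complete character on its own in the current locale and is alphanumeric or '_'.

```c
#include <ctype.h>
#include <locale.h>
#include <stdio.h>
#include <wchar.h>

/* Hypothetical helper (not part of grep): return nonzero if the byte C,
   taken alone, is a complete character in the current locale's encoding.
   btowc returns WEOF for a byte that is not a valid one-byte character
   in the initial shift state.  */
static int
sketch_is_valid_unibyte (int c)
{
  return btowc ((unsigned char) c) != WEOF;
}

/* Hypothetical counterpart of IS_WORD_CONSTITUENT: the ctype test is
   trusted only for bytes that are valid single-byte characters.  */
static int
sketch_is_word_constituent (int c)
{
  unsigned char uc = (unsigned char) c;
  return sketch_is_valid_unibyte (uc) && (isalnum (uc) || uc == '_');
}

#ifdef SKETCH_DEMO_MAIN
int
main (void)
{
  /* Use the locale from the environment, e.g. a UTF-8 one.  */
  setlocale (LC_ALL, "");
  printf ("'a'  -> %d\n", sketch_is_word_constituent ('a'));
  printf ("'_'  -> %d\n", sketch_is_word_constituent ('_'));
  printf ("0xC3 -> %d\n", sketch_is_word_constituent (0xC3));
  return 0;
}
#endif
```

Compiling with -DSKETCH_DEMO_MAIN and running under different LC_ALL settings shows how the answer for a byte like 0xC3 depends on the locale's encoding and on the platform's ctype tables, which is the inconsistency the patch is meant to avoid.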