Re: grep-2.10 testing
From: Jim Meyering
Subject: Re: grep-2.10 testing
Date: Mon, 21 Nov 2011 15:07:06 +0100

Bruno Haible wrote:
> Hi Jim,
>
>> diff --git a/src/dfa.c b/src/dfa.c
>> index e28726d..8f79508 100644
>> --- a/src/dfa.c
>> +++ b/src/dfa.c
>> @@ -1071,8 +1071,18 @@ parse_bracket_exp (void)
>>    return CSET + charclass_index(ccl);
>>  }
>>
>> +/* Add this to the test for whether a byte is word-constituent, since on
>> +   BSD-based systems, many values in the 128..255 range are classified as
>> +   alphabetic, while on glibc-based systems, they are not.  */
>> +#ifdef __GLIBC__
>> +# define octet_valid_as_wide_char(c) 1
>> +#else
>> +# define octet_valid_as_wide_char(c) (MBS_SUPPORT && btowc (c) != WEOF)
>> +#endif
>> +
>>  /* Return non-zero if C is a `word-constituent' byte; zero otherwise.  */
>> -#define IS_WORD_CONSTITUENT(C) (isalnum(C) || (C) == '_')
>> +#define IS_WORD_CONSTITUENT(C) \
>> +  (octet_valid_as_wide_char(C) && (isalnum(C) || (C) == '_'))
>>
>
> This code would do the job.
>
> Only, I find this macro name 'octet_valid_as_wide_char' confusing -
> because values such as 0xC3 are valid octets and also valid wide characters.
> I would call this macro 'is_valid_single_byte_character' or
> 'is_valid_unibyte_character'. Then it's clear why it has to map 0xC3 to false
> in UTF-8 encoding.

Thanks.  I prefer your names, too.
I'll use is_valid_unibyte_character.
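
For illustration, here is a minimal sketch of the check under the new name,
written as functions rather than macros.  It is not the committed code: the
function names are hypothetical, and it deliberately leaves out the
__GLIBC__ shortcut and the MBS_SUPPORT guard from the patch above, so it
always consults btowc.

```c
#include <assert.h>
#include <ctype.h>
#include <limits.h>
#include <wchar.h>

/* Hypothetical function form of the renamed macro: return non-zero if
   byte C is, on its own, a complete character in the current locale.
   btowc returns WEOF for a byte that is not a valid single-byte
   character, e.g. 0xC3 in a UTF-8 locale, where it only starts a
   multibyte sequence.  */
static int
is_valid_unibyte_character (int c)
{
  return btowc (c) != WEOF;
}

/* Hypothetical function form of IS_WORD_CONSTITUENT: C must be a valid
   single-byte character and either alphanumeric or an underscore.
   C must be in the range of unsigned char, as isalnum requires.  */
static int
is_word_constituent (int c)
{
  assert (0 <= c && c <= UCHAR_MAX);
  return is_valid_unibyte_character (c) && (isalnum (c) || c == '_');
}
```

The result for bytes in the 128..255 range depends on the locale selected
with setlocale, which is the point of the extra test: a byte that isalnum
accepts on a BSD-based system is still rejected when it is not a complete
character by itself.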