bug-grep

Re: grep-2.10 testing


From: Jim Meyering
Subject: Re: grep-2.10 testing
Date: Mon, 21 Nov 2011 15:07:06 +0100

Bruno Haible wrote:

> Hi Jim,
>
>> diff --git a/src/dfa.c b/src/dfa.c
>> index e28726d..8f79508 100644
>> --- a/src/dfa.c
>> +++ b/src/dfa.c
>> @@ -1071,8 +1071,18 @@ parse_bracket_exp (void)
>>    return CSET + charclass_index(ccl);
>>  }
>>
>> +/* Add this to the test for whether a byte is word-constituent, since on
>> +   BSD-based systems, many values in the 128..255 range are classified as
>> +   alphabetic, while on glibc-based systems, they are not.  */
>> +#ifdef __GLIBC__
>> +# define octet_valid_as_wide_char(c) 1
>> +#else
>> +# define octet_valid_as_wide_char(c) (MBS_SUPPORT && btowc (c) != WEOF)
>> +#endif
>> +
>>  /* Return non-zero if C is a `word-constituent' byte; zero otherwise.  */
>> -#define IS_WORD_CONSTITUENT(C) (isalnum(C) || (C) == '_')
>> +#define IS_WORD_CONSTITUENT(C) \
>> +  (octet_valid_as_wide_char(C) && (isalnum(C) || (C) == '_'))
>>
>
> This code would do the job.
>
> However, I find the macro name 'octet_valid_as_wide_char' confusing,
> because values such as 0xC3 are valid octets and also valid wide characters.
> I would call this macro 'is_valid_single_byte_character' or
> 'is_valid_unibyte_character'. Then it's clear why it has to map 0xC3 to false
> in a UTF-8 locale, where 0xC3 is only the lead byte of a multibyte sequence.

Thanks.  I prefer your names, too.
I'll use is_valid_unibyte_character.
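
For reference, here is a rough sketch of how the hunk might read after the
rename, assuming nothing else in the patch changes.  The MBS_SUPPORT fallback
and the is_word_constituent wrapper are hypothetical, added only so that the
fragment compiles on its own; they are not part of the patch.

```c
#include <ctype.h>
#include <wchar.h>

/* Assumption: outside of grep's build there is no config.h to define
   MBS_SUPPORT, so default it to 1 for this standalone sketch.  */
#ifndef MBS_SUPPORT
# define MBS_SUPPORT 1
#endif

/* Add this to the test for whether a byte is word-constituent, since on
   BSD-based systems, many values in the 128..255 range are classified as
   alphabetic, while on glibc-based systems, they are not.  */
#ifdef __GLIBC__
# define is_valid_unibyte_character(c) 1
#else
# define is_valid_unibyte_character(c) (MBS_SUPPORT && btowc (c) != WEOF)
#endif

/* Return non-zero if C is a `word-constituent' byte; zero otherwise.  */
#define IS_WORD_CONSTITUENT(C) \
  (is_valid_unibyte_character (C) && (isalnum (C) || (C) == '_'))

/* Hypothetical wrapper, for illustration only: taking an unsigned char
   keeps the argument in the range that isalnum and btowc accept.  */
int
is_word_constituent (unsigned char c)
{
  return IS_WORD_CONSTITUENT (c) != 0;
}
```

On a non-glibc system, the result for bytes in the 128..255 range depends on
the current locale, since btowc consults it.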


