


From: Paul Eggert
Subject: bug#16919: [PATCH] fix mismatch between dfa and regex for treatment of titlecase
Date: Wed, 05 Mar 2014 10:50:54 -0800
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Thunderbird/24.3.0

On 03/05/2014 07:11 AM, Norihiro Tanaka wrote:
> I still believe that upper or lower case of a character should
> also match title case

The (soon-to-be-fixed) gnulib regex code agrees with you, assuming that towupper (X) returns the same value for all three case forms X (uppercase, lowercase, and titlecase), because it tests (towupper (input) == towupper (pattern)). However, the most-plausible reading of POSIX does not agree with you: it would require (input == pattern || towlower (input) == pattern || towupper (input) == pattern), which means a titlecase pattern will match only itself.
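
To make the difference concrete, here is a minimal sketch of the two tests side by side. It is not the gnulib code: the toy_towupper and toy_towlower helpers are hypothetical stand-ins that know only one triple (U+01C4 uppercase, U+01C5 titlecase, U+01C6 lowercase), so that the result does not depend on the locale the way the real towupper and towlower do.

```c
#include <assert.h>
#include <stdbool.h>
#include <wchar.h>

/* One case triple, assumed for illustration only. */
static const wint_t DZ_UPPER = 0x01C4;  /* uppercase */
static const wint_t DZ_TITLE = 0x01C5;  /* titlecase */
static const wint_t DZ_LOWER = 0x01C6;  /* lowercase */

/* Hypothetical, locale-independent stand-ins for towupper/towlower. */
static wint_t toy_towupper (wint_t c)
{
  return (c == DZ_TITLE || c == DZ_LOWER) ? DZ_UPPER : c;
}

static wint_t toy_towlower (wint_t c)
{
  return (c == DZ_UPPER || c == DZ_TITLE) ? DZ_LOWER : c;
}

/* The test described above for the regex code: fold both sides to upper. */
bool match_fold_upper (wint_t input, wint_t pattern)
{
  return toy_towupper (input) == toy_towupper (pattern);
}

/* The test required by the most-plausible reading of POSIX. */
bool match_posix_reading (wint_t input, wint_t pattern)
{
  return (input == pattern
          || toy_towlower (input) == pattern
          || toy_towupper (input) == pattern);
}

/* Small self-check: a titlecase pattern under each rule. */
void titlecase_demo (void)
{
  assert (match_fold_upper (DZ_UPPER, DZ_TITLE));
  assert (match_fold_upper (DZ_LOWER, DZ_TITLE));
  assert (!match_posix_reading (DZ_UPPER, DZ_TITLE));
  assert (!match_posix_reading (DZ_LOWER, DZ_TITLE));
  assert (match_posix_reading (DZ_TITLE, DZ_TITLE));
}
```

Under the second rule the relation is not even symmetric: titlecase input matches an uppercase or lowercase pattern, but uppercase or lowercase input does not match a titlecase pattern.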

It seems pretty clear to me that the most-plausible reading of POSIX is buggy, for this reason. No wonder so many implementations fail to conform to it.

I thought of a different way in which gnulib/glibc regex does not conform to POSIX, and here there doesn't seem to be any ambiguity about it. In the POSIX locale when ignoring case, POSIX says the pattern '[Z-a]' matches the data 'Z', 'z', 'A', 'a', and the nonalphabetic characters like '^' that collate between 'Z' and 'a'. But the glibc regex code rejects that pattern entirely. Conversely, in the same situation the glibc regex code says '[A-z]' matches only alphabetic characters, whereas POSIX says it should also match the nonalphabetic characters like '^' that collate between 'Z' and 'a'. It appears that nobody cares, as this incompatibility has been present for years and I don't recall anyone complaining. It is weird, though, that this means "grep PAT" can match some lines that "grep -i PAT" doesn't.
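
As a rough sketch of the POSIX-side behavior described above (not the glibc implementation), the following helper treats a byte as matching a range when the byte itself or either of its case counterparts falls inside the range. The function name range_match_icase is made up for illustration, and the code assumes the C/POSIX locale, where collation order is the same as byte order.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>

/* Hypothetical helper: does byte C match the range [LO-HI] when case
   is ignored?  Assumes the C/POSIX locale (collation == byte order). */
bool range_match_icase (unsigned char lo, unsigned char hi, unsigned char c)
{
  /* A reversed range is not a valid bracket range. */
  assert (lo <= hi);

  /* The byte as given, plus its lowercase and uppercase counterparts. */
  int forms[3] = { c, tolower (c), toupper (c) };

  for (int i = 0; i < 3; i++)
    if (lo <= forms[i] && forms[i] <= hi)
      return true;
  return false;
}
```

With this rule, range_match_icase ('Z', 'a', x) holds for x in 'Z', 'z', 'A', 'a' and for '^', but not for 'b' or 'Y'; and range_match_icase ('A', 'z', '^') holds as well.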

Here POSIX is not merely ambiguous; it clearly disagrees with common practice. It's not clear whether the bug is in POSIX or in the implementation.




