From: Paul Eggert
Subject: bug#16919: [PATCH] fix mismatch between dfa and regex for treatment of titlecase
Date: Sun, 02 Mar 2014 12:37:09 -0800
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Thunderbird/24.3.0

Norihiro Tanaka wrote:
> I found difference between dfa and regex (glibc) treatment of titlecase.

Thanks for bringing this up, but I'm afraid regex appears to be buggy in this area. The regex code does the match by converting both pattern and text to uppercase and then matching the uppercase forms. But this is incorrect for an example like the following, which uses '\(\)\1' (an empty back-reference) to force using the regex code:

    echo 'ς' | grep -i '\(\)\1σ'

This should output nothing, because terminal sigma is not the same as lowercase sigma even when case is ignored. But since the uppercase counterpart of both characters is capital sigma, grep incorrectly outputs the terminal sigma. The dfa code gets it right.

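To make the difference concrete, here is a rough sketch in Python rather than grep's C. The two helper functions are hypothetical illustrations, not the actual regex or dfa code. The first compares uppercased forms, as described above for regex. The second accepts a text character only if it is the pattern character itself or that character's direct upper- or lowercase counterpart, which is one interpretation (an assumption here, not a description of dfa.c) under which ς and σ stay distinct.

```python
# Illustration only: hypothetical helpers, not code from glibc regex or dfa.c.
TERMINAL_SIGMA = "\u03c2"   # ς
LOWERCASE_SIGMA = "\u03c3"  # σ
CAPITAL_SIGMA = "\u03a3"    # Σ


def match_via_uppercase(pattern_char, text_char):
    """Compare after converting both sides to uppercase."""
    return pattern_char.upper() == text_char.upper()


def case_variants(char):
    """The character plus its direct upper- and lowercase counterparts."""
    return {char, char.upper(), char.lower()}


def match_via_variants(pattern_char, text_char):
    """Accept only the pattern character or one of its direct case variants."""
    return text_char in case_variants(pattern_char)


# Both sigmas uppercase to capital sigma, so the uppercase comparison conflates them.
print(match_via_uppercase(LOWERCASE_SIGMA, TERMINAL_SIGMA))  # True
# Terminal sigma is not a direct case variant of lowercase sigma.
print(match_via_variants(LOWERCASE_SIGMA, TERMINAL_SIGMA))   # False
# Capital sigma still matches lowercase sigma when case is ignored.
print(match_via_variants(LOWERCASE_SIGMA, CAPITAL_SIGMA))    # True
```

The sketch deliberately avoids Python's `str.casefold()`, which applies Unicode full case folding and maps ς to σ, so it would hide the distinction being discussed.
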
POSIX is muddy in this area, unfortunately, but I don't see any interpretation whereby ς and σ should match when case is ignored.