
Re: [Grep-devel] handling of non-BMP characters


From: Corinna Vinschen
Subject: Re: [Grep-devel] handling of non-BMP characters
Date: Sun, 16 Dec 2018 21:00:26 +0100
User-agent: Mutt/1.9.2 (2017-12-15)

On Dec 16 11:42, Jim Meyering wrote:
> On Sun, Dec 16, 2018 at 11:31 AM Bruno Haible <address@hidden> wrote:
> >
> > Hi Jim,
> >
> > > > Assaf Gordon wrote:
> > > > > "surrogate-pair" test fails on:
> > > > >     AIX 7.2
> > > >
> > > > It also fails on Cygwin (that is, on the platform for which this test was
> > > > initially introduced by Corinna Vinschen, in 2013).
> > >
> > > Thanks.
> > > With that, I conclude it is time to disable this test, and have just
> > > done so with the following:
> > > https://git.savannah.gnu.org/cgit/grep.git/commit/?id=bdb98cec2e7bf255e1d00eaf8be16299f7bf571e
> >
> > To me, that means to move a serious regression under the rug.
> >
> > Recall what the test does: It creates a file 'in', whose content is a single
> > (non-BMP) character, followed by a newline. Then it runs
> >    grep --file=in in
> > On glibc systems and more generally on systems where wchar_t is a 32-bit type,
> > this invocation prints the character and exits with code 0.
> > On Cygwin systems (and, in some conditions, also AIX systems), this
> > invocation prints nothing and exits with code 1.
> >
> > To me, that is serious, because from the user's point of view, characters should
> > not be handled differently depending on whether they are in the BMP or not.
> > (Recall that this is happening in a UTF-8 locale.)
> >
> > It's a regression, because as I understand it from the commit logs, the test
> > must have succeeded on Cygwin right after Corinna Vinschen committed it.
> 
> I suppose it's a regression on Cygwin, but given the code went missing
> and no one even noticed the test failure for so long, I have to
> question its importance.
> 
> As implied in the commit log where I have just deleted the test, I
> would welcome any attempt to revive it, especially if the result
> includes a test that will be easy to run on a non-cygwin system.
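Bruno's description of the test maps onto a small reproduction script that can be run on any system. The following is a minimal Python sketch, not grep's actual test script: the character U+1F600 is an arbitrary stand-in for "a (non-BMP) character", the function names write_pattern_file and run_check are hypothetical, and the locale name C.UTF-8 is an assumption (any UTF-8 locale the system provides should do).

```python
import os
import subprocess
import tempfile

# Hypothetical stand-in: the thread only says "a (non-BMP) character".
# U+1F600 lies outside the Basic Multilingual Plane (above U+FFFF).
NON_BMP_CHAR = "\U0001F600"


def write_pattern_file(directory: str) -> str:
    """Create 'in' containing one non-BMP character followed by a newline."""
    path = os.path.join(directory, "in")
    # newline="\n" keeps the line ending a plain LF on every platform.
    with open(path, "w", encoding="utf-8", newline="\n") as f:
        f.write(NON_BMP_CHAR + "\n")
    return path


def run_check(grep: str = "grep", locale: str = "C.UTF-8") -> subprocess.CompletedProcess:
    """Run `grep --file=in in` in a UTF-8 locale and return the result.

    The locale name is an assumption; substitute any UTF-8 locale the
    system actually provides.  Per the thread, a correct grep prints the
    line and exits with status 0.
    """
    with tempfile.TemporaryDirectory() as tmp:
        write_pattern_file(tmp)
        env = dict(os.environ, LC_ALL=locale)
        # The file serves both as the pattern list (--file=in) and the input (in).
        return subprocess.run(
            [grep, "--file=in", "in"], cwd=tmp, env=env, capture_output=True
        )
```

Per Bruno's description, run_check().returncode should be 0 with the character on stdout on glibc systems, and 1 with no output on an affected Cygwin grep. Make sure the chosen locale actually exists: if it does not, grep stays in the C locale, matches byte-wise, and the multibyte code path is never exercised.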

Since I'm not the grep maintainer for Cygwin, I was completely unaware of
any recent problems with surrogate pairs in grep.  So bear with me if
I'm misunderstanding what just happened.

As far as I understand the commit message of
https://git.savannah.gnu.org/cgit/grep.git/commit/?id=bdb98cec2e7, grep
commit v2.21-62-g936c904 introduced a regression, namely a change in
grep disconnecting the surrogate pair functionality from the functional
part of the grep source.  A follow-up change, v2.24-12-g704de87, even made
it worse by removing the function entirely, rather than reintroducing
the functionality missing since v2.21-62-g936c904.
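For readers less familiar with surrogate pairs: where wchar_t is 16 bits wide, as on Cygwin, a character outside the BMP cannot be stored in a single wchar_t. In UTF-16 it becomes two code units, a high surrogate (U+D800 to U+DBFF) followed by a low surrogate (U+DC00 to U+DFFF), and code that works on wchar_t values has to treat the two halves as one character. The Python sketch below shows only the standard UTF-16 arithmetic; it is not grep's or Cygwin's code, and the function names are hypothetical.

```python
def to_utf16_units(cp: int) -> list[int]:
    """Encode one Unicode scalar value as a list of UTF-16 code units."""
    if not 0 <= cp <= 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not a Unicode scalar value: {cp:#x}")
    if cp <= 0xFFFF:
        return [cp]  # BMP character: fits in one 16-bit unit
    v = cp - 0x10000             # 20 bits remain for a non-BMP character
    high = 0xD800 + (v >> 10)    # top 10 bits -> high (lead) surrogate
    low = 0xDC00 + (v & 0x3FF)   # bottom 10 bits -> low (trail) surrogate
    return [high, low]


def from_utf16_units(units: list[int]) -> int:
    """Decode the UTF-16 code units of one character back to a code point."""
    if len(units) == 1 and not 0xD800 <= units[0] <= 0xDFFF:
        return units[0]
    if (len(units) == 2
            and 0xD800 <= units[0] <= 0xDBFF
            and 0xDC00 <= units[1] <= 0xDFFF):
        # Both halves are needed; either one alone is not a character.
        return 0x10000 + ((units[0] - 0xD800) << 10) + (units[1] - 0xDC00)
    raise ValueError(f"not a single UTF-16 encoded character: {units}")


for cp in (0x00E9, 0x1F600):
    units = to_utf16_units(cp)
    print(f"U+{cp:04X}: {[hex(u) for u in units]}")
# U+00E9: ['0xe9']
# U+1F600: ['0xd83d', '0xde00']
```

The pair [0xD83D, 0xDE00] matches the bytes Python itself produces for "\U0001F600".encode("utf-16-be"), while a BMP character such as U+00E9 stays a single unit. This is why the same pattern can behave differently depending on the width of wchar_t.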

So the commit here now even removes the test case rather than repairing
the damage in grep by reverting the commits that removed the surrogate
pair handling?

If I understand Bruno correctly, I'm not the only one seeing a problem
with this idea.  The fact that beyond-BMP characters are not used as
often as BMP characters doesn't mean we can just neglect them.

I also don't quite see why Cygwin is getting the blame for a
screwed-up commit sequence in grep.

Eric (Cygwin grep maintainer), any input on this?


Thanks,
Corinna


