Re: [Grep-devel] handling of non-BMP characters
From: Corinna Vinschen
Subject: Re: [Grep-devel] handling of non-BMP characters
Date: Sun, 16 Dec 2018 21:00:26 +0100
User-agent: Mutt/1.9.2 (2017-12-15)

On Dec 16 11:42, Jim Meyering wrote:
> On Sun, Dec 16, 2018 at 11:31 AM Bruno Haible <address@hidden> wrote:
> >
> > Hi Jim,
> >
> > > > Assaf Gordon wrote:
> > > > > "surrogate-pair" test fails on:
> > > > > AIX 7.2
> > > >
> > > > It also fails on Cygwin (that is, on the platform for which this test was
> > > > initially introduced by Corinna Vinschen, in 2013).
> > >
> > > Thanks.
> > > With that, I conclude it is time to disable this test, and have just
> > > done so with the following:
> > > https://git.savannah.gnu.org/cgit/grep.git/commit/?id=bdb98cec2e7bf255e1d00eaf8be16299f7bf571e
> >
> > To me, that means to move a serious regression under the rug.
> >
> > Recall what the test does: It creates a file 'in', whose contents is a single
> > (non-BMP) character, followed by a newline. Then it runs
> >   grep --file=in in
> > On glibc systems and more generally on systems where wchar_t is a 32-bit type,
> > this invocation prints the character and exits with code 0.
> > On Cygwin systems (and, in some conditions, also AIX systems), this
> > invocation prints nothing and exits with code 1.
> >
> > To me, that is serious, because from the user point of view, characters should
> > not be handled differently depending on whether they are in the BMP or not.
> > (Recall that this is happening in a UTF-8 locale.)
> >
> > It's a regression, because as I understand it from the commit logs, the test
> > must have succeeded on Cygwin right after Corinna Vinschen committed it.
>
> I suppose it's a regression on Cygwin, but given the code went missing
> and no one even noticed the test failure for so long, I have to
> question its importance.
>
> As implied in the commit log where I have just deleted the test, I
> would welcome any attempt to revive it, especially if the result
> includes a test that will be easy to run on a non-cygwin system.

Given I'm not the grep maintainer for Cygwin, I was completely unaware of
any recent problems with surrogate pairs in grep. So bear with me if
I'm misunderstanding what just happened.

As far as I understand the commit message of
https://git.savannah.gnu.org/cgit/grep.git/commit/?id=bdb98cec2e7, grep
commit v2.21-62-g936c904 introduced a regression, namely a change in
grep disconnecting the surrogate pair functionality from the functional
part of the grep source. A followup change, v2.24-12-g704de87, made
it even worse by removing the function entirely, rather than re-introducing
the functionality missing since v2.21-62-g936c904.

So the commit here now even removes the testcase, rather than repairing
the damage in grep by reverting the commits that removed the surrogate pair
handling?

If I understand Bruno correctly, I'm not the only one seeing a problem
with this idea. The fact that beyond-BMP characters are not used as
often as BMP characters doesn't mean we can just neglect them.

I also don't quite get why Cygwin is getting the blame for a
screwed-up commit sequence in grep.

Eric (Cygwin grep maintainer), any input on this?

Thanks,
Corinna
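
For readers unfamiliar with the term: on a platform whose wchar_t is a
16-bit type holding UTF-16 code units (Cygwin is such a platform), a
character outside the BMP does not fit into a single wchar_t and is
instead represented as a pair of surrogates. The following is a minimal
sketch of that arithmetic only. It is not code from grep or Cygwin; the
function names to_surrogates and from_surrogates are made up for this
illustration, and U+10405 is just an arbitrary non-BMP example.

```c
#include <stdint.h>

/* Hypothetical helper, not part of grep: split a code point in the range
   U+10000..U+10FFFF into a UTF-16 surrogate pair.
   Returns 0 on success, -1 if cp is not a non-BMP code point.  */
int
to_surrogates (uint32_t cp, uint16_t *hi, uint16_t *lo)
{
  if (cp < 0x10000 || cp > 0x10FFFF)
    return -1;
  cp -= 0x10000;                             /* 20 significant bits remain */
  *hi = (uint16_t) (0xD800 + (cp >> 10));    /* upper 10 bits */
  *lo = (uint16_t) (0xDC00 + (cp & 0x3FF));  /* lower 10 bits */
  return 0;
}

/* Hypothetical helper: recombine a high and a low surrogate into the
   original code point.  The caller must pass a valid pair
   (hi in 0xD800..0xDBFF, lo in 0xDC00..0xDFFF).  */
uint32_t
from_surrogates (uint16_t hi, uint16_t lo)
{
  return 0x10000
         + ((uint32_t) (hi - 0xD800) << 10)
         + (uint32_t) (lo - 0xDC00);
}
```

With these definitions, U+10405 splits into the pair 0xD801 0xDC05, and
recombining that pair yields U+10405 again. Code that treats each 16-bit
unit as a complete character sees two values where a 32-bit wchar_t
platform sees one, which is why such platforms need explicit surrogate
pair handling.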