Re: Plan for grep [bug-grep]
From: Charles Levert
Subject: Re: Plan for grep [bug-grep]
Date: Tue, 8 Mar 2005 09:47:26 -0500
User-agent: Mutt/1.4.1i
* On Tuesday 2005-03-08 at 13:13:07 +0000, Tim Waugh wrote:
> On Tue, Mar 08, 2005 at 05:38:36AM -0500, Charles Levert wrote:
>
> > BTW, is the assumption (in the current code)
> > that any two corresponding uppercase and
> > lowercase Unicode code points have the same
> > UTF-8 octet length (or 8-bit code unit length)
> > always a safe (secure) one?
>
> Where do you see that assumption? Is that assumption also in the
> Fedora Core patched grep?
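In main() in src/grep.c, it seems to me that
the keys variable is being towlower()ed in place
under that assumption: there is only a single i
loop index variable, used both to decode each
character and to write its lowercase form back,
so both encodings must have the same length.
In check_multibyte_string() in src/search.c,
the same applies to the buf variable.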
Please confirm or deny.
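The single-index, in-place pattern breaks as soon as a lowercase form has a different byte length. A minimal sketch of an alternative, assuming a separate output buffer is acceptable: the hypothetical helper mb_tolower_copy() (not taken from grep) keeps a read index i and a write index j apart, using only the standard mbrtowc(), towlower() and wcrtomb() calls.

```c
#include <stdlib.h>   /* MB_CUR_MAX, for sizing OUT */
#include <string.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical helper (not from grep): lowercase the multibyte string
   IN (INLEN bytes) into OUT, which must hold INLEN * MB_CUR_MAX bytes.
   Reading (i) and writing (j) use separate indices, so a lowercase
   character may be shorter or longer than its uppercase original.
   Returns the number of bytes written to OUT. */
size_t mb_tolower_copy(const char *in, size_t inlen, char *out)
{
    mbstate_t in_state, out_state;
    size_t i = 0, j = 0;

    memset(&in_state, 0, sizeof in_state);
    memset(&out_state, 0, sizeof out_state);

    while (i < inlen) {
        wchar_t wc;
        size_t n = mbrtowc(&wc, in + i, inlen - i, &in_state);
        size_t m;

        if (n == (size_t) -1 || n == (size_t) -2) {
            /* Invalid or truncated sequence: copy one byte as-is
               and restart decoding from a clean state. */
            out[j++] = in[i++];
            memset(&in_state, 0, sizeof in_state);
            continue;
        }
        if (n == 0)            /* decoded a null character */
            n = 1;

        m = wcrtomb(out + j, (wchar_t) towlower((wint_t) wc), &out_state);
        if (m == (size_t) -1) {
            /* Not representable: keep the original bytes instead. */
            memcpy(out + j, in + i, n);
            m = n;
            memset(&out_state, 0, sizeof out_state);
        }
        j += m;                /* m need not equal n */
        i += n;
    }
    return j;
}
```

In the default C locale, mb_tolower_copy("HeLLo, World", 12, out) writes the 12 bytes "hello, world". The price is a second buffer of up to inlen * MB_CUR_MAX bytes.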
There may be other places; I didn't perform an
exhaustive search of the code. Note that under
a normal "en_US.UTF-8" locale definition, that
assumption is reasonable, hence the formulation
of my original question.
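To make the concern concrete without depending on any installed locale, the sketch below hard-codes a few uppercase/lowercase pairs taken from the simple lowercase mappings in the Unicode Character Database and compares their UTF-8 lengths. Whether a particular glibc locale's towlower() applies these mappings depends on its locale data, so the table is illustrative only; utf8_len(), sample_pairs and count_length_mismatches() are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* Number of bytes needed to encode code point CP in UTF-8
   (0 if CP is not a valid Unicode scalar value). */
int utf8_len(uint32_t cp)
{
    if (cp < 0x80)
        return 1;
    if (cp < 0x800)
        return 2;
    if (cp >= 0xD800 && cp <= 0xDFFF)
        return 0;                     /* surrogates are not encodable */
    if (cp < 0x10000)
        return 3;
    if (cp <= 0x10FFFF)
        return 4;
    return 0;
}

/* A few uppercase -> simple-lowercase pairs from the Unicode
   Character Database, chosen for illustration. */
struct case_pair { uint32_t upper, lower; };

static const struct case_pair sample_pairs[] = {
    { 0x0041, 0x0061 },   /* A -> a: 1 -> 1 byte */
    { 0x0130, 0x0069 },   /* I WITH DOT ABOVE -> i: 2 -> 1 bytes */
    { 0x2126, 0x03C9 },   /* OHM SIGN -> small omega: 3 -> 2 bytes */
    { 0x212A, 0x006B },   /* KELVIN SIGN -> k: 3 -> 1 bytes */
    { 0x212B, 0x00E5 },   /* ANGSTROM SIGN -> a with ring: 3 -> 2 bytes */
};

/* Count the sample pairs whose two members differ in UTF-8 length. */
size_t count_length_mismatches(void)
{
    size_t k, count = 0;

    for (k = 0; k < sizeof sample_pairs / sizeof sample_pairs[0]; k++)
        if (utf8_len(sample_pairs[k].upper) != utf8_len(sample_pairs[k].lower))
            count++;
    return count;
}
```

For this table count_length_mismatches() returns 4: only the A/a pair keeps its length, so any single-index, in-place conversion that meets one of the other four characters would misplace the bytes that follow.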
> > Since performance is an issue, measuring it could
> > be included in testing, as well as reporting
> > serious discrepancies between the results of
> > identical tests performed under different
> > locales.
>
> As part of 'make check'? If that's what you mean, better make sure
> not to measure against wall-clock time but against 'user' time as
> reported by time(1)!
Well, obviously! That or user+system, although
system is much less relevant here.
Also, the bash time builtin and /usr/bin/time
don't share the same output format, so we have
to be careful with that.
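One way to sidestep the differing output formats, assuming a POSIX system, is for a test driver to read user and system CPU time directly with getrusage(RUSAGE_CHILDREN) instead of parsing what the time builtin or /usr/bin/time prints. run_and_time() is a hypothetical helper sketched here, not part of grep's test suite.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/resource.h>
#include <sys/time.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run ARGV[0] with the NULL-terminated ARGV and
   store the CPU time it used, in seconds, in *USER_S and *SYS_S.
   RUSAGE_CHILDREN is cumulative over all waited-for children, so it
   is sampled before and after and the difference is reported.
   Returns the child's exit status, or -1 on failure. */
int run_and_time(char *const argv[], double *user_s, double *sys_s)
{
    struct rusage before, after;
    pid_t pid;
    int status;

    if (getrusage(RUSAGE_CHILDREN, &before) != 0)
        return -1;

    pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        execvp(argv[0], argv);
        _exit(127);               /* exec failed */
    }
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    if (getrusage(RUSAGE_CHILDREN, &after) != 0)
        return -1;

    *user_s = (after.ru_utime.tv_sec - before.ru_utime.tv_sec)
            + (after.ru_utime.tv_usec - before.ru_utime.tv_usec) / 1e6;
    *sys_s  = (after.ru_stime.tv_sec - before.ru_stime.tv_sec)
            + (after.ru_stime.tv_usec - before.ru_stime.tv_usec) / 1e6;

    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Running the same grep command line through such a helper once per locale (for instance by setting LC_ALL in the child's environment) would give directly comparable user and system figures for spotting large discrepancies.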
> > The only danger I see in waiting to do this is
> > that there seem to have been improvements in
> > UTF-8 handling by glibc's regex code. Maybe all
> > the -i kludges are not even needed anymore.
> > Maybe there are also performance issues (either
> > way) with this.
> >
> > That's why I previously stated that I saw doing
> > this as a priority: other items are affected.
>
> The undeniable improvements in the glibc regex code are very useful --
> however, the current (unpatched) grep multibyte handling is flawed in
> many more ways than you might guess, and *that* is the thing to fix
> first when doing performance testing. See
> grep-2.5.1-egf-speedup.patch.
Ok.
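Whether the -i kludges are still needed comes down to whether the regex library can do the case folding itself. A minimal POSIX sketch of that approach, assuming the system's regcomp()/regexec() (glibc's among them) handle REG_ICASE correctly in the current locale; icase_search() is a hypothetical helper and does not reflect how grep 2.5.1 is structured.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: return 1 if the extended regex PATTERN matches
   somewhere in TEXT ignoring case, 0 if it does not, and -1 if the
   pattern fails to compile. Case folding is left to the regex engine
   via REG_ICASE instead of lowercasing pattern and input beforehand. */
int icase_search(const char *pattern, const char *text)
{
    regex_t re;
    int rc;

    if (regcomp(&re, pattern, REG_EXTENDED | REG_ICASE | REG_NOSUB) != 0)
        return -1;
    rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0 ? 1 : 0;
}
```

Timing this against a pre-lowercasing variant under several locales is one way to check whether the kludges still pay for themselves.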