Re: Plan for grep [bug-grep]

From:	Tim Waugh
Subject:	Re: Plan for grep [bug-grep]
Date:	Tue, 8 Mar 2005 13:13:07 +0000
User-agent:	Mutt/1.4.2.1i

On Tue, Mar 08, 2005 at 05:38:36AM -0500, Charles Levert wrote:

> BTW, is the assumption (in the current code)
> that any two corresponding uppercase and
> lowercase Unicode code points have the same
> UTF-8 octet length (or 8-bit code unit length)
> always a safe (secure) one?

Where do you see that assumption?  Is that assumption also in the
Fedora Core patched grep?
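
To make the question concrete, here is a minimal sketch in Python (not
taken from grep or from any patch) that checks whether a character and
its lowercase mapping occupy the same number of UTF-8 octets.  The
helper names utf8_len and mismatched_case_pairs are hypothetical, and
the result depends on the Unicode tables bundled with the Python
interpreter, not on the tables grep or glibc would use.

```python
def utf8_len(ch: str) -> int:
    """Number of octets needed to encode ch in UTF-8."""
    return len(ch.encode("utf-8"))


def mismatched_case_pairs(chars):
    """Yield (char, lowercase) pairs whose UTF-8 lengths differ.

    Only simple one-to-one mappings are considered; characters whose
    lowercase form expands to several code points are skipped.
    """
    for ch in chars:
        lower = ch.lower()
        if lower != ch and len(lower) == 1 and utf8_len(lower) != utf8_len(ch):
            yield ch, lower


# Sample set (an assumption for illustration): two ordinary letters and
# two code points whose case partner lives in a different UTF-8 range.
samples = ["A", "\u00c9", "\u212a", "\u023a"]
pairs = list(mismatched_case_pairs(samples))

for upper, lower in pairs:
    print(f"U+{ord(upper):04X} ({utf8_len(upper)} octets) -> "
          f"U+{ord(lower):04X} ({utf8_len(lower)} octets)")
```

With these samples, U+212A (KELVIN SIGN) maps to a plain ASCII "k" and
U+023A maps to U+2C65, so in both cases the octet length changes.  Code
that swaps case in place inside a fixed-size buffer would therefore
need to handle a length change in either direction.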

> Since performance is an issue, measuring it could
> be included in testing, as well as reporting
> serious discrepancies between the results of
> identical tests being performed under various
> different locales.

As part of 'make check'?  If that's what you mean, make sure to
measure against the 'user' CPU time reported by time(1) rather than
wall-clock time, which varies with system load!
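
As a rough sketch of what measuring 'user' time could look like in a
test harness, the Python fragment below reads the accumulated user CPU
time of child processes before and after running a command.  The
function name child_user_seconds is hypothetical, and the Python
interpreter is used here only as a stand-in command; a real check
would run the grep binary under test instead.

```python
import os
import subprocess
import sys


def child_user_seconds(argv):
    """Return the user CPU seconds consumed by running argv once.

    os.times() reports the cumulative user time of waited-for child
    processes, so the difference around one run isolates that run.
    This figure is much less sensitive to machine load than
    wall-clock time.
    """
    before = os.times()
    subprocess.run(argv, check=True, stdout=subprocess.DEVNULL)
    after = os.times()
    return after.children_user - before.children_user


# Stand-in workload (an assumption for illustration only).
user_seconds = child_user_seconds(
    [sys.executable, "-c", "sum(range(10**6))"]
)
print(f"user CPU time: {user_seconds:.3f}s")
```

The granularity of these counters is coarse on some systems, so a very
short run may report zero; repeating the command and summing the
results is one way around that.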

> The only danger I see in waiting to do this is
> that there seem to have been improvements in
> UTF-8 handling by glibc's regex code.  Maybe all
> the -i kludges are not even needed anymore.
> Maybe there are also performance issues (either
> way) with this.
> 
> That's why I previously stated that I saw doing
> this as a priority:  other items are affected.

The undeniable improvements in the glibc regex code are very useful --
however, the current (unpatched) grep multibyte handling is flawed in
many more ways than you might guess, and *that* is the thing to fix
first when doing performance testing.  See
grep-2.5.1-egf-speedup.patch.

Tim.

