Re: grep is horribly slow in UTF-8 locales
From: Glenn Maynard
Subject: Re: grep is horribly slow in UTF-8 locales
Date: Sat, 8 Nov 2003 15:58:15 -0500
User-agent: Mutt/1.5.4i
On Fri, Nov 07, 2003 at 04:49:58PM +0100, Danilo Segan wrote:
> This doesn't happen with:
>
> $ grep --version
> grep (GNU grep) 2.4.2
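This was probably before full multibyte support was added to grep; the
issue here only happens in multibyte encodings. (My grep is slow in
en_US.UTF-8, and fast in en_US.ISO-8859-1.) To check whether your grep
handles multibyte characters, try the following in a UTF-8 locale; a
multibyte-aware grep matches both: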
# echo tést | grep 't.st'
tést
# echo tést | grep 't[aé]st'
tést
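Why these two patterns make a reasonable check: in UTF-8, é is encoded as two bytes, so a grep that works byte by byte sees five bytes in "tést", and a single `.` can only cover one of them. The following is a rough sketch, assuming a POSIX shell and that grep matches byte-wise under LC_ALL=C. The octal escapes \303\251 are the UTF-8 bytes for é, and the variable names are only illustrative.

```shell
# UTF-8 bytes for "tést": t, 0xC3 0xA9 (é), s, t
word=$(printf 't\303\251st')

# wc -c counts bytes, not characters
nbytes=$(printf '%s' "$word" | wc -c | tr -d ' ')
echo "bytes: $nbytes"

# Byte-wise matching: '.' covers one byte, so two dots are needed to span é
printf '%s\n' "$word" | LC_ALL=C grep -c 't..st'
```

If the same input only matches 't.st' when run in a UTF-8 locale, that suggests grep is decoding multibyte characters there, which is the code path this thread is about.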
> $ LC_ALL=POSIX time grep XYZ test.txt
> Command exited with non-zero status 1
> 0.04user 0.06system 0:00.10elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k
> 0inputs+0outputs (118major+25minor)pagefaults 0swaps
>
> Last example shows that CPU usage is not really any kind of rule to
> base conclusions on (sr_CS.UTF-8 is my everyday locale, and I would
> really notice if grep had any problems with it).
The field you should be reading is "user". "CPU" is roughly
(user+system)/elapsed, and isn't very relevant here.
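To make the arithmetic concrete with the figures from the quoted output (0.04 user, 0.06 system, 0.10 elapsed), here is a small sketch using awk. It only illustrates the rough formula above, not how time computes the value internally, and the variable names are made up for the example.

```shell
# Figures taken from the quoted "time" output
user=0.04
system=0.06
elapsed=0.10

# "CPU" is roughly (user + system) / elapsed, shown as a percentage
cpu=$(awk -v u="$user" -v s="$system" -v e="$elapsed" \
    'BEGIN { printf "%.0f", (u + s) / e * 100 }')
echo "CPU: ${cpu}%"
```

This gives the same 100% as the quoted run even though the whole command took a tenth of a second, so the percentage says little about how much work grep did. Comparing the "user" value between locales is the more useful measurement.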
--
Glenn Maynard