bug-coreutils

Re: uniq i18n implementation


From: Pádraig Brady
Subject: Re: uniq i18n implementation
Date: Mon, 14 Aug 2006 10:18:37 +0100
User-agent: Mozilla Thunderbird 1.0.8 (X11/20060502)

Pádraig Brady wrote:
> Paul Eggert wrote:
> 
>>>>>Using strcoll is inefficient anyway
>>>>
>>>>Don't we know it!  If we can avoid it, we'd like to.
>>>
>>>Well, the mbstowcs+wcscoll solution I presented
>>>should be equivalent to strcoll on any platform,
>>>and it's much faster in my tests.
>>
>>
>>That's good to know, though I'm puzzled as to why it's true.  For a
>>single comparison, can't strcoll typically return an answer without
>>examining all the input, and wouldn't that be faster than
>>mbstowcs+wcscoll?
>>
>>But if it is true, perhaps we should rewrite memcoll to use the
>>mbstowcs+wcscoll combination as well.
> 
> 
> I missed out a test case in my performance runs
> for same length lines with random data
> (where strcoll can break out early).
> I'll run that and comment more.

1 = my test uniq program
2 = coreutils 5.97 uniq

a = long ASCII lines, all the same length (85 chars), with 26 identical lines in every 27
b = long ASCII lines, all the same length (85 chars), with all adjacent lines different

LANG=en_IE.UTF8

   | 1       2
---+---------------
 a | 0.466   5.300
 b | 0.447   0.438

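So strcoll seems to carry serious overhead when adjacent lines are
identical (case a), on glibc-2.3.5-10 at least. When adjacent lines
differ (case b), the two programs perform about the same.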
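
For reference, the mbstowcs+wcscoll approach discussed above could look
roughly like the sketch below. It is an illustration only, not the code
of the test program measured here: the names to_wide and wide_coll are
hypothetical, and falling back to strcoll when a string cannot be
converted is an assumption of this sketch.

```c
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: convert a NUL-terminated multibyte string to a
   newly allocated wide string.  Returns NULL on an invalid multibyte
   sequence or on allocation failure.  The caller frees the result. */
static wchar_t *to_wide(const char *s)
{
    /* With a NULL destination, mbstowcs returns the number of wide
       characters needed, not counting the terminating NUL (POSIX). */
    size_t n = mbstowcs(NULL, s, 0);
    if (n == (size_t) -1)
        return NULL;

    wchar_t *w = malloc((n + 1) * sizeof *w);
    if (w == NULL)
        return NULL;

    mbstowcs(w, s, n + 1);
    return w;
}

/* Hypothetical comparison: collate two strings via wcscoll.
   Assumption of this sketch: fall back to strcoll if either string
   cannot be converted. */
static int wide_coll(const char *a, const char *b)
{
    wchar_t *wa = to_wide(a);
    wchar_t *wb = to_wide(b);
    int r = (wa != NULL && wb != NULL) ? wcscoll(wa, wb) : strcoll(a, b);

    free(wa);
    free(wb);
    return r;
}
```

A caller would need setlocale (LC_ALL, "") first for LANG to take
effect. The sketch also assumes NUL-terminated input, so lines with
embedded NULs would need extra handling. A uniq-like program could
presumably keep the converted previous line around rather than
converting both lines for every comparison.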

Pádraig.



