bug-coreutils

Re: horrible utf-8 performance in wc


From: Bo Borgerson
Subject: Re: horrible utf-8 performance in wc
Date: Wed, 07 May 2008 09:50:29 -0400
User-agent: Thunderbird 2.0.0.12 (X11/20080227)

Jim Meyering wrote:
> Bo Borgerson <address@hidden> wrote:
>> I may be misinterpreting your patch, but it seems to me that
>> decrementing count for zero-width characters could potentially lead to
>> confusion.  Not all zero-width characters are combining characters, right?
> 
> It looks ok to me, since there's an unconditional increment
> 
>                 chars++;
> 
> about 25 lines above, so the decrement would just undo that.


Right, I guess my question is more about the semantics of `wc -m'.
Should stand-alone zero-width characters such as the zero-width space be
counted?

The attached (UTF-8) file, an `a' and a `b' separated by a zero-width
space, contains 3 characters according to HEAD, but only 2 with the patch.
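
To make the difference concrete, here is a rough sketch of the two
counting behaviours. It is not the code from wc.c or from the patch:
count_chars, its skip_zero_width flag and is_zero_width are hypothetical
names, the UTF-8 decoding is done by hand so that no locale is needed, and
is_zero_width is a tiny, incomplete stand-in for a wcwidth() == 0 test.

```c
#include <stddef.h>

/* Hypothetical, incomplete stand-in for "wcwidth (wc) == 0".  */
static int
is_zero_width (unsigned long cp)
{
  return cp == 0x200B                      /* ZERO WIDTH SPACE */
         || (cp >= 0x0300 && cp <= 0x036F); /* combining diacritical marks */
}

/* Count the characters in the UTF-8 buffer S of N bytes.
   Assumes mostly well-formed input: a stray byte is skipped and a
   truncated trailing sequence ends the count.  If SKIP_ZERO_WIDTH is
   nonzero, the increment is undone for zero-width characters.  */
static size_t
count_chars (const unsigned char *s, size_t n, int skip_zero_width)
{
  size_t chars = 0;
  size_t i = 0;

  while (i < n)
    {
      unsigned char b = s[i];
      unsigned long cp;
      size_t len;
      size_t k;

      if (b < 0x80)
        { cp = b; len = 1; }
      else if ((b & 0xE0) == 0xC0)
        { cp = b & 0x1F; len = 2; }
      else if ((b & 0xF0) == 0xE0)
        { cp = b & 0x0F; len = 3; }
      else if ((b & 0xF8) == 0xF0)
        { cp = b & 0x07; len = 4; }
      else
        { i++; continue; }      /* stray continuation or invalid byte */

      if (i + len > n)
        break;                  /* truncated sequence at end of buffer */

      /* Fold in the six payload bits of each continuation byte.  */
      for (k = 1; k < len; k++)
        cp = (cp << 6) | (s[i + k] & 0x3F);
      i += len;

      chars++;                  /* unconditional increment */
      if (skip_zero_width && is_zero_width (cp))
        chars--;                /* the decrement under discussion */
    }

  return chars;
}
```

For the bytes 61 E2 80 8B 62 (the attached file without its newline),
count_chars gives 3 with skip_zero_width == 0 and 2 with it set.  The
sequence 65 CC 81 (`e' plus U+0301 COMBINING ACUTE ACCENT) goes from 2 to
1 in the same way, so under this sketch a stand-alone zero-width space is
treated exactly like a combining mark.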

Bo
a​b
