Re: horrible utf-8 performace in wc
From: Bo Borgerson
Subject: Re: horrible utf-8 performace in wc
Date: Wed, 07 May 2008 09:50:29 -0400
User-agent: Thunderbird 2.0.0.12 (X11/20080227)

Jim Meyering wrote:
> Bo Borgerson <address@hidden> wrote:
>> I may be misinterpreting your patch, but it seems to me that
>> decrementing count for zero-width characters could potentially lead to
>> confusion. Not all zero-width characters are combining characters, right?
>
> It looks ok to me, since there's an unconditional increment
>
> chars++;
>
> about 25 lines above, so the decrement would just undo that.

Right, I guess my question is more about the semantics of `wc -m'.
Should stand-alone zero-width characters such as the zero-width space be
counted?

The attached (UTF-8) file contains 3 characters according to HEAD, but
only two with the patch.

Bo

aâb
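
To make the question concrete, here is a minimal sketch of three possible counting rules. It is written in Python for brevity and is not the wc code or the patch under discussion. The helper names are hypothetical, the sample assumes the middle character of the attachment is U+200B ZERO WIDTH SPACE, and the Unicode categories Mn, Me and Cf are only a rough stand-in for "wcwidth() returns 0", whose real answer depends on the C library's tables.

```python
import unicodedata

# Hypothetical helpers for illustration only; none of these exist in wc.


def count_all(text: str) -> int:
    """Count every code point, zero-width or not."""
    return len(text)


def count_skipping_zero_width(text: str) -> int:
    """Skip anything that is roughly zero-width.

    Assumption: categories Mn, Me and Cf stand in for wcwidth() == 0.
    """
    zero_width = {"Mn", "Me", "Cf"}
    return sum(1 for ch in text if unicodedata.category(ch) not in zero_width)


def count_skipping_combining(text: str) -> int:
    """Skip only characters with a non-zero canonical combining class."""
    return sum(1 for ch in text if unicodedata.combining(ch) == 0)


# Assumed sample: 'a', U+200B ZERO WIDTH SPACE, 'b'.
sample = "a\u200bb"
print(count_all(sample))                  # 3
print(count_skipping_zero_width(sample))  # 2
print(count_skipping_combining(sample))   # 3
```

Under these assumptions the first rule gives the count reported for HEAD and the second gives the count reported with the patch. The third rule still counts the zero-width space, because it is zero-width without being a combining character, while a sequence such as "e" followed by U+0301 would count as one character under both the second and third rules.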