bug-coreutils

Re: horrible utf-8 performace in wc


From: Bo Borgerson
Subject: Re: horrible utf-8 performace in wc
Date: Wed, 07 May 2008 07:41:24 -0400
User-agent: Thunderbird 2.0.0.12 (X11/20080227)

Pádraig Brady wrote:
> canonically équivalent
> canonically équivalent
> 
> Pádraig.
> 
> p.s. I notice that gnome-terminal still doesn't handle
> combining characters correctly, and my mail client thunderbird
> is putting the accent on the q rather than the e, sigh.

They both render correctly here (Thunderbird 2.0.0.12).

Is there a good library for combining-character canonicalization
available?  That seems like something that would be useful to have in a
lot of text-processing tools.  Also, for Unicode, something to shuffle
between the normalization forms might be helpful for comparisons.
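
As a rough illustration of the comparison idea above, here is a minimal sketch using Python's standard unicodedata module. It is not taken from the patch under discussion; the helper name canonically_equal is hypothetical, and the two sample strings are assumed to correspond to the precomposed and decomposed spellings quoted above.

```python
import unicodedata

def canonically_equal(a: str, b: str) -> bool:
    """Compare two strings after normalizing both to NFC (hypothetical helper)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)

# Assumed sample data: the same word spelled two ways.
precomposed = "\u00e9quivalent"   # U+00E9 LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301quivalent"   # 'e' followed by U+0301 COMBINING ACUTE ACCENT

# The raw code point sequences differ, so a naive comparison fails.
raw_equal = precomposed == decomposed

# After normalization the two spellings compare equal.
normalized_equal = canonically_equal(precomposed, decomposed)

# Moving between forms: NFD splits the accent off, NFC recombines it.
to_nfd = unicodedata.normalize("NFD", precomposed)
to_nfc = unicodedata.normalize("NFC", decomposed)
```

Either NFC or NFD would work for the comparison as long as both sides use the same form.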

I may be misinterpreting your patch, but it seems to me that
decrementing count for zero-width characters could potentially lead to
confusion.  Not all zero-width characters are combining characters, right?
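
To make the concern concrete, here is a small sketch, again in Python and not based on the patch itself. It assumes that "combining character" means general category Mn or Me; the helper names are hypothetical. U+200B ZERO WIDTH SPACE is zero-width but is a format character (category Cf), so a count that only skips combining marks would still include it.

```python
import unicodedata

def is_combining_mark(ch: str) -> bool:
    """True for nonspacing (Mn) or enclosing (Me) marks. Assumed definition."""
    return unicodedata.category(ch) in ("Mn", "Me")

def count_without_combining(s: str) -> int:
    """Count code points, skipping combining marks only (hypothetical helper)."""
    return sum(1 for ch in s if not is_combining_mark(ch))

combining_acute = "\u0301"    # COMBINING ACUTE ACCENT, category Mn
zero_width_space = "\u200b"   # ZERO WIDTH SPACE, category Cf, not a combining mark

acute_category = unicodedata.category(combining_acute)
zwsp_category = unicodedata.category(zero_width_space)

# 'e' + combining acute counts as one character under this rule...
accented_count = count_without_combining("e" + combining_acute)

# ...but a zero-width space between two letters is still counted.
zwsp_count = count_without_combining("a" + zero_width_space + "b")
```

Whether wc ought to count a character like U+200B is a separate question; the sketch only shows that a width-based test and a combining-mark test can disagree.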

Bo



