bug-coreutils

Re: new modules for Unicode normalization


From: Pádraig Brady
Subject: Re: new modules for Unicode normalization
Date: Sun, 22 Feb 2009 00:44:42 +0000
User-agent: Thunderbird 2.0.0.6 (X11/20071008)

Jim Meyering wrote:
> Bruno Haible wrote:
> ...
>> With this, you can easily create a program that reads UTF-8 from stdin and
>> outputs it as canonicalized UTF-8 on stdout:
>>   - create a "stream" that takes a Unicode character and outputs it to
>>     stdout. (Gnulib module 'unistr/u8-uctomb'.)
>>   - Wrap a Unicode normalizing filter around it. (Gnulib module
>>     'uninorm/filter'.)
>>   - Feed it with Unicode characters from standard input. (Gnulib module
>>     'unistr/u8-mbtouc'.)
>>
>> I would love to see such a program in coreutils. But I am not a coreutils
>> maintainer.
> 
> Hi Bruno,
> 
> That sounds like it'd make a fine addition, and you're welcome to
> contribute it.  Anyone can contribute, assuming they assign copyright.
> And you did that for coreutils back before it was called that ;-)
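
As a rough illustration of the three steps Bruno lists (decode the input,
run it through a normalizing filter, re-encode the result), here is a
minimal sketch. It uses Python's standard `unicodedata` module instead of
the Gnulib modules, so it only shows the shape of the pipeline, not the
Gnulib API. The name `normalize_stream` is hypothetical, and the sketch
assumes that normalizing one line at a time is acceptable, which relies on
a newline never combining with its neighbours.

```python
import unicodedata


def normalize_stream(src, dst, form="NFC"):
    """Copy text from src to dst, normalizing each line to `form`.

    src and dst are text-mode file objects; decoding and encoding are
    left to whatever encoding those objects were opened with.
    """
    for line in src:
        # Lines end in "\n", which has no combining behaviour, so each
        # line can be normalized independently of the others.
        dst.write(unicodedata.normalize(form, line))
```

Calling `normalize_stream(sys.stdin, sys.stdout)` would turn this into a
stdin-to-stdout filter, provided both streams are opened as UTF-8.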

It might be an idea for me to do it, since I know the details
of adding new programs to coreutils, and also I need to get
to know the Unicode APIs in gnulib for further i18n work in coreutils.
I've not had much time for anything lately, but I would hope
to do that next week if possible.

So I'm wondering now why normalization functionality isn't in iconv?
Seems like a big omission to me. There is a mention of it here:
http://www.archivum.info/address@hidden/2006-08/msg00004.html

Then I also noticed `uconv`, which is in the "icu" package on Fedora at least.
To normalize text the following worked for me:
  uconv -x NFC < test.utf8
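
For anyone unsure what NFC actually changes, here is a small sketch of the
effect on a single string. It uses Python's `unicodedata` module purely as
an illustration and makes no claim about how uconv implements it; the
sample string is made up for the example.

```python
import unicodedata

# 'a' followed by U+0301 COMBINING ACUTE ACCENT (decomposed form)
decomposed = "Pa\u0301draig"

# NFC composes the pair into U+00E1 LATIN SMALL LETTER A WITH ACUTE
composed = unicodedata.normalize("NFC", decomposed)

# The two strings render the same but differ in length once encoded.
decomposed_bytes = len(decomposed.encode("utf-8"))
composed_bytes = len(composed.encode("utf-8"))
```

The decomposed and composed forms look identical on screen, which is why
byte-wise comparison or sorting of unnormalized text can give surprising
results.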

So iconv may get this in future and uconv already has it.
Do we really need another util in coreutils for this?

cheers,
Pádraig.





