Re: [Nmh-workers] I like neither green eggs and ham nor MIME

From: Ken Hornstein
Subject: Re: [Nmh-workers] I like neither green eggs and ham nor MIME
Date: Fri, 18 Jul 2014 09:54:03 -0400

>I am not at all secure about how the standard GNU utilities will handle
>non-ascii characters. For example, 'wc -c', just counts bytes. True,
>the man page talks about bytes, not characters, but I am still left
>uncomfortable.  Then there are the dozens of bash, python, and perl
>scripts that I have accumulated over the years.

My experience has been that a modern system handles 8-bit characters just
fine.

Now, where things get a little tricky is with multibyte character sets
like UTF-8.  Not everyone has broken from the paradigm that 1 byte == 1
character, like you noted (we had to do a bunch of work in the format
engine to fix that).  But since UTF-8 has the excellent property that
non-ASCII characters look like just 8-bit characters but won't ever
be mistaken for ASCII (not a surprise, since it was designed by two
of the original Unix geeks) I haven't come across a program where it
truly breaks.  I don't write in Python, but Perl support for UTF-8 is
excellent and I would be shocked if the situation for Python wasn't the
same.

I jumped whole-hog into UTF-8 a few years ago, and I haven't regretted
it one bit.
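
A minimal Python sketch of the two points raised above: a byte count (what
'wc -c' reports) can differ from a character count, and every byte that UTF-8
uses to encode a non-ASCII character has its high bit set, so it cannot be
mistaken for an ASCII byte. The function names and the sample string are
hypothetical, chosen only for illustration.

```python
def utf8_summary(text):
    """Compare character and byte counts for a string encoded as UTF-8."""
    encoded = text.encode("utf-8")
    return {
        "characters": len(text),      # code points, not bytes
        "bytes": len(encoded),        # the quantity a byte counter reports
        "high_bit_bytes": sum(1 for b in encoded if b >= 0x80),
    }


def non_ascii_never_looks_ascii(text):
    """Check that every byte of every non-ASCII character is >= 0x80."""
    return all(
        b >= 0x80
        for ch in text
        if ord(ch) > 0x7F
        for b in ch.encode("utf-8")
    )


if __name__ == "__main__":
    # Hypothetical sample: "naive cafe" with two accented letters,
    # written with escapes so the source file itself stays ASCII.
    sample = "na\u00efve caf\u00e9"
    print(utf8_summary(sample))
    print(non_ascii_never_looks_ascii(sample))
```

The two accented letters each take two bytes, which is why the byte count
exceeds the character count while the ASCII letters are left untouched.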

