Re: [Pan-users] Annoying ' in posts


From: Paul Crawford
Subject: Re: [Pan-users] Annoying ' in posts
Date: Sun, 23 Sep 2012 13:35:06 +0100
User-agent: Mozilla/5.0 (X11; Linux i686; rv:15.0) Gecko/20120827 Thunderbird/15.0

On 23/09/12 05:03, Steven D'Aprano wrote:
On 23/09/12 04:29, Paul Crawford wrote:

What I hate about Unicode was the idea of adopting 16-bit characters and
thus breaking so much byte-orientated code that was written, tested, and
integrated over the history of computing.

You make it sound like the Unicode Consortium hacked into people's computers
and changed their existing 8-bit ASCII files into 16-bit UCS-2 files. I'm
pretty sure that never happened.

The point I was hoping to make was not to denigrate the desirability of a single universal character set, but to criticise the specific idea of the UCS-2 representation.

For example, it is (was?) the case that if you wanted to properly use multi-language support on Windows NT (and later) you had to re-write any application to make use of 16-bit 'wide' character strings, thus breaking anything written in the past that assumed byte-orientated text.

And that is a *lot* of useful stuff that we are talking about: libraries, applications, storage devices, file compression utilities, etc.

Now you may have a point that the use of byte-orientated and NUL-terminated strings as developed for C/UNIX was possibly short-sighted, but in the context of 1960s/70s computing it was reasonable, quite possibly necessary, to be usably fast on the hardware of the day.

UCS-2 breaks that by going 16 bits wide, with NUL upper bytes in most common cases, and it requires a byte-order mark to cope with differing CPU architectures. Both problems should have been obvious at the time, so I don't know why it was adopted in that form.
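
As a rough illustration of both problems, here is a minimal Python sketch. It is not from the original discussion. Python has no UCS-2 codec, so it uses the UTF-16 codecs, which produce the same bytes as UCS-2 for characters in the Basic Multilingual Plane. The sample string and the `c_strlen` helper are made up for the example; the helper only mimics how a C-style, NUL-terminated reader would measure a buffer.

```python
import codecs

text = "Pan"  # plain ASCII sample text (illustrative only)

# Same characters, two byte orders. For these BMP characters the
# UTF-16 bytes are identical to what UCS-2 would produce.
little = text.encode("utf-16-le")  # low byte first
big = text.encode("utf-16-be")     # high byte first


def c_strlen(buf: bytes) -> int:
    """Hypothetical helper: length seen by a byte-oriented reader
    that stops at the first NUL byte, as C's strlen() does."""
    end = buf.find(b"\x00")
    return len(buf) if end == -1 else end


nul_bytes = little.count(b"\x00")   # one NUL upper byte per ASCII character
seen_le = c_strlen(little)          # reader stops after the first character
seen_be = c_strlen(big)             # reader stops before any character
order_matters = little != big       # the byte streams differ by CPU order

# The byte-order mark is what tells a reader which order was used.
bom_le = codecs.BOM_UTF16_LE
bom_be = codecs.BOM_UTF16_BE

print(little, big, nul_bytes, seen_le, seen_be, order_matters)
```

A byte-orientated routine handed either buffer sees at most one character before it hits a NUL, which is the breakage described above.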

UTF-8, on the other hand, allows a universal character set (and one much bigger than UCS-2) *and* it works with legacy code that relies on byte-represented text with NUL string terminators and all of the corresponding stuff built around that.
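
The UTF-8 side of the argument can be sketched the same way. Again this is only an illustration under stated assumptions: the sample strings are invented, and `c_strlen` is a hypothetical stand-in for a NUL-terminated reader such as C's `strlen()`.

```python
def c_strlen(buf: bytes) -> int:
    """Hypothetical helper: length seen by a byte-oriented reader
    that stops at the first NUL byte, as C's strlen() does."""
    end = buf.find(b"\x00")
    return len(buf) if end == -1 else end


ascii_text = "Pan"
# Accented letters, a euro sign, and one code point above U+FFFF,
# which a single 16-bit UCS-2 unit cannot represent.
mixed_text = "na\u00efve caf\u00e9 \u20ac5 \U0001F600"

ascii_utf8 = ascii_text.encode("utf-8")
mixed_utf8 = mixed_text.encode("utf-8")

# ASCII text is byte-for-byte unchanged when encoded as UTF-8.
ascii_unchanged = ascii_utf8 == ascii_text.encode("ascii")

# No NUL bytes appear unless the text itself contains U+0000,
# so a NUL-terminated reader sees the whole buffer.
has_nul = b"\x00" in mixed_utf8
whole_buffer_seen = c_strlen(mixed_utf8) == len(mixed_utf8)

# The last character lies outside the 16-bit range.
beyond_16_bits = ord(mixed_text[-1]) > 0xFFFF

print(ascii_unchanged, has_nul, whole_buffer_seen, beyond_16_bits)
```

Legacy code that only copies, compares, or stores such bytes keeps working. Code that counts characters or splits strings at arbitrary byte offsets would still need to know about multi-byte sequences.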

Regards, Paul


