From: | Paul Crawford |
Subject: | Re: [Pan-users] Annoying ' in posts |
Date: | Sun, 23 Sep 2012 13:35:06 +0100 |
User-agent: | Mozilla/5.0 (X11; Linux i686; rv:15.0) Gecko/20120827 Thunderbird/15.0 |
On 23/09/12 05:03, Steven D'Aprano wrote:
> On 23/09/12 04:29, Paul Crawford wrote:
>> What I hate about unicode was the idea of adopting 16-bit characters and thus breaking so much byte-orientated code that was written, tested, and integrated over the history of computing.
>
> You make it sound like the Unicode Consortium hacked into people's computers and changed their existing 8-bit ASCII files into 16-bit UCS-2 files. I'm pretty sure that never happened.
The point I was hoping to make was not to denigrate the desirability of a single universal character set, but to criticise the specific idea of the UCS-2 representation.
For example, it is (was?) the case that if you wanted to properly use multi-language support on Windows NT (and later) you had to re-write any application to make use of 16-bit 'wide' character strings, thus breaking anything written in the past that assumed byte-orientated text.
And that is a *lot* of useful stuff that we are talking about: libraries, applications, storage devices, file compression utilities, etc.
Now you may have a point that the use of byte-orientated and NUL-terminated strings as developed for C/UNIX was possibly short-sighted, but in the context of 1960s/70s computing it was reasonable, quite possibly necessary, to be usably fast on the hardware of the day.
UCS-2 breaks that by going 16-bit wide with NUL upper bytes in most common cases, and it requires a byte-order marker to cope with differing CPU architectures. Both problems should have been obvious at the time, so I don't know why it was adopted in that form.
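To make the two problems concrete, here is a minimal C sketch. The function names (ucs2_from_ascii, ucs2_bom_order, bytes_seen_by_strlen) are made up for illustration and are not from any library. It writes ASCII text as 16-bit code units in either byte order and then shows what byte-orientated code such as strlen() makes of the result.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: write an ASCII string as 16-bit UCS-2 code units.
   big_endian selects the byte order. No BOM is written.
   out must hold at least 2 * strlen(ascii) + 2 bytes.
   Returns the number of bytes written, excluding the 16-bit terminator. */
size_t ucs2_from_ascii(const char *ascii, unsigned char *out, int big_endian)
{
    size_t n = 0;
    assert(ascii != NULL && out != NULL);
    for (; *ascii != '\0'; ++ascii) {
        unsigned char low = (unsigned char)*ascii;
        if (big_endian) {
            out[n++] = 0x00;   /* upper byte is NUL for every ASCII character */
            out[n++] = low;
        } else {
            out[n++] = low;
            out[n++] = 0x00;
        }
    }
    out[n] = 0x00;             /* 16-bit terminator */
    out[n + 1] = 0x00;
    return n;
}

/* Hypothetical helper: inspect a leading byte-order mark (U+FEFF).
   Returns 1 for big-endian (FE FF), 0 for little-endian (FF FE),
   and -1 if the buffer does not start with a BOM. */
int ucs2_bom_order(const unsigned char *buf, size_t len)
{
    assert(buf != NULL);
    if (len >= 2 && buf[0] == 0xFE && buf[1] == 0xFF) return 1;
    if (len >= 2 && buf[0] == 0xFF && buf[1] == 0xFE) return 0;
    return -1;
}

/* What legacy byte-orientated code sees: everything up to the first NUL byte. */
size_t bytes_seen_by_strlen(const unsigned char *buf)
{
    return strlen((const char *)buf);
}
```

Encoding "Hi" this way gives the bytes 48 00 69 00 in little-endian order and 00 48 00 69 in big-endian order, so strlen() stops after one byte in the first case and sees an empty string in the second. Without a BOM, or some outside agreement on byte order, a reader cannot tell which of the two layouts it has been handed.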
UTF-8, on the other hand, allows a universal character set (and one much bigger than UCS-2) *and* it works with legacy code that relies on byte-represented text with NUL string terminators and all of the corresponding stuff built around that.
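A minimal C sketch of why that works follows. The names utf8_encode_one and utf8_from_codepoints are hypothetical, and this is a bare-bones encoder for illustration rather than a validated library routine. ASCII code points come out as the same single bytes, and every byte of a multi-byte sequence has its top bit set, so no 0x00 byte appears unless the code point itself is U+0000.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: encode one Unicode code point as UTF-8.
   out must have room for 4 bytes. Returns the number of bytes written,
   or 0 if cp is a surrogate or lies beyond U+10FFFF. */
size_t utf8_encode_one(uint32_t cp, unsigned char *out)
{
    assert(out != NULL);
    if (cp < 0x80) {                      /* ASCII: unchanged single byte */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                     /* 2 bytes: 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {                   /* 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx */
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;                     /* surrogates are not encodable */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                 /* 4 bytes: beyond the 16-bit range */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}

/* Hypothetical helper: build an ordinary NUL-terminated C string.
   out must hold at least 4 * count + 1 bytes. Returns the byte length. */
size_t utf8_from_codepoints(const uint32_t *cps, size_t count, char *out)
{
    size_t n = 0;
    assert(cps != NULL && out != NULL);
    for (size_t i = 0; i < count; ++i)
        n += utf8_encode_one(cps[i], (unsigned char *)out + n);
    out[n] = '\0';
    return n;
}
```

Feeding it the code points for "caf", U+00E9, U+20AC and U+1F600 gives a 12-byte string on which strlen() and strchr() behave just as they would on plain ASCII, even though the last code point does not fit in 16 bits at all.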
Regards, Paul