bites assumed set mid UTF-8
From: Dan Jacobson
Subject: bites assumed set mid UTF-8
Date: Tue, 07 Mar 2006 02:15:10 +0800
Bad bad bad. Emacs 21.4.1 displays the same Chinese character ("nuclear")
for both of the following bit strings, even though the second is not
valid UTF-8: its middle byte lacks the 10 continuation prefix. Cc'd Handa.
11100110 10100000 10111000
11100110 00100000 10111000
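
A minimal Python sketch of why a decoder could show the same character
for both strings. The function lenient_decode3 is hypothetical and is
not Emacs's actual code: it simply assumes the top two bits of the
second and third bytes are "10" and keeps only their low six bits. A
strict decoder checks that prefix first.

```python
def lenient_decode3(b0, b1, b2):
    # Hypothetical lenient decoder: trusts that b1 and b2 are
    # continuation bytes and only keeps their low six bits.
    return chr(((b0 & 0x0F) << 12) | ((b1 & 0x3F) << 6) | (b2 & 0x3F))


def is_continuation(byte):
    # A valid UTF-8 continuation byte has the form 10xxxxxx.
    return (byte & 0xC0) == 0x80


def strict_is_valid(data):
    # Python's own UTF-8 codec rejects malformed sequences.
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


good = bytes([0b11100110, 0b10100000, 0b10111000])
bad = bytes([0b11100110, 0b00100000, 0b10111000])

# Both sequences collapse to the same code point when the prefix is ignored.
print(lenient_decode3(*good) == lenient_decode3(*bad))
# Only the first sequence passes strict validation.
print(strict_is_valid(good), strict_is_valid(bad))
# The middle byte of the second sequence is not a continuation byte.
print([is_continuation(b) for b in bad[1:]])
```

The two middle bytes differ only in their top bit, so masking with
0x3F hides the difference.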
To reproduce, M-x compile this makefile:
B=perl -wle '$$_=unpack "B*", <>; s/.{8}/$$& /g; print'
Q=qp-decode
a:
	for i in =E6=A0=B8 =E6\ =B8; do \
	echo $$i|$Q; echo -n $$i|$Q|$B; done
#echo =E6=A0=B8|qp-decode #Chinese "nuclear"
#echo =E6 =B8 |qp-decode #Not legal Unicode
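
For readers without qp-decode installed, the same check can be
sketched in Python with the standard quopri module. This is only an
assumed equivalent of the makefile above, not a replacement for it;
the helper name bits is made up for illustration.

```python
import quopri


def bits(data):
    # Render each byte as eight binary digits, separated by spaces.
    return " ".join(f"{byte:08b}" for byte in data)


# The same two quoted-printable inputs the makefile loops over.
for encoded in (b"=E6=A0=B8", b"=E6 =B8"):
    print(bits(quopri.decodestring(encoded)))
```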
uxterm etc. correctly display only the first as the character.
Save what Emacs displays into a file and indeed both come out as the
first byte sequence.
Reality hits when one first saves the output into a file, bypassing the
Emacs display, and then find-files it. Then the second version is not
coerced into being the same as the first, and Emacs guesses the file
is latin-1.
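
The behaviour described for find-file resembles a "try UTF-8, else
fall back to latin-1" guess. The sketch below only illustrates that
idea; decode_with_fallback is a hypothetical helper and is not how
Emacs actually detects coding systems.

```python
def decode_with_fallback(data):
    # Try strict UTF-8 first; on failure fall back to latin-1,
    # which accepts every possible byte value.
    try:
        return data.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        return data.decode("latin-1"), "latin-1"


for raw in (b"\xe6\xa0\xb8", b"\xe6\x20\xb8"):
    text, encoding = decode_with_fallback(raw)
    print(encoding, repr(text))
```

Under this guess the second byte sequence is read as three separate
latin-1 characters rather than one Chinese character.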