bug-gnu-emacs

bites assumed set mid UTF-8


From: Dan Jacobson
Subject: bites assumed set mid UTF-8
Date: Tue, 07 Mar 2006 02:15:10 +0800

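Bad bad bad. Emacs 21.4.1 shows the same Chinese character ("nuclear")
for both of the byte sequences below, even though the second one is not
valid UTF-8: its middle byte is 00100000 (a space), not a 10xxxxxx
continuation byte. Cc'd Handa.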
11100110 10100000 10111000
11100110 00100000 10111000
To reproduce, M-x compile this makefile:

B=perl -wle '$$_=unpack "B*", <>; s/.{8}/$$& /g; print'
Q=qp-decode
a:
        for i in =E6=A0=B8 =E6\ =B8; do \
        echo $$i|$Q; echo -n $$i|$Q|$B; done
#echo =E6=A0=B8|qp-decode #Chinese "nuclear"
#echo =E6 =B8  |qp-decode #Not legal Unicode

uxterm etc. correctly display the character only for the first.
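
For reference, the difference between the two sequences can be checked
mechanically. The Python sketch below is not part of the original report,
and the helper name is_valid_3byte_utf8 is made up for illustration. It
only tests the 1110xxxx 10xxxxxx 10xxxxxx bit layout that UTF-8 uses for
three-byte characters, and skips the overlong and surrogate checks a full
validator would need.

```python
def is_valid_3byte_utf8(data: bytes) -> bool:
    """Hypothetical helper: check only the three-byte UTF-8 bit layout."""
    if len(data) != 3:
        return False
    lead, cont1, cont2 = data
    return (
        (lead & 0xF0) == 0xE0        # lead byte must look like 1110xxxx
        and (cont1 & 0xC0) == 0x80   # first continuation byte: 10xxxxxx
        and (cont2 & 0xC0) == 0x80   # second continuation byte: 10xxxxxx
    )


# The two bit strings from the report.
good = bytes([0b11100110, 0b10100000, 0b10111000])  # E6 A0 B8
bad = bytes([0b11100110, 0b00100000, 0b10111000])   # E6 20 B8

print(is_valid_3byte_utf8(good))
print(is_valid_3byte_utf8(bad))
```

The second sequence fails because 00100000 starts with 00 rather than 10,
so a display that still renders the character is in effect assuming the
high bit was set.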

Save what Emacs displays into a file, and indeed both lines come out as
the first (valid) sequence. Reality hits when one first saves the output
into a file, bypassing the Emacs display, and then does find-file on it.
Then the second version is not coerced into being the same as the first,
and Emacs guesses the file is latin-1.
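
As a rough analogy for that find-file result (this is not Emacs's actual
coding-system detection, which is more elaborate), here is a Python
sketch that tries a strict UTF-8 decode and falls back to latin-1 when
it fails. The function name guess_decode is an assumption made for
illustration.

```python
def guess_decode(data: bytes) -> tuple[str, str]:
    """Hypothetical helper: strict UTF-8 first, latin-1 as the fallback."""
    try:
        return "utf-8", data.decode("utf-8")
    except UnicodeDecodeError:
        # latin-1 maps every byte to a character, so this cannot fail.
        return "latin-1", data.decode("latin-1")


valid_bytes = b"\xe6\xa0\xb8"    # =E6=A0=B8
invalid_bytes = b"\xe6\x20\xb8"  # =E6 =B8

print(guess_decode(valid_bytes)[0])
print(guess_decode(invalid_bytes)[0])
```

A strict decoder rejects the second sequence outright, which is the
behaviour one would also expect from the display path.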



