

Re: python-shell-send-region uses wrong encoding?

From: Ernest Adrogué
Subject: Re: python-shell-send-region uses wrong encoding?
Date: Tue, 29 Oct 2013 17:34:26 +0100
User-agent: Mutt/1.5.21 (2010-09-15)

29-10-2013, 16:34 (+0100); Peter Dyballa wrote:
> On 29.10.2013 at 15:55, Ernest Adrogué wrote:
> > Here it's different: print(b) prints `WÃ¶rterbuch' (C-c C-r) and
> > `Wörterbuch' (C-c C-c).
> This obviously happens in an 8-bit environment. `WÃ¶rterbuch' is the
> sequence of octets that represent the word `Wörterbuch' in UTF-8
> encoding, displayed as ISO Latin-x (or ISO 8859) characters. Here the
> "ö" is encoded as two octets: 0xC3 0xB6. In ISO 8859-15 the first one
> is the character "Ã" and the second is the character "¶".
> So it seems that one function prints exclusively in UTF-8…

The "ö" character is stored in the file as 0xC3 0xB6. As you say, this is
the UTF-8 encoding for this character.
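This is easy to check from Python 3 (a minimal sketch, assuming Python 3.5 or later; the variable names are arbitrary): encode the word as UTF-8, then read the resulting octets back as ISO 8859-15.

```python
import unicodedata

# "Wörterbuch" written with an escape so this file stays ASCII-only.
word = "W\u00f6rterbuch"

# UTF-8 stores "ö" (U+00F6) as the two octets 0xC3 0xB6.
utf8 = word.encode("utf-8")
print(utf8)               # b'W\xc3\xb6rterbuch'
print(utf8[1:3].hex())    # c3b6

# Reading each octet as one ISO 8859-15 character gives two characters
# where there should be one.
misread = utf8.decode("iso-8859-15")
print(len(word), len(misread))  # 10 11
for ch in misread[1:3]:
    print(hex(ord(ch)), unicodedata.name(ch))
# 0xc3 LATIN CAPITAL LETTER A WITH TILDE
# 0xb6 PILCROW SIGN
```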

The Python interpreter otherwise decodes the 2-byte sequence correctly.
This can be seen in several ways: whether I run the script in a terminal,
paste or yank the line into the Python shell buffer, or use
python-shell-send-buffer, in all these cases the sequence is converted into
the single character 0xF6 (code point U+00F6, "ö"), as the output from
repr() shows.
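For comparison, here is what a correct decode looks like (a sketch assuming Python 3, where `ascii()` shows the same `\xf6` escape that `repr()` shows for a unicode string in Python 2):

```python
# The two UTF-8 octets for "ö", as stored in the file.
raw = b"\xc3\xb6"

# Decoding them as UTF-8 yields one character, code point U+00F6.
ch = raw.decode("utf-8")
print(len(ch))                 # 1
print(hex(ord(ch)))            # 0xf6
print(ascii(ch))               # '\xf6'

# The 0xF6 value is the code point itself; for example, UTF-16
# (big-endian) stores it as the two bytes 0x00 0xF6.
print(ch.encode("utf-16-be"))  # b'\x00\xf6'
```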

However, when the bytes are sent with python-shell-send-region, the
interpreter treats 0xC3 0xB6 as two separate characters ("Ã" and "¶"),
which is wrong.  In light of this, I would say that there is a bug in
python-shell-send-region.
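The symptom can be reproduced without Emacs. The sketch below (Python 3; `run_source` and the `<region>` filename are made up for illustration) compiles the same UTF-8 bytes twice: once under Python 3's default UTF-8 source encoding, and once under a PEP 263 coding cookie that claims Latin-1. It only shows how a wrong source-encoding assumption turns 0xC3 0xB6 into two characters; it is an assumption about the mechanism, not a diagnosis of what python-shell-send-region actually does.

```python
# Hypothetical reproduction: one UTF-8 source line, two assumed encodings.
LINE = "b = 'W\u00f6rterbuch'\n"   # the line as it appears in the buffer
RAW = LINE.encode("utf-8")          # "ö" becomes the octets 0xC3 0xB6


def run_source(source: bytes) -> str:
    """Compile and execute source bytes; return the value bound to `b`."""
    namespace: dict = {}
    exec(compile(source, "<region>", "exec"), namespace)
    return namespace["b"]


# No coding cookie: Python 3 assumes UTF-8, so 0xC3 0xB6 is one character.
correct = run_source(RAW)

# A Latin-1 cookie makes every octet its own character.
wrong = run_source(b"# -*- coding: latin-1 -*-\n" + RAW)

print(ascii(correct), len(correct))  # 'W\xf6rterbuch' 10
print(ascii(wrong), len(wrong))      # 'W\xc3\xb6rterbuch' 11
```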
