bug#25288: 25.1; term, ansi-term, broken output of utf8 text
From: npostavs
Subject: bug#25288: 25.1; term, ansi-term, broken output of utf8 text
Date: Wed, 28 Dec 2016 14:10:30 -0500
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/25.1 (gnu/linux)
found 25288 24.5
tags 25288 confirmed
quit
Vjacheslav <fvamail@gmail.com> writes:
> Trying to use this command from terminal running bash:
>
> [fva@localhost ~]$ python -c 'print "ш"*5000'
>
> produces garbage (шшш\321\210шшш) in the output. The terminal needs a
> reset. Possibly this is a bug that was seen in very old Linux (it breaks
> multibyte characters at buffer boundaries).
>
> default-process-coding-system is OK:
>
> default-process-coding-system is a variable defined in ‘C source code’.
> Its value is (utf-8-unix . utf-8-unix)
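It looks like the problem is that the process filter function,
term-emulate-terminal, receives the output in chunks of 4096 bytes[1]. The
ш character is encoded in UTF-8 as 2 bytes (\321\210 in octal), so it can
be split across chunks when a chunk boundary falls between those two bytes.
That would explain the stray \321\210 in the garbage above.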
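To illustrate the split outside Emacs, here is a minimal Python 3 sketch. It
assumes one extra byte of output ahead of the text (a stand-in for whatever
precedes it on the pty; that offset is an assumption, not something measured)
so that a 4096-byte boundary lands inside a character. The names CHUNK, data,
chunks and decoded are illustrative only.

```python
CHUNK = 4096  # chunk size reported for the process filter

# Hypothetical stream: one ASCII byte, then the 2-byte character repeated.
# The leading byte makes every chunk boundary fall at an odd offset.
data = b"$" + ("ш" * 5000).encode("utf-8")

# Cut the byte stream into fixed-size chunks, as the filter would see them.
chunks = [data[i:i + CHUNK] for i in range(0, len(data), CHUNK)]

# Decode each chunk on its own, with no state carried between chunks.
# Bytes that do not form a complete character are shown as escapes.
decoded = [c.decode("utf-8", errors="backslashreplace") for c in chunks]

# The tail of the first chunk and the head of the second show the two
# halves of the character that was split.
print(repr(decoded[0][-6:]))
print(repr(decoded[1][:6]))
```

With an even offset the boundaries would fall between characters and nothing
would break, which may be why the problem only shows up for some outputs.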
Is there a way to recognize incomplete decoding from lisp? I can't see
any.
[1]: It's getting bytes rather than characters because in term-exec-1 we
have:
;; The process's output contains not just chars but also binary
;; escape codes, so we need to see the raw output. We will have to
;; do the decoding by hand on the parts that are made of chars.
(coding-system-for-read 'binary))
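
For comparison, this is roughly what "decoding by hand" has to do to survive a
split: keep the incomplete trailing bytes of one chunk and prepend them to the
next. The sketch below is Python 3, not term.el code, and uses the standard
library's incremental decoder to do the carrying; whether Emacs offers an
equivalent from Lisp is exactly the open question above. The one-byte prefix is
the same assumed offset that forces a split.

```python
import codecs

CHUNK = 4096

# Hypothetical stream: one ASCII byte shifts the 2-byte characters so
# that chunk boundaries fall in the middle of a character.
data = b"$" + ("ш" * 5000).encode("utf-8")
chunks = [data[i:i + CHUNK] for i in range(0, len(data), CHUNK)]

# An incremental decoder buffers an incomplete trailing sequence and
# completes it when the next chunk arrives.
decoder = codecs.getincrementaldecoder("utf-8")()
text = "".join(decoder.decode(c) for c in chunks)

# Flush at end of stream; this raises if bytes are still pending.
text += decoder.decode(b"", final=True)

print(len(text))
```

The key point is that the decoder state lives across calls to the filter, so a
chunk ending in \321 produces no output for that byte until \210 shows up.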