From: Dmitry Gutov
Subject: bug#23595: 25.1.50; file with chinese/japanse chars, vc-diff fails (HG, Git, RCS)
Date: Tue, 24 May 2016 00:02:36 +0300
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.1

On 05/23/2016 07:48 PM, Eli Zaretskii wrote:

The resulting diff contains either rubbish or fails to run.
Files attached.

I don't see any rubbish in the Git output.

Might that have something to do with your OS? I see the mojibake, like the others.

Setting coding-system-for-read is correct, because the important use
case is when the diffs are actually output.  The problem is that
UTF-16 is not ASCII-compatible, and so text output by Git itself will
be mishandled.  Another problem is that Git doesn't show the diffs at

Apparently so.

Which is weird, considering vc-diff-internal and vc-coding-system-for-diff 
have both been virtually untouched for the last couple of years.

Not sure what you see as weird.

That we have a regression while the relevant functions didn't change. Something probably changed at a lower level, and we might be wise to figure out what (unless somebody already knows, and just didn't point that out because it's not a bug).

But even if we figure out why it happens, you (Uwe) probably want Git, Hg, etc., to 
treat this file as text, and not binary. Only then will you be able to get 
meaningful diffs. I don't have any specific advice on that.

Why can't we invoke "git diff --text"?  That should fix the second
problem, I think.

It does not. It forces Git to diff the file as text, but neither the current code nor the patch at the end makes the displayed file contents decode correctly.

I haven't tried Paul's solution for this myself, but it seems to be the way to go.

As for the first problem, we should probably refrain from binding
coding-system-for-read to a CODING-SYSTEM for which

   (coding-system-get CODING-SYSTEM :ascii-compatible-p)

returns nil.  We should instead bind it to no-conversion and decode
the file data parts by hand, skipping the parts that Git itself
outputs (yes, this is messy).  Patches to that effect are welcome.
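
To see what that predicate reports, and why a decoder that is not ASCII-compatible garbles the text Git prints itself, here is a small sketch that can be evaluated in *scratch*. The sample header string and the choice of utf-16le are illustrative assumptions only; they are not taken from the reported file.

```elisp
;; The property mentioned above: non-nil means ASCII bytes decode to
;; the same ASCII characters under this coding system.
(coding-system-get 'utf-8 :ascii-compatible-p)   ; non-nil
(coding-system-get 'utf-16 :ascii-compatible-p)  ; nil

;; Illustrative only: a line that Git prints itself is plain ASCII bytes.
(let* ((raw "diff --git a/file b/file")
       (decoded (decode-coding-string raw 'utf-16le)))
  ;; Each pair of ASCII bytes is read as one 16-bit code unit, so the
  ;; decoded text no longer matches what Git wrote.
  (string= raw decoded))                         ; nil
```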

Not sure what's the best place to do it, but the patch below gives me 24.5's behavior (correctly decoding the short "Binary files ... differ" output). Could someone try it together with Paul's solution?

diff --git a/lisp/vc/vc.el b/lisp/vc/vc.el
index 25b41e3..b62b68d 100644
--- a/lisp/vc/vc.el
+++ b/lisp/vc/vc.el
@@ -1696,6 +1696,8 @@ vc-diff-internal
        (setq coding-system-for-read
              (coding-system-change-eol-conversion coding-system-for-read
+    (unless (coding-system-get coding-system-for-read :ascii-compatible-p)
+      (setq coding-system-for-read nil))
     (vc-setup-buffer buffer)
     (message "%s" (car messages))
     ;; Many backends don't handle well the case of a file that has been
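
The guard this patch adds can also be tried on its own, outside vc-diff-internal. A minimal sketch, assuming a helper name that is hypothetical and not part of vc.el: it returns nil for a coding system that is not ASCII-compatible, which is the value the patch assigns to coding-system-for-read in that case.

```elisp
(defun my-vc-diff-safe-coding-system (coding)
  "Return CODING if it is ASCII-compatible, else nil.
Hypothetical helper for illustration; not part of vc.el."
  (and coding
       (coding-system-get coding :ascii-compatible-p)
       coding))

(my-vc-diff-safe-coding-system 'utf-8)   ; utf-8
(my-vc-diff-safe-coding-system 'utf-16)  ; nil
```

A nil coding-system-for-read means no override, so Emacs falls back to its usual detection when reading the process output.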
