Re: [h-e-w] Processing chars above \200

From: Eli Zaretskii
Subject: Re: [h-e-w] Processing chars above \200
Date: Sun, 23 Sep 2018 17:13:57 +0300

> Date: Sun, 23 Sep 2018 09:11:41 -0400
> From: John J. Xenakis <address@hidden>
> Cc: address@hidden
> You suggested that I use the "raw-text" coding system, implying that
> these characters are random binary data.  But they're actually
> completely valid 8-bit characters that are commonly used in Western
> media.

Then there's still something not right, because you shouldn't be
having any of these problems with files that are consistently encoded.

> So the net result is that emacs loads a Windows text file on a Windows
> system, decides that it's really a Unix file (which it isn't), and
> then really damages the file in a way that's almost impossible to
> recover from.  Eli, this is not something that an editor should be
> doing gratuitously.

It shouldn't and it doesn't.  Depending on what exactly is in your
files, something that is still a bit of a mystery for me, Emacs could
sometimes err if you don't tell it enough.  But in any case, there are
commands to fix those errors right away, as soon as you realize
something like that happens.  We will get to that, once I understand
more about the problem.

> So the ad-hoc workaround is this:
> * Open the file in Notepad.  All the 8-bit characters are displayed
>   correctly.
> * Select and copy the entire text in Notepad.
> * In emacs, open a new text file.
> * Paste the text that you copied from Notepad.
> * Save the result.
> Much to my relief, this cures all the 8-bit problems, and I can go
> back to reloading and editing the file in emacs.

Is it possible that the file is encoded in UTF-16 or UTF-8?  What
happens if you visit the file like this:

  C-x RET c utf-8 RET C-x C-f FILENAME RET

and similarly for utf-16?  Does this fix the problem?
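
For a quick check outside Emacs, the same question (which encoding do
these bytes fit?) can be sketched in Python.  This is only an
illustration: the helper name try_decodings, the candidate list, and
the sample bytes are all assumptions, not anything taken from the files
under discussion.

```python
# Hypothetical helper (not part of Emacs): try several candidate
# encodings on the same bytes and record which ones decode cleanly.
CANDIDATES = ("utf-8", "utf-16", "cp1252")


def try_decodings(data, candidates=CANDIDATES):
    """Map each encoding name to the decoded text, or None on failure."""
    results = {}
    for name in candidates:
        try:
            results[name] = data.decode(name)
        except UnicodeDecodeError:
            results[name] = None
    return results


# Assumed sample: the word "élite" as a Windows-1252 editor would save it.
sample = "\u00e9lite".encode("cp1252")  # b'\xe9lite'
outcome = try_decodings(sample)

for name, text in outcome.items():
    print(name, "->", "decodes" if text is not None else "fails")
```

A lone byte 0xE9 followed by ASCII is not valid UTF-8, so only a
single-byte encoding decodes this particular sample.  A real file may
behave differently, which is why the actual bytes matter.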

And how were those files created in the first place?  I understood
from your previous explanations that you created those files by
copy-pasting from other applications, is that right?

> So I select the character é (e with an acute accent, as in the first
> letter of the French spelling of the word elite).  Here is the
> information that "C-x =" provides in each of the two cases, the damaged
> and repaired file respectively:
> Char: \351 (4194281, #o17777751, #x3fffe9, raw-byte) point=76501 of
> 343691 (22%) column=51
> Char: é (233, #o351, #xe9, file #xE9) point=74734 of 336596 (22%)
> column=51
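
The two sets of numbers above are related by simple arithmetic: Emacs
represents an undecoded byte in the range #x80..#xFF as a character
code in the range #x3FFF80..#x3FFFFF.  A small Python sketch of that
mapping follows.  The function name is hypothetical, and the choice of
cp1252 for the repaired case is an assumption, since the output above
only shows that the file byte is #xE9.

```python
# Emacs keeps undecoded bytes 0x80..0xFF as "raw-byte" characters
# with codes 0x3FFF80..0x3FFFFF, i.e. 0x3FFF00 plus the byte value.
RAW_BYTE_OFFSET = 0x3FFF00


def emacs_raw_byte_char(byte):
    """Return the Emacs character code used for an undecoded byte."""
    if not 0x80 <= byte <= 0xFF:
        raise ValueError("only bytes 0x80..0xFF have a raw-byte form")
    return RAW_BYTE_OFFSET + byte


# Damaged case: byte 0xE9 was never decoded.
damaged = emacs_raw_byte_char(0xE9)

# Repaired case: the same byte decoded as text (cp1252 is an assumption).
repaired = ord(bytes([0xE9]).decode("cp1252"))

print(damaged, oct(damaged), hex(damaged))
print(repaired, oct(repaired), hex(repaired))
```

So both cases involve the same byte on disk; the difference is only
whether it was decoded as text when the file was visited.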

Can you post one such file, please?  It is important that you post a
file as a binary attachment, and it is also important to verify that
the trick with Notepad and copy/paste works with the file you post.

I'm quite sure this is caused by something very simple, because
Notepad is certainly not smarter than Emacs wrt encodings.
