nmh-workers

Re: mhfixmsg character set conversion


From: Ralph Corderoy
Subject: Re: mhfixmsg character set conversion
Date: Sat, 12 Feb 2022 11:51:21 +0000

Hi Steven,

> > I assume vim(1) will read up to a certain amount until it either
> > makes up its mind or assumes the default.
>
> That makes sense.

Yes, but I was wrong...

>         - Lines 85-110 are the text/plain portion, with
>
>              Content-Transfer-Encoding: 8bit
>              Content-Type: text/plain; charset="UTF-8"
>              Mime-Version: 1.0
>
>         - Lines 112-336 are the text/html portion, with
>
>              Content-Transfer-Encoding: 8bit
>              Content-Type: text/html; charset=iso-8859-1
>              Mime-Version: 1.0
>
> ...so it seems that tr is reporting exactly what we'd expect to see.

Agreed.

The file has UTF-8 and later ISO 8859-1.  vim(1)'s logic is to keep
trying to parse the bytes of the file as one encoding after another,
stopping at the first which is successful.  The list of encodings comes
from ‘:se fileencodings?’ which defaults to
‘ucs-bom,utf-8,default,latin1’ here.

There's no BOM so ucs-bom fails.  The ISO 8859-1 bytes don't happen to
be valid UTF-8, so utf-8 fails.  ‘default’ means use your environment,
which is probably UTF-8 again; fails.  Which means we arrive at
‘latin1’, AKA ISO 8859-1, which is happy because any byte is valid in
it.
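
A rough sketch of that first-success logic in Python, in case it helps
to see it spelt out.  This is not vim's code; the function name, the
BOM table, and treating ‘default’ as UTF-8 are all my assumptions for
illustration.

```python
import codecs

# BOMs the "ucs-bom" step looks for (an assumed, simplified table).
# UTF-32 comes before UTF-16 because FF FE 00 00 also starts with FF FE.
BOMS = (
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
)


def first_successful_encoding(
    data,
    candidates=("ucs-bom", "utf-8", "default", "latin1"),
    default="utf-8",  # assumption: the environment's encoding is UTF-8
):
    """Return the first candidate that decodes all of data, else None."""
    for name in candidates:
        if name == "ucs-bom":
            # Only succeeds if the data starts with a known BOM.
            for bom, codec in BOMS:
                if data.startswith(bom):
                    return codec
            continue
        codec = default if name == "default" else name
        try:
            data.decode(codec)  # strict: any invalid byte raises
        except UnicodeDecodeError:
            continue
        return codec
    return None


# A UTF-8 part followed by an ISO 8859-1 part, like the message's two parts.
mixed = "caf\u00e9\n".encode("utf-8") + "caf\u00e9\n".encode("latin-1")
print(first_successful_encoding(mixed))  # latin1
```

The lone 0xE9 byte in the second part is not a valid UTF-8 sequence,
so both ‘utf-8’ and ‘default’ are rejected for the whole input and
only ‘latin1’ is left.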

> ...but in bash, although the line gets pasted, the newline at the end
> of it somehow doesn't.

Another difference is the pasted text is normally highlighted in some
way, e.g. inverse video, until it's committed with Enter.

-- 
Cheers, Ralph.
