nmh-workers
Re: mhfixmsg character set conversion


From: Ralph Corderoy
Subject: Re: mhfixmsg character set conversion
Date: Thu, 10 Feb 2022 11:19:35 +0000

Hi Steven,

> > I expect the bad file has something earlier on which fixes vim's
> > idea of the encoding to ISO 8859-1
>
> That does seem to be the case.  Do you have any idea what kind of
> thing that might be?  (I know you can't diagnose a file you haven't
> seen, but in general, what sorts of things should I look for?)

Non-ASCII bytes from the start of the file.  I assume vim(1) will read
up to a certain amount until it either makes up its mind or assumes the
default.

Try this to remove the boring ASCII bytes and see what's left.

    tr -d ' -~' <bad | env LC_ALL=C grep -n .
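A minimal sketch of what that pipeline reports, run on a made-up sample instead of the real `bad` file. The file contents and the names `sample` and `hits` are assumptions for illustration only; the trailing cut(1) just keeps the line numbers that grep(1) prints.

```shell
# Hypothetical sample file: three lines, only the second has non-ASCII bytes
# (the "é" is written as its two UTF-8 bytes in octal, \303\251).
sample=$(mktemp)
printf 'plain ascii\nr\303\251pondre\nmore ascii\n' >"$sample"

# Delete every byte from space to tilde, then list the lines that still
# contain something.  LC_ALL=C on tr is an extra precaution added here.
hits=$(LC_ALL=C tr -d ' -~' <"$sample" | env LC_ALL=C grep -n . | cut -d: -f1)
echo "lines with non-ASCII bytes: $hits"

rm -f "$sample"
```

With a reasonably recent grep(1) this should report only line 2.  Tabs and other control characters are outside the ' -~' range, so they survive tr(1) and show up as well.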

> > >    $ grep -n ^Veuillez good | cut -c1-68
> > >    108:Veuillez ne pas répondre au présent courriel. Il a été gén�
...
> > (The ‘�’ at the end is to be expected.)
...
> Until now, I've only ever seen that glyph when a character doesn't
> exist in the font being used

No, it's not related to a Unicode code point being missing from the
font, or only historically.
https://en.wikipedia.org/wiki/Specials_(Unicode_block)#Replacement_character
describes ‘�’ and it's being seen above because cut(1) is cutting bytes
and the ‘108:’ at the start of the line has shifted the 68/69 cut-off
point to part-way through the UTF-8 for a single code point AKA rune.
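The effect can be reproduced on a short made-up string.  This is only a sketch: the string and the variable names are assumptions, and it uses `cut -b` under LC_ALL=C so that the cut is by bytes whatever cut(1) implementation is installed.

```shell
# "108:" is 4 bytes; each "é" is the two bytes c3 a9 (octal \303\251).
line=$(printf '108:\303\251t\303\251')

# Stop after byte 5: "108:" plus only the first byte of "é" survives,
# which a UTF-8 terminal shows as the replacement character.
cut_hex=$(printf '%s\n' "$line" | LC_ALL=C cut -b1-5 | od -An -tx1 | tr -d ' \n')
echo "$cut_hex"

# Stop after byte 6: the whole two-byte sequence survives.
whole_hex=$(printf '%s\n' "$line" | LC_ALL=C cut -b1-6 | od -An -tx1 | tr -d ' \n')
echo "$whole_hex"
```

The first dump ends in a lone c3 before the newline; the second has the complete c3 a9 pair.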

>    $ setenv LC_ALL C
>    $ perl -lpe 's/[^ -~]/sprintf "<%02x>", ord($&)/ge' good_snippet
>    Veuillez ne pas r<c3><a9>pondre au pr<c3><a9>sent courriel. Il a <c3><a9>t<c3><a9> g<c3><a9>n<c3><a9>r<c3><a9>

Good.

> As expected, this returned pretty much instantly.  Then I tried this:
>
>    $ sh
>    $ LC_ALL=C
>    $ echo $LC_ALL
>    C
>    $ perl -lpe 's/[^ -~]/sprintf "<%02x>", ord($&)/ge' good_snippet

That's setting a local shell variable LC_ALL, which child processes
such as perl(1) don't inherit, unless LC_ALL already exists in the
environment, and it probably doesn't.  Try

    sh
    LC_ALL=C; export LC_ALL
    locale
    perl -lpe 's/[^ -~]/sprintf "<%02x>", ord($&)/ge' good_snippet
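The difference between a plain shell variable and an exported one can be seen without touching the real locale.  A small sketch, using a hypothetical variable DEMO_VAR rather than LC_ALL:

```shell
# Start clean: DEMO_VAR is a made-up name, not in the environment.
unset DEMO_VAR

# Plain assignment: the variable exists in this shell only.
DEMO_VAR=C
child_unexported=$(sh -c 'echo "${DEMO_VAR-unset}"')
echo "before export, child sees: $child_unexported"

# After export the same value is passed on to child processes.
export DEMO_VAR
child_exported=$(sh -c 'echo "${DEMO_VAR-unset}"')
echo "after export, child sees: $child_exported"
```

The child shell stands in for perl(1) here: it only sees the value once the variable has been exported.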

> Which in a way is good, because at least it means bash is behaving
> consistently.

Beware that invoking bash(1) as ‘sh’ is not the same as running ‘bash’.
Might not make a difference in this case, but in general it's better to
run whichever is desired.

> I propose to forget this particular clupea harengus of the crimson
> variety unless you find it interesting in and of itself.

It is odd.  And odd might affect other things, including things to do
with nmh.  :-)

-- 
Cheers, Ralph.


