Re: mhfixmsg character set conversion
From: Ralph Corderoy
Subject: Re: mhfixmsg character set conversion
Date: Thu, 10 Feb 2022 11:19:35 +0000
Hi Steven,
> > I expect the bad file has something earlier on which fixes vim's
> > idea of the encoding to ISO 8859-1
>
> That does seem to be the case. Do you have any idea what kind of
> thing that might be? (I know you can't diagnose a file you haven't
> seen, but in general, what sorts of things should I look for?)
Non-ASCII bytes from the start of the file. I assume vim(1) will read
up to a certain amount until it either makes up its mind or assumes the
default.
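A related check you can do outside vim, as a sketch: if vim is trying UTF-8 first and falling back to latin1 when decoding fails (a common 'fileencodings' setting; check with :set fileencodings? in vim), then whether the whole file is valid UTF-8 matters. iconv(1) converting UTF-8 to UTF-8 only fails on invalid input. The sample files and the check_utf8 function name are made up for illustration.

```sh
#!/bin/sh
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

# Hypothetical samples: é as UTF-8 (bytes c3 a9) and as ISO 8859-1
# (a lone e9 byte, which is not valid UTF-8).
printf 'caf\303\251\n' >"$tmp/utf8.txt"
printf 'caf\351\n' >"$tmp/latin1.txt"

# UTF-8 to UTF-8 changes nothing on valid input, so iconv only exits
# non-zero when it meets an invalid byte sequence.
check_utf8() {
    if iconv -f UTF-8 -t UTF-8 <"$1" >/dev/null 2>&1; then
        echo "$1: valid UTF-8"
    else
        echo "$1: not valid UTF-8"
    fi
}

check_utf8 "$tmp/utf8.txt"
check_utf8 "$tmp/latin1.txt"
```

On the real file, running iconv without the redirections lets GNU iconv report the position of the first bad sequence.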
Try this to remove the boring printable-ASCII bytes, space through tilde, and see what's left.
tr -d ' -~' <bad | env LC_ALL=C grep -n .
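To show what that pipeline leaves behind, here it is run over a made-up three-line sample, with od(1) added on the end to print the surviving bytes in hex. tr(1) only deletes space through tilde, so tabs and carriage returns would survive too and show up as 09 and 0d.

```sh
#!/bin/sh
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

# Hypothetical sample: line 1 is plain ASCII, line 2 holds a UTF-8 é
# (c3 a9) and line 3 an ISO 8859-1 é (a single e9 byte).
printf 'plain ASCII\ncaf\303\251 au lait\ncaf\351 noir\n' >"$tmp/bad"

# Delete printable ASCII (0x20 to 0x7e). Newlines survive, so grep -n
# can still say which lines had anything else; LC_ALL=C makes every
# byte a valid character for grep's '.'.
tr -d ' -~' <"$tmp/bad" | env LC_ALL=C grep -n . | od -An -tx1
```

A c3 a9 pair is UTF-8; a lone e9 isn't valid UTF-8 and looks like ISO 8859-1's é, the sort of byte that could tip vim's guess.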
> > > $ grep -n ^Veuillez good | cut -c1-68
> > > 108:Veuillez ne pas répondre au présent courriel. Il a été gén�
...
> > (The ‘�’ at the end is to be expected.)
...
> Until now, I've only ever seen that glyph when a character doesn't
> exist in the font being used
No, it's not related to a Unicode code point missing from the font, or
only historically.
https://en.wikipedia.org/wiki/Specials_(Unicode_block)#Replacement_character
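describes ‘�’, U+FFFD REPLACEMENT CHARACTER. It's being seen above
because cut(1) is cutting bytes, not characters: the ‘108:’ that grep -n
added at the start of the line has shifted the 68/69 cut-off point to
part-way through the UTF-8 for a single code point, AKA rune, here the
second ‘é’ of ‘généré’ (bytes c3 a9). The terminal gets a lone c3,
can't decode it, and shows ‘�’ instead.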
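The byte arithmetic can be checked directly. This sketch rebuilds the start of the quoted line, up to ‘généré’ since that's as far as the snippet shows, and uses cut -b to count bytes explicitly. GNU cut(1) treats -c the same as -b; other implementations may count characters in a UTF-8 locale. It assumes the script itself is saved as UTF-8.

```sh
#!/bin/sh
# Start of the quoted line, with grep -n's "108:" prefix. Each é must be
# stored as the two bytes c3 a9, i.e. this file is UTF-8.
line='108:Veuillez ne pas répondre au présent courriel. Il a été généré'

# Bytes 68 and 69 are the two halves of the é after "gén", so a cut at
# byte 68 keeps only the lead byte c3.
printf '%s\n' "$line" | cut -b68-69 | od -An -tx1

# Without the 4-byte prefix the same 68-byte cut ends cleanly on "généré".
printf '%s\n' "${line#108:}" | cut -b1-68
```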
> $ setenv LC_ALL C
> $ perl -lpe 's/[^ -~]/sprintf "<%02x>", ord($&)/ge' good_snippet
> Veuillez ne pas r<c3><a9>pondre au pr<c3><a9>sent courriel. Il a
> <c3><a9>t<c3><a9> g<c3><a9>n<c3><a9>r<c3><a9>
Good.
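If perl(1) isn't handy, POSIX sed's l command gives a similar byte-level view: each non-printable byte is written as a three-digit octal escape and each line ends with $. As with the perl, this relies on LC_ALL=C being in the environment so é's two bytes count as non-printable. The one-line sample is made up.

```sh
#!/bin/sh
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

# Hypothetical sample containing a UTF-8 é (octal 303 251).
printf 'r\303\251pondre\n' >"$tmp/snippet"

# In the C locale every byte above 0x7e is non-printable, so l shows é
# as \303\251, the octal spelling of perl's <c3><a9>.
env LC_ALL=C sed -n l "$tmp/snippet"
```

Long lines get folded with a trailing backslash, so the output isn't a one-for-one copy of each input line.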
> As expected, this returned pretty much instantly. Then I tried this:
>
> $ sh
> $ LC_ALL=C
> $ echo $LC_ALL
> C
> $ perl -lpe 's/[^ -~]/sprintf "<%02x>", ord($&)/ge' good_snippet
That's setting a shell variable LC_ALL, which isn't passed on to commands
like perl(1) unless LC_ALL was already in the environment, and it
probably wasn't. Try
sh
LC_ALL=C; export LC_ALL
locale
perl -lpe 's/[^ -~]/sprintf "<%02x>", ord($&)/ge' good_snippet
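The difference is easy to see by asking a child shell what it inherited. A minimal sketch; the unset at the top makes sure LC_ALL isn't already exported.

```sh
#!/bin/sh
unset LC_ALL

# Plain assignment: a shell variable only, not handed to child processes.
LC_ALL=C
sh -c 'echo "child sees LC_ALL: ${LC_ALL-unset}"'

# Once exported it's part of the environment every child gets, perl included.
export LC_ALL
sh -c 'echo "child sees LC_ALL: ${LC_ALL-unset}"'
```

The first child reports unset, the second C, which is what running locale after the export also confirms.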
> Which in a way is good, because at least it means bash is behaving
> consistently.
Beware that invoking bash(1) as ‘sh’ is not the same as running ‘bash’;
as sh it mimics the historical sh and switches to POSIX mode. It might
not make a difference in this case, but in general it's better to run
whichever is desired.
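One visible difference is POSIX mode, which bash enters when started under the name sh. A sketch, assuming bash is installed: it calls bash through a symlink named sh in a scratch directory so it doesn't depend on what /bin/sh happens to be. The report variable is just a name for this example.

```sh
#!/bin/sh
# bash would start in POSIX mode regardless if this were set.
unset POSIXLY_CORRECT

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
ln -s "$(command -v bash)" "$tmp/sh"

# Same binary, two names. shopt -oq posix succeeds only in POSIX mode.
report='if shopt -oq posix; then echo posix; else echo default; fi'
bash -c "$report"
"$tmp/sh" -c "$report"
```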
> I propose to forget this particular clupea harengus of the crimson
> variety unless you find it interesting in and of itself.
It is odd. And odd might affect other things, including things to do with nmh.
:-)
--
Cheers, Ralph.