Re: [Nmh-workers] Troublesome messages
From: Ralph Corderoy
Subject: Re: [Nmh-workers] Troublesome messages
Date: Sun, 15 Oct 2017 01:05:56 +0100
Hi Jon,
> > > Don't know if there's anything that can be done about this given
> > > the nature of unicode and all, but I've been getting a lot of spam
> > > recently that looks like this:
> > >
> > > 代开发票。点数优惠。可验证后付款。13651402207叶先生 微信一致
>
> Not saying that it's not unicode, just that it makes a mess of my
> window. Using mate-terminal on linux, utf-8 local.
...
> Screws up the display after message 5.
I poked about that email a bit on this UTF-8 xfce4-terminal.
$ scan -width 0 -forma '%{from}\n%{subject}\n%{body}' .
=?GB2312?B?wdbPyMn6?= <address@hidden>
=?GB2312?B?tPq/qreixrE=?=
??Ʊ???????Żݡ?????֤?13651402207Ҷ???? ???һ??
$
The `%{body}' output is nmh trying to take the GB2312 body as UTF-8,
struggling with many of the bytes, producing a `?' for them instead, but
some GB2312 bytes do happen to form a valid UTF-8 sequence so the odd
`Ʊ' gets invented.
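A rough Python sketch of that mis-decoding, not nmh's actual code: the bytes are the ones inside the Subject's encoded-word above, and Python's errors="replace" stands in for scan's `?' by substituting U+FFFD.

```python
# The eight GB2312 bytes carried by the Subject's encoded-word.
raw = bytes.fromhex("b4fabfaab7a2c6b1")

# Decoded with the right charset.
right = raw.decode("gb2312")
print(right)

# Treated as UTF-8 instead.  Most bytes are rejected and replaced,
# but the last pair, C6 B1, is a well-formed two-byte UTF-8 sequence
# for U+01B1, which is where the stray character comes from.
wrong = raw.decode("utf-8", errors="replace")
print(wrong)
```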
$ scan -width 0 -forma '%(decode{from})\n%(decode{subject})' .
林先生 <address@hidden>
代开发票
$
`%(decode)' works.
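For comparison, the same RFC 2047 decoding can be sketched with Python's standard email.header module. This only illustrates what `%(decode)' has to do; it says nothing about how nmh implements it, and decode_words() is a name made up for the example.

```python
from email.header import decode_header, make_header


def decode_words(value):
    """Decode any RFC 2047 encoded-words in a header value (hypothetical helper)."""
    return str(make_header(decode_header(value)))


# The two header values shown by the first scan above.
subject = decode_words("=?GB2312?B?tPq/qreixrE=?=")
sender = decode_words("=?GB2312?B?wdbPyMn6?= <address@hidden>")
print(sender)
print(subject)
```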
$ mhstore -outfile -
������Ʊ�������Żݡ�����֤�13651402207Ҷ���� ��һ��
storing message 5 to stdout
$
This time, nmh gets out of the way and just flings the bytes at the TTY.
xfce4-terminal spots that they're not valid UTF-8 and shows U+FFFD `�'
instead; `Ʊ' is still there.
$ mhstore -outfile - | iconv -f gb2312
storing message 5 to stdout
代开发票。点数优惠。可验证后付款。13651402207叶先生 微信一致
$
It's valid GB2312 according to iconv(1), which has converted it to UTF-8.
uniq(1) says that's identical to the line you give above.
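The same validity check can be approximated in Python with a strict decode, which raises on any byte sequence the codec rejects. This is a sketch: is_valid_gb2312() is a hypothetical helper, and Python's gb2312 codec is assumed, not guaranteed, to agree with iconv's on every edge case.

```python
def is_valid_gb2312(data):
    """Return True if data decodes cleanly as GB2312 (hypothetical helper)."""
    try:
        data.decode("gb2312")  # strict error handling is the default
    except UnicodeDecodeError:
        return False
    return True


# The Subject bytes from earlier pass; 0xFF is never a valid lead byte.
print(is_valid_gb2312(bytes.fromhex("b4fabfaab7a2c6b1")))
print(is_valid_gb2312(b"\xff\xff"))
```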
$ mhshow | sed '$! d'
代开发票。点数优惠。可验证后付款。13651402207叶先生 微信一致
$
And that's the same line again, so nmh can do it too.
I think historically there have been various problems with sbr/fmt_scan.c,
e.g. its cpstripped(), and those could have included putting out partial
UTF-8; I don't recall. You could capture the bytes from the scan that
messes up and send them here. I've been using
$ scan -version
scan -- nmh-1.7-RC3 1.7-RC3-4-g3dfc049a built 2017-09-26 14:24:31 +0000 on orac
Also, try xterm instead. I find it handy when another terminal's
quality is in doubt.
--
Cheers, Ralph.
https://plus.google.com/+RalphCorderoy