Re: [Nmh-workers] nmh 1.6: character set checks and exmh compatibility


From: Tom Lane
Subject: Re: [Nmh-workers] nmh 1.6: character set checks and exmh compatibility
Date: Mon, 17 Oct 2016 14:35:57 -0400

Ken Hornstein <address@hidden> writes:
>> Personally I'd love it if send did something like:
>> (1) if text is entirely 7-bit: specify charset=us-ascii
>> (2) if environment specifies a non-ascii character set, use that
>> (3) assume charset=utf-8 (maybe allow this to be overridden in profile)
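
The three-step policy quoted above can be sketched roughly as follows. This is an illustration only, not nmh's code: choose_charset, its parameters, and the ASCII_NAMES set are hypothetical names, and the locale's charset is passed in as a plain string rather than read from the environment.

```python
from typing import Optional

# Names a locale may report for plain ASCII (assumed list, not exhaustive).
ASCII_NAMES = {"us-ascii", "ascii", "ansi_x3.4-1968", "646"}

def choose_charset(body: bytes,
                   locale_charset: Optional[str],
                   fallback: str = "utf-8") -> str:
    """Pick a charset label for an outgoing text part (hypothetical helper)."""
    # (1) Entirely 7-bit text can always be labelled us-ascii.
    if all(b < 0x80 for b in body):
        return "us-ascii"
    # (2) A non-ASCII charset named by the environment wins.
    if locale_charset and locale_charset.lower() not in ASCII_NAMES:
        return locale_charset
    # (3) Otherwise fall back to a default, overridable by the caller.
    return fallback

print(choose_charset(b"plain text", "ISO-8859-1"))
print(choose_charset(b"caf\xe9", "ISO-8859-1"))
print(choose_charset(b"caf\xc3\xa9", "ANSI_X3.4-1968"))
```

Step (3) is the contested part: the fallback is a guess about the bytes, not something the sender can verify.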

> We already do (1) and (2).

OK.

> (3) is the problem.  Other people who have
> thoughts on this topic are free to weigh in.  Personally, I believe that
> if you're doing LANG=C, you shouldn't be dealing with any 8-bit characters
> at all.  Isn't that what that means?

Well, whether you intentionally type any and whether some happen to creep
into your email are two different things.  As an example: I am suspicious
now that my problem really stemmed from exmh choosing to use both -push
and -forward; the latter is documented as "If -forward is given, then a
copy of the draft will be attached to this failure notice."  So I am
thinking that it stuck the UTF-8-containing text onto the failure notice,
and then that send attempt failed for exactly the same reason, i.e. it was
rejected by the character set strictness check.  Even if you're right that
there was no send attempt at all, I'm expecting that once it's there
it will fail like this :-(

So basically the problem here is one of robustness.  Yeah, it would be
nice to be sure that what you are sending is 100% valid.  But I don't
really agree with the tradeoff that's been made of failing when you
can't be sure of that.  Especially since, if you think you know what
non-ASCII encoding a bit of text is in, you're just fooling yourself
anyway.  It's impossible to distinguish the ISO 8859 variants from
each other, and at best heuristic to tell whether text is in UTF-8
or an ISO 8859 variant.
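
To make the point concrete, here is a small sketch of such a heuristic. It is an assumption-laden illustration, not anything nmh does: guess_charset is a hypothetical name, and "unknown-8bit" is used here only as a placeholder label for bytes that cannot be identified.

```python
def guess_charset(data: bytes) -> str:
    """Heuristic guess at a charset label (hypothetical helper)."""
    # Pure 7-bit data is safely us-ascii.
    if all(b < 0x80 for b in data):
        return "us-ascii"
    try:
        # Bytes that decode as UTF-8 are probably UTF-8, but every byte
        # sequence is also valid ISO 8859-1, so this is only evidence.
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        # Some 8-bit charset; nothing in the bytes says which one.
        return "unknown-8bit"

sample = "café".encode("utf-8")
print(guess_charset(sample))             # passes the UTF-8 validity check
print(repr(sample.decode("iso-8859-1"))) # the same bytes also decode as Latin-1

# The ISO 8859 variants accept the same bytes and differ only in meaning:
# 0xA4 is the currency sign in 8859-1 and the euro sign in 8859-15.
print(b"\xa4".decode("iso-8859-1") == b"\xa4".decode("iso-8859-15"))
```

Both decodes of the single byte succeed without error, so no validity check can pick between the two variants; only outside knowledge of the text can.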

Maybe we could just leave off the character set spec if it turns out to
be definitely wrong?

                        regards, tom lane


