[Nmh-workers] Only outputting "valid" characters

From: Ken Hornstein
Subject: [Nmh-workers] Only outputting "valid" characters
Date: Wed, 09 Jul 2014 11:46:33 -0400

We've got a long-standing bug report here:


It's hard to solve this easily, since we are now actually handling 8-bit
characters, but other things have recently occurred that made me revisit it.

Mikhail brought up a badly-formatted message here:


which was the fault of someone else, but it made me think of a number of
issues.  First off, it's really impossible for us to deal with this perfectly;
trying to guess that something is UTF-8 and handle it appropriately would
just lead to madness.

It seems like there are two core issues:

- Handle the case of invalid character set conversion properly (basically,
  substitute a character when you come across something that cannot be
  converted).  I assume this is non-controversial.  It's also pretty
  straightforward to implement.

- Don't output characters that are not valid in the target character set.
  Now, some people suggest that we assume that 8-bit characters should be
  in a particular configurable character set.  I'm not a fan of that solution,
  as a) it's inevitably going to be wrong some of the time, and b) because
  of a) you still need to deal with not outputting invalid characters.
  Fixing this would also solve the problem mentioned in the first bug report.

  The problem here is that I'm not sure _how_ to solve this problem.
  I am unsure if there is a standards-based API that lets us detect invalid
  characters (I'm not interested in something like Recode for this).  I
  wonder if mbtowc() and friends would throw errors if they encounter an
  invalid character in the current locale.  More investigation is needed.
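
As a starting point for that investigation, here is a minimal sketch of the
second item using mbrtowc(), the restartable sibling of mbtowc().  The C
standard says it returns (size_t)-1 and sets errno to EILSEQ when it meets an
invalid sequence, and (size_t)-2 when the input ends partway through a
character.  The function name sanitize_for_locale() and the one-substitute-
per-bad-byte policy are my own assumptions for illustration, not existing nmh
code.  Which bytes count as invalid depends entirely on the locale: a UTF-8
locale rejects malformed sequences, while a single-byte locale such as
ISO-8859-1 may accept nearly every byte.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/*
 * Hypothetical helper (not part of nmh).  Copies len bytes from src to dst,
 * replacing every byte that does not form part of a valid multibyte
 * character in the current LC_CTYPE locale with subst.  dst must have room
 * for len + 1 bytes; the result is NUL-terminated.  Returns the number of
 * bytes that were replaced.
 */
size_t
sanitize_for_locale(const char *src, size_t len, char *dst, char subst)
{
    mbstate_t state;
    size_t in = 0, out = 0, replaced = 0;

    assert(src != NULL && dst != NULL);
    memset(&state, 0, sizeof state);

    while (in < len) {
        wchar_t wc;
        size_t n = mbrtowc(&wc, src + in, len - in, &state);

        if (n == (size_t) -1 || n == (size_t) -2) {
            /* Invalid or truncated sequence: substitute a single byte,
               then reset the conversion state and resynchronize. */
            dst[out++] = subst;
            in++;
            replaced++;
            memset(&state, 0, sizeof state);
        } else {
            /* mbrtowc() returns 0 for an embedded NUL; assume it occupies
               one byte, which holds for the common encodings. */
            if (n == 0)
                n = 1;
            memcpy(dst + out, src + in, n);
            out += n;
            in += n;
        }
    }

    dst[out] = '\0';
    return replaced;
}
```

The caller has to run setlocale(LC_CTYPE, "") beforehand; without it the
program stays in the "C" locale and the check reflects that locale rather
than the user's.  Note that this only catches byte sequences that are not
characters at all.  It does not filter valid but unprintable characters,
which would need an additional test such as iswprint().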


