nmh-workers


Re: [Nmh-workers] nmh internals: full MIME integration


From: Ralph Corderoy
Subject: Re: [Nmh-workers] nmh internals: full MIME integration
Date: Sun, 27 Jul 2014 11:08:01 +0100

Hi Ken,

> > > Okay, I guess I could see that.  The normal case would be to
> > > decode the contents completely
> >
> > Yep, to UTF-8 single lines?
> 
> Well, to whatever the local character set is.

Ah, OK, my natural inclination is UTF-8 everywhere and convert on I/O,
but we've obviously got a backlog of code to consider.  If the new
header handler is "to local character set", e.g. US-ASCII, then how does
replying to an email with a =?utf-8? subject work?  Does it suffer
lossage as it's ASCII'd before an inferior version reaches the `Subject:
Re:' producer?
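
To make the worry concrete, here is a minimal Python sketch, not nmh code.
It decodes an RFC 2047 encoded-word and then forces the result into a
target character set.  The function name decode_to_charset is made up for
illustration, and replacing unrepresentable characters with "?" is an
assumption about what "to local character set" would do.

```python
from email.header import decode_header


def decode_to_charset(raw: str, charset: str) -> str:
    """Decode RFC 2047 encoded-words in `raw`, then squeeze into `charset`.

    Hypothetical helper: characters the target charset cannot represent
    are replaced with '?', which is where the loss happens.
    """
    parts = []
    for text, enc in decode_header(raw):
        # decode_header yields bytes for encoded-words, str for plain input.
        if isinstance(text, bytes):
            text = text.decode(enc or "ascii", errors="replace")
        parts.append(text)
    unicode_value = "".join(parts)
    return unicode_value.encode(charset, errors="replace").decode(charset)
```

With a US-ASCII target the accented character is already gone before
anything downstream can re-encode it for the reply; with a UTF-8 target
it survives.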

> > Well, you might be thinking the 2047-decoding might not make a lot
> > of difference, whereas I'm thinking a block can be read into a
> > page-aligned buffer that has an \n beyond it as a sentinel, then
> > check for /foo[ \t]*:/i, ignore any non-foo headers, hunt for the
> > next \n and repeat if it's not the sentinel, else read another block
> > and try again.  Stop if no more blocks or \n\n.  The detail's a bit
> > more complex but there's no allocation and copying for headers seen
> > along the way;  they'll be found when they're looked for in turn.
> > The file's blocks aren't being modified so no copy-on-write's
> > occurring.
> 
> Sigh.  I wasn't actually thinking of special-casing pick.

Neither was I.  :-)  Most programs that want headers don't want all
headers?  Some want relatively few out of the many that are stuffed in
there nowadays.  It's a bit hard to think of ones that do want them all
with the normal components file?
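
The scan quoted above (match /foo[ \t]*:/i, skip headers nobody asked
for, stop at the blank line) could be sketched in Python roughly as
below.  This is only an illustration of the idea: find_header is a
hypothetical name, and it walks an in-memory bytes object rather than
page-aligned blocks with a sentinel, so the buffer management is left
out.

```python
import re
from typing import Optional


def find_header(message: bytes, name: str) -> Optional[bytes]:
    """Return the body of the first `name` header, or None if absent.

    Other headers are stepped over without being copied; the search
    stops at the empty line that ends the header section.
    """
    want = re.compile(re.escape(name.encode("ascii")) + rb"[ \t]*:",
                      re.IGNORECASE)
    pos = 0
    while pos < len(message):
        end = message.find(b"\n", pos)
        if end == -1:
            end = len(message)
        if end == pos:
            return None  # empty line: end of the header section
        m = want.match(message, pos, end)
        if m:
            # Extend over folded continuation lines (leading space or tab).
            stop = end
            while message[stop + 1:stop + 2] in (b" ", b"\t"):
                nxt = message.find(b"\n", stop + 1)
                stop = len(message) if nxt == -1 else nxt
            return message[m.end():stop].strip()
        pos = end + 1
    return None
```

Only the wanted header is ever sliced out; everything else is just a
search for the next newline.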

> (As an aside, I see that pick does use ^foo[ \t]*: to match on a header,
> but my reading of RFC 5322 is that spaces are not allowed between the
> header name and the colon ... but I guess the old syntax did?)

I know other code makes allowances for them, e.g.
http://golang.org/src/pkg/net/textproto/reader.go?s=11934:11987#L475
http://cpansearch.perl.org/src/MARKOV/Mail-Box-2.115/lib/Mail/Box/Parser/Perl.pm
OTOH some does not, e.g.
http://hg.python.org/cpython/file/bffa0b8a16e8/Lib/email/feedparser.py#l33
http://cpansearch.perl.org/src/RJBS/Email-Simple-2.203/lib/Email/Simple/Header.pm
Perl straddles the fence.  :-)

I've just had a pooter through folders here looking for them with

    LC_ALL=C awk '/^$/ {nextfile} /^[!-9;-~]+[^!-~]/ {print FILENAME ":" $0}' [1-9]*

and all it turned up were 69 "From " lines at the start of some of the
older emails I have.  (This surprised me, but is a bit off-topic.)
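
For anyone who would rather run the same check without awk, here is a
rough Python equivalent of that one-liner, applied to a single message
held as bytes.  The name suspect_header_lines is made up, and it is a
sketch of the same test rather than anything from nmh.

```python
import re
from typing import List

# A run of printable ASCII other than ':' at the start of the line,
# followed by a byte outside '!'..'~' (space, tab, control, or 8-bit).
SUSPECT = re.compile(rb"^[!-9;-~]+[^!-~]")


def suspect_header_lines(message: bytes) -> List[bytes]:
    """Return header lines whose field name is not followed directly by ':'."""
    hits = []
    for line in message.split(b"\n"):
        if line == b"":
            break  # blank line ends the header section, like awk's nextfile
        if SUSPECT.match(line):
            hits.append(line)
    return hits
```

Continuation lines start with whitespace, so they never match, and an
mbox-style "From " line does, which agrees with what the awk run found.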

So I vote to drop support for this kind of invalid header unless
anyone here has some that show they're common?

Cheers, Ralph.


