Re: [Nmh-workers] More robust header parsing...? Yahoo groups problems.

From: Ralph Corderoy
Subject: Re: [Nmh-workers] More robust header parsing...? Yahoo groups problems. Header dump and mod utilities... (Resent with attachment.)
Date: Sat, 22 Jun 2013 12:00:32 +0100

Hi Doug,

> I found a utility called "grabyahoogroup" on SourceForge and sucked
> all the messages from a group into a folder in my nmh directory. (The
> regular expressions needed a bit of tweaking, but I got it on the
> third try and messages started showing up.) So far, so good.

Unless the tweaks still aren't quite right and are stripping the leading
whitespace from the continued headers.  ;-)

> However, Yahoo seems to strip the whitespace from the front of header
> continuation lines, and nmh doesn't handle that properly.

nmh can't handle that;  it's invalid input, and in general there's no
way to tell what was meant.  The original

    X-Mailer: Foo, Version: 3.14

could be corrupted to

    X-Mailer: Foo,
    Version: 3.14

and that's perfectly valid input of two headers.
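
To illustrate the ambiguity, here is a minimal sketch using Python's standard `email` package rather than nmh itself. It only shows how a conforming parser sees the two forms; the variable and function names are made up for the example.

```python
from email import message_from_string

# Correctly folded: the continuation line starts with whitespace.
folded = "X-Mailer: Foo,\n Version: 3.14\n\n"

# Same text after the leading whitespace has been stripped.
stripped = "X-Mailer: Foo,\nVersion: 3.14\n\n"

def header_names(raw):
    """Return the field names a standard parser finds in raw."""
    return message_from_string(raw).keys()

print(header_names(folded))    # one field
print(header_names(stripped))  # two fields; nothing marks this as damage
```

The parser has no basis for rejecting the second form, so any repair has to rely on knowledge from outside the message.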

> # Header keywords start with a capital letter and end with a colon
> headerfield = re.compile('^[A-Z][A-Za-z_-]*?:')

Just to note, header field names don't have to start with a capital
letter.
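
As a rough sketch of a looser pattern: RFC 5322 allows a field name to be any run of printable US-ASCII characters other than the colon. The name `field_name` below is made up for the example, and `quoted` is the pattern from the quoted script.

```python
import re

# The pattern from the quoted script.
quoted = re.compile('^[A-Z][A-Za-z_-]*?:')

# Printable US-ASCII (33-126) except ':' (58), then the colon.
field_name = re.compile(r'^[\x21-\x39\x3b-\x7e]+:')

for line in ("subject: hello", "X-400-Received: by foo", " folded text"):
    print(repr(line), bool(quoted.match(line)), bool(field_name.match(line)))
```

The looser pattern accepts lower-case names and names containing digits. It also matches a stripped continuation line that happens to begin with something like `Version:` or `http:`, which is the ambiguity described earlier.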

I think this is a fairly unusual problem that needs an ad hoc solution
each time to cope with the peculiarities of each case.
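
One such ad hoc repair might look like the sketch below. It is a heuristic under stated assumptions, not something nmh does: the input is assumed to be the header lines only, without the blank separator line, and `refold` and `FIELD_START` are hypothetical names.

```python
import re

# Printable US-ASCII except ':' followed by a colon (RFC 5322 field name).
FIELD_START = re.compile(r'^[\x21-\x39\x3b-\x7e]+:')

def refold(header_lines):
    """Re-indent lines that cannot start a new field (heuristic only)."""
    fixed = []
    for line in header_lines:
        looks_like_field = FIELD_START.match(line) is not None
        if fixed and line and not line[0].isspace() and not looks_like_field:
            # Cannot be a new field, so assume a stripped continuation.
            line = ' ' + line
        fixed.append(line)
    return fixed

print(refold(["Received: from a", "by b; Sat, 22 Jun 2013", "Subject: x"]))
print(refold(["X-Mailer: Foo,", "Version: 3.14"]))  # left as two fields
```

The second call shows the limit of the approach: a stripped continuation that itself looks like a field cannot be detected this way.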

Cheers, Ralph.

