Re: [Nmh-workers] Thoughts: header/address parsing

From: norm
Subject: Re: [Nmh-workers] Thoughts: header/address parsing
Date: Sun, 03 Aug 2014 12:01:01 -0700

Ken Hornstein <address@hidden> writes:
>Again, more technical details here.
>Address parsing in nmh is kind of a mess.  We still support RFC 733 syntax
>"address at host", UUCP stuff, source routing ... a bunch of stuff.  This
>should be fixed.
>m_getfld() is the handler for generically parsing the headers of an email
>message.  Everyone agrees that it pretty much sucks and is overused.
>Thankfully the worst part of it (peeking inside of stdio internals) has
>been fixed; thanks, David!
>I've been thinking about biting the bullet and simply writing a header
>parser in flex/bison (I'm assuming flex/bison because those have
>features that make this a lot easier to implement; you don't need
>either to build from a distribution, because Automake keeps around
>the generated C files for the distribution tar file).  But practical
>concerns rear their ugly heads again; for one, error recovery is kind of
>complicated.  But it occurs to me that maybe I'm trying to bite off more
>than I can chew, and maybe I should try breaking this down a bit.  It
>occurs to me that there are really five distinct grammars that we should
>think about:
>- Parsing a sequence of message headers.  This is really what m_getfld()
>does now.  This grammar could be pretty simple.  We could use this to
>stuff headers inside of the "new" message API, discussed previously.
>The headers wouldn't be interpreted yet.
>- Parsing an address header.  This is by far the most complicated part
>of the parser, but I think just taking the RFC 5322 ABNF and translating
>it into a bison grammar shouldn't be too bad.
>- Parsing a date header.  We have a lex parser that does this now; it occurs
>to me that it should really be a bison grammar, but whatever.  Solvable.
>- Parsing a MIME header/param list.  Right now the parser for this is awful;
>and I say that as someone who had to add support for parsing out the
>RFC 2231 parameter extensions.  I'm not so crazy about blowing all of
>that work up, but you know what?  I think it would just be easier
>in the long run to deal with it if it was based on bison.
>- Parsing a mhbuild directive.  These are kind of like a MIME header, but not
>exactly.  The grammar for this is actually pretty weird and picky.  Right
>now it's overloaded on the MIME header parser, but it occurs to me that
>there's no reason that should be the case.
>The other headers ... well, I guess I don't see a reason why we need to parse
>them.  If the message-id header doesn't match the RFC 5322 syntax, should
>we care?  I say no.
>Modern flex/bison implementations can handle multiple parsers in one
>program, so that's not an issue.  This would also let us get rid of the
>horrible fixed buffer sizes we have now.
>Thoughts?  Completely open to ideas here.  I remember people saying that
>they had a list of messages that nmh dealt poorly with; it would be nice
>to try those out against a hypothetically-new nmh parser.

I'm wondering whether, in doing this, you might consider a new nmh command
that would parse message headers. I suppose there are dozens of scripts out
there that do some of this. I'm guessing that most of them are ad hoc and buggy.
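
To make the idea concrete, here is a rough sketch of the least such a command
might do: split a header block into name/value pairs, joining folded
continuation lines, and leave the values uninterpreted (the first of the five
grammars above). This is Python, not nmh code; the name split_headers is made
up for illustration, and collapsing each fold to a single space is a
simplification of RFC 5322 unfolding.

```python
def split_headers(text):
    """Split a raw header block into (name, value) pairs.

    Hypothetical helper, not part of nmh. Values are left uninterpreted;
    folded continuation lines are joined with a single space, which is a
    simplification of RFC 5322 unfolding.
    """
    fields = []
    for line in text.splitlines():
        if line == "":
            break  # a blank line ends the header block
        if line[0] in " \t":
            # Continuation line: append it to the previous field's value.
            if not fields:
                raise ValueError("continuation line before any header")
            name, value = fields[-1]
            fields[-1] = (name, value + " " + line.strip())
        else:
            # New field: everything before the first colon is the name.
            name, sep, value = line.partition(":")
            if not sep or not name.strip():
                raise ValueError("malformed header line: %r" % line)
            fields.append((name.strip(), value.strip()))
    return fields


if __name__ == "__main__":
    sample = (
        "From: norm\n"
        "Subject: Re: [Nmh-workers] Thoughts:\n"
        " header/address parsing\n"
        "\n"
        "body text\n"
    )
    for name, value in split_headers(sample):
        print(name, "=>", value)
```

A script could then pick out the fields it cares about and hand only those to
an address or date parser, instead of re-implementing the splitting each time.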

    Norman Shapiro
