Re: [Nmh-workers] some indexing results

From: Paul Vixie
Subject: Re: [Nmh-workers] some indexing results
Date: Mon, 07 Feb 2011 17:13:21 +0000

> From: address@hidden
> Date: Mon, 07 Feb 2011 09:01:15 -0500
> On Mon, 07 Feb 2011 08:54:13 GMT, Peter Maydell said:
> > Dunno if 100 chars will be enough if and when we finally add
> > enough MIME support for scan to do something sensible with
> > MIME-encoded bodies (ie print the start of the text/plain bit).
> Keep in mind that there isn't any requirement that the
> first bodypart be a text/plain.  It's often a text/html and you
> need to go scanning another 2-3K into the <censored> thing
> before you find stuff that's not markup.  Exchange in particular
> seems enamored of sending 10-12K of inline CSS for a 4-5 word
> message.

my theory on this is, MH's mime awareness is better than it was but
still nowhere near good enough.  for example:

1. mhshow should not exist, we should merge its functionality into show.
2. mhstore should not exist, we should merge its functionality into burst.
3. text/plain with long lines should be word wrapped (via "fmt" or similar).
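
as a rough illustration of item 3, here is a minimal sketch in python (not nmh code, and not a proposal for how show would actually do it). it re-wraps only the lines that exceed a width and leaves short lines alone, which is simpler than what "fmt" does since it never joins adjacent lines into paragraphs. the function name wrap_plain and the default width of 72 are assumptions made for the example.

```python
import textwrap


def wrap_plain(body: str, width: int = 72) -> str:
    """Re-wrap only the lines longer than `width`; shorter lines pass through.

    Hypothetical helper for illustration. Unlike fmt(1), it does not merge
    neighbouring lines into paragraphs, so quoted text and signatures keep
    their original line breaks.
    """
    out = []
    for line in body.splitlines():
        if len(line) <= width:
            out.append(line)
        else:
            # textwrap.wrap returns [] for whitespace-only input, so keep
            # an empty line in that case rather than dropping it.
            out.extend(textwrap.wrap(line, width=width) or [""])
    return "\n".join(out)
```

calling wrap_plain(body) on a decoded text/plain part would give something that fits an 80 column terminal; a real implementation would probably also want to respect format=flowed, which this sketch ignores.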

i'm not currently scheduling time to work on those things, but if i do end
up integrating the index stuff such that i have to rototill the internal
interfaces used by scan and pick to be more opaque and less "FILE *" based,
i'll make every effort to make it possible to parse the mime to find the
"first few words" needed by scan, and once i have that logic, i'll use it
when building the index so that no mime decoding will have to be done on
the output from the database.  not because i worry about the processing time
but because storing an extra couple hundred or thousand bytes of boilerplate
mime headers per message would really hurt the disk cache locality and blow
out the size of the sleepycat *.db files.
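
to make the "first few words" idea concrete, here is a hedged sketch in python using the standard email package. it is not how scan or the index code works today; it only illustrates the logic described above: prefer the first text/plain part, fall back to the first text/html part with markup and inline css stripped, and keep a short prefix suitable for storing in an index. the names first_words and _TextOnly and the 100 character limit are assumptions made for the example.

```python
import email
from email import policy
from html.parser import HTMLParser


class _TextOnly(HTMLParser):
    """Collect character data, skipping the contents of <style> and <script>."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("style", "script"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("style", "script") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.chunks.append(data)


def first_words(raw: bytes, limit: int = 100) -> str:
    """Return up to `limit` characters of readable body text (hypothetical helper)."""
    msg = email.message_from_bytes(raw, policy=policy.default)
    plain = html = None
    for part in msg.walk():
        if part.is_multipart():
            continue
        ctype = part.get_content_type()
        if ctype == "text/plain" and plain is None:
            plain = part.get_content()
        elif ctype == "text/html" and html is None:
            html = part.get_content()
    if plain is None and html is not None:
        # No text/plain part anywhere: strip the markup from the html part.
        parser = _TextOnly()
        parser.feed(html)
        parser.close()
        plain = "".join(parser.chunks)
    # Collapse runs of whitespace so the prefix is not wasted on blank space.
    return " ".join((plain or "").split())[:limit]
```

computing this once at index time and storing only the returned string is what keeps the per-message record small. the sketch does not distinguish inline text parts from text attachments and does not handle unknown charsets, both of which a real implementation would need to deal with.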