[Nmh-workers] Pessimal Optimizations.
From: Lyndon Nerenberg
Subject: [Nmh-workers] Pessimal Optimizations.
Date: Tue, 11 Dec 2012 17:25:18 -0800
On 2012-12-11, at 4:54 PM, Ralph Corderoy wrote:
> mmap seems fine if the program is
> accessing files under its sole control, perhaps not such a good fit for
> nmh.
For now, perhaps not an issue. But there has been some minor grumbling about
concurrent access issues on the list. If we do lean towards encouraging
concurrent access to things, mmap() can cause a lot of grief, in that it
presents the appearance of a single view of the data that doesn't necessarily
exist.
Back in the Esys days, when I was maintaining our version of the Cyrus IMAP
server, I had no end of grief dealing with mmap() coherency issues in relation
to, e.g., the globally shared master mailboxes file. Things may have improved
since then, but I doubt that anyone can truly implement a read-write
multi-process-coherent mmap() on top of UNIX using the existing API. And if
you do switch to mmap(), you must do it everywhere. I wouldn't even *think* of
mixing mmap() access with FD-based I/O on the same object, even on the systems
that claim to have full coherency between FS buffers and mmap() page views.
(They don't. I know this all too well ...)
Ultimately, though, mmap() is just a micro-optimization in the context of a
*822/MIME parser. The effort that would be expended on mmap() I/O would be
better spent on writing a bullet-proof parser. No matter what, you will end up
copying data to and from user space on the way to its eventual display. The
solution here is to write a good one-pass MIME parser that can collect the
structure of the message as it's read into memory.  In many cases, once you
hit the body you really can just parse as you go. Most MUAs want to start with
the first MIME part – almost always a text/ part. Picking that off is easy,
and leaves you pointed at the remainder, should you need it. If the message is
a multipart, you can continue the body scan, parsing out the MIME structure
along the way. As you build the in-memory representation of the message, you
store the file offsets of the starting points of each part. You can also
opportunistically cache some of the body parts along the way. At worst, you
pay the price of reading the entire message once, plus the overhead of
re-reading specific MIME parts the MUA requests later.
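
A rough sketch of the offset-recording idea, for illustration only.  This is
not nmh code: scan_parts() is a hypothetical helper, and it assumes the
boundary string has already been pulled out of the Content-Type header, that
the multipart is not nested, and that no line exceeds the buffer.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: scan a single-level multipart body once and record
 * the file offset at which each part (its headers) begins.  Returns the
 * number of offsets stored, at most max.
 */
size_t scan_parts(FILE *fp, const char *boundary, long *offsets, size_t max)
{
    char line[1024];            /* assumption: lines fit in this buffer */
    size_t blen = strlen(boundary);
    size_t n = 0;

    while (fgets(line, sizeof line, fp) != NULL) {
        /* A delimiter line is "--" followed by the boundary string. */
        if (line[0] != '-' || line[1] != '-' ||
            strncmp(line + 2, boundary, blen) != 0)
            continue;

        const char *rest = line + 2 + blen;

        if (rest[0] == '-' && rest[1] == '-')
            break;              /* closing delimiter: no more parts */

        /* Skip lines where the boundary is only a prefix of something longer. */
        if (rest[0] != '\n' && rest[0] != '\r' && rest[0] != '\0' &&
            rest[0] != ' ' && rest[0] != '\t')
            continue;

        if (n < max) {
            long pos = ftell(fp);   /* offset just past the delimiter line */
            if (pos < 0)
                break;
            offsets[n++] = pos;
        }
    }
    return n;
}
```

A caller that later wants, say, the second part can fseek() to offsets[1] and
read from there, instead of rescanning the message from the top.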
In the case of mmap(), you do the same thing, just pretending the entire
message is already in a memory buffer. But it isn't. You still pay the
penalty of reading the data from disk into the FS cache, and then exposing that
to the user process. These days it's debatable whether there is more overhead
just copying from the kernel into user space vs. all the mucking about with
page tables and the like that mmap() requires. TLB flushes are *expensive*,
even more so when you're running under a hypervisor. A straight copy from a
kernel buffer to user memory can often take place within the CPU's L1/L2 caches.
So before anyone claims mmap() is faster (not that they were), you really need
to sit down and benchmark how your particular CPU and OS perform.
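
For anyone who does want to measure, here is a minimal sketch of such a
benchmark, assuming a POSIX system.  The function names (sum_with_read,
sum_with_mmap, seconds_for) are made up for this example.  Both readers touch
every byte so the mapped pages really get faulted in.  A first run against a
cold cache will mostly measure the disk, so repeat each measurement several
times and compare like with like.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/* Sum every byte of the file using plain read() into a user buffer. */
long sum_with_read(const char *path)
{
    unsigned char buf[65536];
    long sum = 0;
    ssize_t n;
    int fd = open(path, O_RDONLY);

    if (fd < 0)
        return -1;
    while ((n = read(fd, buf, sizeof buf)) > 0)
        for (ssize_t i = 0; i < n; i++)
            sum += buf[i];
    close(fd);
    return n < 0 ? -1 : sum;
}

/* Sum every byte of the file through a read-only private mapping. */
long sum_with_mmap(const char *path)
{
    struct stat st;
    long sum = 0;
    int fd = open(path, O_RDONLY);

    if (fd < 0)
        return -1;
    if (fstat(fd, &st) < 0) {
        close(fd);
        return -1;
    }
    if (st.st_size == 0) {      /* a zero-length mmap() fails */
        close(fd);
        return 0;
    }
    unsigned char *p = mmap(NULL, (size_t)st.st_size, PROT_READ,
                            MAP_PRIVATE, fd, 0);
    close(fd);                  /* the mapping outlives the descriptor */
    if (p == MAP_FAILED)
        return -1;
    for (off_t i = 0; i < st.st_size; i++)
        sum += p[i];
    munmap(p, (size_t)st.st_size);
    return sum;
}

/* Wall-clock seconds taken by one call of fn(path). */
double seconds_for(long (*fn)(const char *), const char *path)
{
    struct timespec t0, t1;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    (void)fn(path);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    return (double)(t1.tv_sec - t0.tv_sec) +
           (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
}
```

Calling seconds_for(sum_with_read, path) and seconds_for(sum_with_mmap, path)
on a representative message file, on the hardware and OS you actually care
about, gives the two numbers to compare.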
But again, these really are micro-optimizations. mmap() won't make anything
run perceivably faster, but it will introduce the potential for many subtle
bugs. Best to leave well enough alone.
Now if you *really* want to speed things up, let's talk about lazy folder
indexes.
--lyndon