Re: mailutils

From: xystrus
Subject: Re: mailutils
Date: Mon, 11 Mar 2002 16:28:06 -0500
User-agent: Mutt/1.3.27i

I'd like to continue the conversation Alain and I have been
having here, so as to benefit from the insight of as many
people as possible.  :)

On  Sun, 10 Mar 2002, Alain Magloire wrote:

[Problems with UW-IMAP]
> > Performance isn't so great when you have a large
> > number of users with large MBOX mail folders.
> The problem then was not WU imap4d but rather the
> the mailbox format use to store mail.  It is
> notoriouly none that mbox i.e one file mailbox
> per user is not maybe the fastest/better way
> to handle mail, see MH or MailDir formats, 
> unfortunately we have to leave with it for a long while.

Well, we discussed this at length on the mutt users list.
Maildir is not without its own problems...  It takes
substantially longer to open and index a Maildir/MH folder
than it does to do the same for a MBOX folder.  This is,
in large part, because there is a lot more overhead with
many files (i.e. open->read->close->seek to next 
file/mail->repeat from beginning for each mail).  Of course,
this is WRT mutt's implementation of opening and indexing
folders of each type, and it's very possible that improvements
could be made to how they do things to speed up Maildir indexing.
I haven't really looked at this, and someone was working on
a patch to cache some of the relevant info to make it faster.
Of course, that still means slow indexing on first folder open...
However, as it stands now, many people did benchmarks with
large folders of both MBOX and Maildir format, and MBOX
beat Maildir in ALL cases, often by as much as 300%, though
the actual difference was substantially impacted by what FS
the mail folders resided on.  EXT2/3 were among the slowest...
One of the modified BSD FSes performed relatively well, with
around a 10% difference in speed.  Also note that filesystem
cache was taken into account...  on first read, MBOX was MUCH
faster, and on subsequent indexes, it was still faster but not
nearly so much.
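
For anyone who wants to repeat this kind of comparison, here is a
rough sketch of an indexing benchmark.  It is not what the mutt
folks ran; it assumes Python's standard mailbox module, and the
helper names (build_folders, index_subjects, time_index) are made
up for illustration.

```python
import mailbox
import os
import tempfile
import time


def build_folders(root, count, body="x" * 200):
    """Create an mbox file and a Maildir holding the same messages."""
    mbox = mailbox.mbox(os.path.join(root, "folder.mbox"))
    mdir = mailbox.Maildir(os.path.join(root, "folder.maildir"))
    for i in range(count):
        msg = mailbox.Message()
        msg["From"] = "sender@example.invalid"
        msg["Subject"] = "message %d" % i
        msg.set_payload(body)
        mbox.add(msg)   # appended to the single mbox file
        mdir.add(msg)   # written as its own file in the Maildir
    mbox.close()
    mdir.close()


def index_subjects(box):
    """Read every Subject header, roughly what a client's index needs."""
    return [msg["Subject"] for msg in box]


def time_index(factory, path):
    """Open a folder, index it, and return (subjects, elapsed seconds)."""
    start = time.perf_counter()
    box = factory(path)
    subjects = index_subjects(box)
    box.close()
    return subjects, time.perf_counter() - start


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        build_folders(root, 2000)
        for name, factory in (("folder.mbox", mailbox.mbox),
                              ("folder.maildir", mailbox.Maildir)):
            _, elapsed = time_index(factory, os.path.join(root, name))
            print("%-16s %.3f s" % (name, elapsed))
```

Note that this makes no attempt to control the filesystem cache, so
it only measures the warm-cache case; the first-read numbers would
need the cache dropped between runs.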

This kind of problem also shows up in sendmail, when it has
a lot of queued messages to be sent.  Ext2 and most 
filesystems on Unix-like systems don't handle huge numbers
of files very well.  You also have problems with running out
of inodes, etc.

Neither is perfect.  I actually prefer MBOX, because I'm
impatient, and I'd much rather the indexing of mailboxes took
the smaller amount of time.  I typically only expunge mail
folders when I'm done reading mail in them, unless I have a
LOT of unread messages in them.  I read mail religiously,
so this doesn't happen too often.

I think it's a matter of preference, though...

I have some more thoughts about this too, which I'll get
into momentarily.

> > It seems as though it shouldn't
> > be necessary.  E.g. by mmap'ing the file and
> > moving/appending folder data, it seems temp files
> > should be completely avoidable.
> Appending mail is easy, but how about to
> remove message 501 and 60 in a mailbox that
> contains 2000 msgs. 

IIRC from reviewing IMAP4 before, it doesn't allow for this...
but what I had in mind for this is to expunge only on mailbox
close, and have the server fork a background process to do this
so that the user can continue to deal with mail in a different
folder.  The only perceived performance problem then would be
if the system started thrashing due to multiple I/O on the same
device.  If the system were beefy enough and/or idle enough to
handle this kind of situation, the user could continue reading
mail in a different folder while a background process expunges
the mail from the previous folder, and not experience a noticeable
slowdown.

Maybe what we really need is a new version of IMAP... ;-)
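
The expunge-on-close idea could be sketched roughly like this.  It
assumes a POSIX system (os.fork) and Python's standard mailbox
module; expunge_in_background is a made-up name, not anything in
mailutils or UW-IMAP.  Note that mailbox.mbox itself rewrites the
folder through a temp file on flush, so this only illustrates
deferring the work to a child process, not avoiding temp files.

```python
import mailbox
import os


def expunge_in_background(path, deleted_keys):
    """Fork a child that removes `deleted_keys` from the mbox at `path`.

    Returns the child's pid so the caller can reap it later.
    """
    pid = os.fork()
    if pid != 0:
        return pid  # parent: go back to serving the user right away

    # Child: do the slow rewrite, then exit without returning.
    status = 1
    try:
        box = mailbox.mbox(path, create=False)
        box.lock()  # keep other writers out during the rewrite
        try:
            for key in deleted_keys:
                box.discard(key)
        finally:
            box.close()  # flushes pending changes and unlocks
        status = 0
    finally:
        os._exit(status)
```

The parent would eventually call os.waitpid(pid, 0) (or handle
SIGCHLD) to learn whether the expunge succeeded.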

> What happen when something goes wrong?
> Say you ran out of disk space? or Quotas?

How does using Maildir help these problems?  I think it
actually makes them *worse*, as each message now takes up at 
least one full disk block, no matter how much data is in it.
The exception is if you're using a filesystem like ReiserFS
which is designed to deal with this by packing data from
multiple files into one disk block...
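
To put rough numbers on the block overhead, here is a small sketch.
It assumes a fixed block size (4096 bytes here, purely as an example)
and no tail packing, and it ignores inodes, directory entries and the
mbox "From " separator lines.  The function names are made up.

```python
def allocated_bytes(size, block_size=4096):
    """Bytes a file of `size` occupies when stored in whole blocks."""
    blocks = -(-size // block_size)  # ceiling division
    return blocks * block_size


def maildir_usage(sizes, block_size=4096):
    """One file per message: every message is rounded up on its own."""
    return sum(allocated_bytes(s, block_size) for s in sizes)


def mbox_usage(sizes, block_size=4096):
    """One file per folder: only the folder total is rounded up."""
    return allocated_bytes(sum(sizes), block_size)


if __name__ == "__main__":
    sizes = [100, 4096, 5000]  # message sizes in bytes
    print("Maildir:", maildir_usage(sizes))
    print("mbox:   ", mbox_usage(sizes))
```

With lots of small messages the Maildir figure is dominated by the
rounding, which is what eats into a quota.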

> >  This may have its own
> > problems though, requiring possibly all or a large
> > portion of the mailbox to be mapped into memory...
> For large sites, unless you have 1 Gig of RAM, it is
> not a viable solution.

Ok, this is where I get a bit fuzzy, not having done a lot with
MMIO.  Please correct me if I'm wrong...

RAM isn't so expensive, and I'd argue that anyone who has less
than a GB of RAM in their mail server is being a bit foolish (or
really, really strapped for cash, I suppose)...  But, your point
is well taken.  Still, I wonder how much of a difference there
really is between MMIO and writing out temp files when the system
has low memory.  In the MMIO case, you're going to page memory in
and out, whereas in the temp file case, you're doing all of your
I/O by re-writing the whole file.  In either case, you're still
doing a lot of disk I/O.  In the MMIO case, at least most of it
is read instead of write...

While this is a bit expensive in terms of memory, it only needs
to be done when one is EXPUNGING messages.  Copying messages from
another folder can still be done by appending the message to the
folder, and you certainly don't need to mmap the whole mailbox to
index it, though the I/O will be a bit faster... noticeably so for
large folders.
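
Here is roughly what I have in mind for expunging without a temp
file, as a sketch only.  It assumes Python's mmap module, that every
message starts with a "From " line, and that the writer has already
quoted any body line beginning with "From ".  The names message_starts
and expunge_in_place are made up.  It does no locking and is not
crash-safe: if it dies halfway through, the folder is left
half-compacted.

```python
import mmap
import os


def message_starts(mm):
    """Return the byte offset of each "From " separator line."""
    starts = []
    if mm[:5] == b"From ":
        starts.append(0)
    pos = mm.find(b"\nFrom ")
    while pos != -1:
        starts.append(pos + 1)
        pos = mm.find(b"\nFrom ", pos + 1)
    return starts


def expunge_in_place(path, deleted):
    """Remove messages whose 0-based positions are in `deleted`."""
    with open(path, "r+b") as f:
        if os.fstat(f.fileno()).st_size == 0:
            return  # an empty file cannot be mapped
        with mmap.mmap(f.fileno(), 0) as mm:
            starts = message_starts(mm)
            ends = starts[1:] + [len(mm)]
            write_pos = starts[0] if starts else len(mm)
            for index, (start, end) in enumerate(zip(starts, ends)):
                if index in deleted:
                    continue  # skip it; later messages slide down over it
                length = end - start
                if start != write_pos:
                    mm.move(write_pos, start, length)
                write_pos += length
            mm.flush()
        f.truncate(write_pos)  # cut off the now-unused tail
```

Everything before the first deleted message is left untouched, so
removing a message near the end of a big folder only rewrites the
tail.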

> > I haven't had the chance to look at your mailbox
> > folder library yet, but I'm curious to see how you
> > handle adding and removing messages from mailfolders. 
> The it is handle is too complex for me, this is

Sorry, I'm not sure what you're saying here...  I think
you're saying that these mailbox operations are too
complex for you, is that correct?

> why we are planning a rewrite of the mailbox library
> for version 2, but we want to learn a little more from
> version 1.

Ok... what is it that you're trying to learn, and what is it
that you're interested in changing?

Thanks for your input.

