
[Nmh-workers] IMAP testing, again


From: Ken Hornstein
Subject: [Nmh-workers] IMAP testing, again
Date: Wed, 08 Nov 2017 22:32:48 -0500

So, I made some improvements to imaptest and I decided to really stress
this a bit.  I created a Gmail account and started uploading the Enron
corpus to it.  I didn't quite get it all (it looks like Gmail closed the
connection after 4 days or so), but it was almost all.  BTW, it turns out
it takes on average 0.66 seconds to append a message to a Gmail mailbox,
so you can imagine how long it took to get most of the corpus uploaded.
To make the test more extreme, I put everything in one mailbox.
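
As a rough sanity check on the upload time, here is a back-of-the-envelope
sketch. It assumes the appends ran strictly one after another at the average
rate quoted above, and it uses the EXISTS count from the SELECT response shown
later in this message; the function name is just for illustration.

```python
SECONDS_PER_APPEND = 0.66    # average APPEND time quoted above
MESSAGES_UPLOADED = 480832   # EXISTS count from the SELECT response in this message

def upload_days(messages, seconds_per_message):
    """Days needed to append `messages` sequentially at a fixed per-message cost."""
    return messages * seconds_per_message / 86400

print(f"{upload_days(MESSAGES_UPLOADED, SECONDS_PER_APPEND):.2f} days")  # → 3.67 days
```

That lands close to the "4 days or so" at which the connection was closed.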

Some things that popped out:

- Connection & authentication are very quick (I am using XOAUTH2):

Connect time: 0.014068 sec
TLS negotation time: 0.034776 sec
Authentication time: 0.111667 sec

- However, accessing the mailbox, not so much:

(tls-encrypted) => A2 SELECT "Enron"
(tls-decrypted) <= * FLAGS (\Answered \Flagged \Draft \Deleted \Seen $NotPhishing $Phishing fart)
(tls-decrypted) <= * OK [PERMANENTFLAGS (\Answered \Flagged \Draft \Deleted \Seen $NotPhishing $Phishing fart \*)] Flags permitted.
(tls-decrypted) <= * OK [UIDVALIDITY 3517] UIDs valid.
(tls-decrypted) <= * 480832 EXISTS
(tls-decrypted) <= * 0 RECENT
(tls-decrypted) <= * OK [UIDNEXT 480833] Predicted next UID.
(tls-decrypted) <= * OK [HIGHESTMODSEQ 11396890]
(tls-decrypted) <= A2 OK [READ-WRITE] Enron selected. (Success)
Command (SELECT) execution time: 4.457428 sec

I don't even have an idea how long it would have taken for nmh to do a
readdir() on a directory with that many files.  Strangely, the speed of
adding messages to that mailbox seemed to not depend on the number of
messages in the mailbox, but it varied depending on the time of day.

Performing a scan equivalent on that many messages also bogs down:

% imaptest +Enron 'FETCH 1:5000 (FLAGS RFC822.SIZE BODY.PEEK[HEADER.FIELDS (FROM TO SUBJECT DATE)] BODY.PEEK[TEXT]<0.80>)' -timestamp
Connect time: 0.013550 sec
TLS negotation time: 0.025930 sec
Command (CAPABILITY) execution time: 0.012074 sec
Command (AUTHENTICATE) execution time: 0.034879 sec
Authentication time: 0.104827 sec
Command (SELECT) execution time: 4.475118 sec
Command (FETCH) execution time: 44.801250 sec
Total command execution time: 49.276434 sec
Command (LOGOUT) execution time: 0.015691 sec
Total elapsed time: 49.410705 sec

Compared to the performance of the Cyrus-SASL archives, that's kind of
disappointing.  But the mailbox is 40x bigger, so maybe that's the issue.

Okay, so this is a lot better:

CREATE Enron2
COPY 1:10426 Enron2 (approximately 43 seconds)

(tls-encrypted) => A2 SELECT "Enron2"
(tls-decrypted) <= * FLAGS (\Answered \Flagged \Draft \Deleted \Seen $NotPhishing $Phishing fart)
(tls-decrypted) <= * OK [PERMANENTFLAGS (\Answered \Flagged \Draft \Deleted \Seen $NotPhishing $Phishing fart \*)] Flags permitted.
(tls-decrypted) <= * OK [UIDVALIDITY 3518] UIDs valid.
(tls-decrypted) <= * 10426 EXISTS
(tls-decrypted) <= * 0 RECENT
(tls-decrypted) <= * OK [UIDNEXT 10427] Predicted next UID.
(tls-decrypted) <= * OK [HIGHESTMODSEQ 11407756]
(tls-decrypted) <= A2 OK [READ-WRITE] Enron2 selected. (Success)
Command (SELECT) execution time: 0.121310 sec

But, still not great:

% imaptest +Enron2 'FETCH 1:* (FLAGS RFC822.SIZE BODY.PEEK[HEADER.FIELDS (FROM TO SUBJECT DATE)] BODY.PEEK[TEXT]<0.80>)' -timestamp
Connect time: 0.012951 sec
TLS negotation time: 0.023604 sec
Command (CAPABILITY) execution time: 0.010854 sec
Command (AUTHENTICATE) execution time: 0.028128 sec
Authentication time: 0.091857 sec
Command (SELECT) execution time: 0.108625 sec
Command (FETCH) execution time: 88.653104 sec
Total command execution time: 88.761790 sec
Command (LOGOUT) execution time: 0.014169 sec
Total elapsed time: 88.880982 sec

Gimap is really in the crapper here.  Creating a new folder took me 3 seconds,
so I wonder if there is some global index that needs traversing.
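
To put the two FETCH timings on a common footing, here is a rough
messages-per-second calculation. It is only a sketch: it takes the message
counts and FETCH times from the two runs above at face value and ignores
network variation; the names are made up for illustration.

```python
# (messages fetched, FETCH execution time in seconds) from the two runs above
FETCH_RUNS = {
    "Enron 1:5000": (5000, 44.801250),
    "Enron2 1:*": (10426, 88.653104),
}

def fetch_rate(messages, seconds):
    """Messages returned per second of FETCH execution time."""
    return messages / seconds

for label, (messages, seconds) in FETCH_RUNS.items():
    print(f"{label}: {fetch_rate(messages, seconds):.1f} msgs/sec")
# → Enron 1:5000: 111.6 msgs/sec
# → Enron2 1:*: 117.6 msgs/sec
```

If those numbers are representative, the per-message FETCH cost is about the
same in the huge mailbox and the small one, so the FETCH slowness may not be
tied to mailbox size the way the SELECT time is.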

Some operations don't scale linearly.  If we use user flags as sequences
(Gimap supports arbitrary flags), we get:

+Enron 'STORE 1:10000 +FLAGS.SILENT (fart)' -snoop -timestamp
(tls-encrypted) => A3 STORE 1:10000 +FLAGS.SILENT (fart)
(tls-decrypted) <= A3 OK Success
Command (STORE) execution time: 2.818275 sec

You would think that 1:100000 would take 28-30 seconds, right?  But no.
It exceeds the timeout limit (60 seconds by default).  If I don't use
.SILENT, 1:100000 takes 574 seconds.  But, interestingly enough, if I
run it again on the first 100000 messages, it takes 29 seconds; maybe
that's due to not having to change the flags?  More research needed.

Hm, I get this on the whole folder:

(tls-encrypted) => A3 STORE 1:* -FLAGS.SILENT (fart)
(tls-decrypted) <= A3 OK Success
Command (STORE) execution time: 874.630423 sec

Linear scaling would suggest that it really should be closer to 130
seconds.  Ahhh ... I think the key there is the database update.  A second
run is closer to where it should be:

(tls-encrypted) => A3 STORE 1:* -FLAGS.SILENT (fart)
(tls-decrypted) <= A3 OK Success
Command (STORE) execution time: 147.341493 sec
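
For reference, the linear-scaling estimates used above can be worked out like
this. It is a sketch that assumes STORE cost is strictly proportional to the
number of messages, with the 1:10000 run (2.818275 sec) as the baseline; the
function name and defaults are illustrative only.

```python
BASE_COUNT = 10000        # messages in the baseline STORE 1:10000
BASE_SECONDS = 2.818275   # its measured execution time

def linear_estimate(messages, base_seconds=BASE_SECONDS, base_count=BASE_COUNT):
    """Predicted STORE time if cost scaled linearly with message count."""
    return base_seconds * messages / base_count

print(f"1:100000 -> {linear_estimate(100000):.1f} sec")  # → 28.2 sec
print(f"1:*      -> {linear_estimate(480832):.1f} sec")  # → 135.5 sec

# Ratio of each measured whole-folder run to the linear prediction
for measured in (874.630423, 147.341493):
    print(f"{measured:.1f} sec is {measured / linear_estimate(480832):.1f}x the estimate")
```

By that measure the second whole-folder run is within roughly 10% of linear,
while the first is more than six times slower.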

So if we do the first command again, making sure that flag is cleared,
we get:

(tls-encrypted) => A3 STORE 1:10000 +FLAGS.SILENT (fart)
(tls-decrypted) <= A3 OK Success
Command (STORE) execution time: 77.693338 sec

- But ... where things are a win is here (on the original "Enron" folder)

(tls-encrypted) => A3 SEARCH TEXT "corruption"
(tls-decrypted) <= * SEARCH [... whole lot of entries ...]
(tls-decrypted) <= A3 OK SEARCH completed (Success)
Command (SEARCH) execution time: 0.136124 sec

I doubt we could ever achieve that kind of performance on that many
messages, and I guess this makes it clear where Google is putting
their energy.

I'm not sure where mailbox size becomes a problem.  I was planning
on uploading the corpus using the original folder structure and checking
out how easy that is to manage.  One thing I did notice is that, at
least on Gmail, folder creation slows down roughly in proportion
to how many folders you have; creating 3500 folders takes a bit of time.

So, this particular torture test didn't have as amazing results as I
had hoped.  But how would it compare to the same corpus on a disk?
And what are "typical" operations?  Do people really want to scan(1)
a folder with a half-million messages in it? Or do they really want to
run "pick" on it and only look at a few?

--Ken
