[Nmh-workers] nmh Speed Measured by ltrace(1).

From: Ralph Corderoy
Subject: [Nmh-workers] nmh Speed Measured by ltrace(1).
Date: Sat, 29 Apr 2017 12:29:46 +0100


I only recently upgraded, within the last year or so, from a circa-2010
version to 1.6, and nmh has felt sluggish on this Intel Atom D525.  :-)
ltrace(1)'s -c option counts library calls, giving a simple summary of
what's happening, so I decided to compare that across a few versions.

Here's the headline for `LC_ALL=C scan -width 80 +foo last' where "foo+
has 4275 messages  (1-5736); cur=5705".

    scan.`git describe`                    seconds  calls

    scan.1.5-branchpoint-636-g3dc0fe0b    1.788846   7658
    scan.1.6                              9.312562  42687
    scan.1.6-3-arch-linux                10.364064  47345
    scan.1.6-branchpoint-1294-gfa377c31  10.516923  47248
    scan.1.6-branchpoint-1295-geee0d4e1   9.145452  41412
    scan.1.6-branchpoint-1296-g2a450699   7.877227  35573
    scan.1.6-branchpoint-1301-g67733cd0   5.377305  23899
    scan.1.6-branchpoint-1305-gc687e5bd   4.828038  21269
    scan.1.6-branchpoint-1309-g159b2a1e   2.192595   9597

ltrace obviously adds wall-clock overhead;  I don't normally wait ten
seconds for a one-line scan.  But AFAIK the slow-down is linear, so the
relative differences are still useful.  The filenames include
git-describe(1)'s output.

So the good news is that we're getting close to that old commit;  the
bad news is there's further to go.  Also, scanning the whole folder
takes a lot more time, user, system, and wall clock, between these two
commits:

    \time -v scan +foo >/dev/null           -636-    -1309-

    User time (seconds)                     0.37     0.61
    System time (seconds)                   0.13     0.22
    Percent of CPU this job got             98%      99%
    Elapsed (wall clock) time               0:00.52  0:00.84
    Average shared text size (kbytes)       0        0
    Average unshared data size (kbytes)     0        0
    Average stack size (kbytes)             0        0
    Average total size (kbytes)             0        0
    Maximum resident set size (kbytes)      3212     40536
    Average resident set size (kbytes)      0        0
    Major (requiring I/O) page faults       0        0
    Minor (reclaiming a frame) page faults  225      9616
    Voluntary context switches              1        1
    Involuntary context switches            17       2
    Swaps                                   0        0
    File system inputs                      0        0
    File system outputs                     0        0
    Socket messages sent                    0        0
    Socket messages received                0        0
    Signals delivered                       0        0
    Page size (bytes)                       4096     4096
    Exit status                             0        0

I think the difference in RSS, and the corresponding minor page faults,
is the next focus.  1.6-branchpoint-134-g92128dac changed some memory
allocation in scan(), and I think new memory is being allocated for each
message, with the old never freed;  valgrind agrees.

    Command: ./scan.1.5-branchpoint-636-g3dc0fe0b +foo
        in use at exit: 43,571 bytes in 205 blocks
      total heap usage: 133,353 allocs, 133,148 frees, 59,200,717 bytes allocated

       definitely lost: 141 bytes in 4 blocks
       indirectly lost: 16,386 bytes in 1 blocks
         possibly lost: 80 bytes in 1 blocks
       still reachable: 26,964 bytes in 199 blocks
            suppressed: 0 bytes in 0 blocks

    Command: ./scan.1.6-branchpoint-1309-g159b2a1e +foo
        in use at exit: 35,220,481 bytes in 166,333 blocks
      total heap usage: 290,841 allocs, 124,508 frees, 136,760,544 bytes allocated

       definitely lost: 12,443,133 bytes in 12,823 blocks
       indirectly lost: 22,756,260 bytes in 153,269 blocks
         possibly lost: 5,034 bytes in 43 blocks
       still reachable: 16,054 bytes in 198 blocks
            suppressed: 0 bytes in 0 blocks

This may be as simple as moving uip/scan.c's main()'s scanl to a higher
scope within the function, but I haven't had time to check yet, and
should really be off doing something to earn a crust.
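To illustrate the suspected pattern, here's a minimal sketch of the
leaky shape and the proposed fix.  The names, buffer size, and loop
structure are my assumptions for illustration, not the real uip/scan.c
code:

```c
/* Hypothetical sketch of the suspected leak and its fix; scan_leaky()
   and scan_fixed() are made-up names, not nmh functions. */
#include <stdio.h>
#include <stdlib.h>

/* Leaky shape: a fresh scan-line buffer is allocated per message and
   never freed, so a 4,275-message folder leaks 4,275 allocations. */
static int scan_leaky(int nmsgs) {
    int lines = 0;
    for (int i = 1; i <= nmsgs; i++) {
        char *scanl = malloc(256);        /* allocated every iteration */
        if (scanl == NULL)
            return lines;
        snprintf(scanl, 256, "%4d ...", i);
        lines++;                          /* ... emit the scan line ... */
        /* scanl goes out of scope here without free() */
    }
    return lines;
}

/* Fixed shape: hoist the buffer to the enclosing scope, reuse it for
   every message, and free it once at the end. */
static int scan_fixed(int nmsgs) {
    int lines = 0;
    char *scanl = malloc(256);            /* one buffer for all messages */
    if (scanl == NULL)
        return 0;
    for (int i = 1; i <= nmsgs; i++) {
        snprintf(scanl, 256, "%4d ...", i);
        lines++;                          /* ... emit the scan line ... */
    }
    free(scanl);
    return lines;
}
```

Both shapes produce the same output;  valgrind's "definitely lost"
figure is what tells them apart.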

The main improvements I've made relate to how an individual message's
sequences are stored, and can be seen in the last few days of commits.
But I wonder if MH has always had this the wrong way around.  Is it
common to go from message to sequences?  A lot of the time the code
wants to go from sequence to message, and ends up checking every message
to see if it's a member.  With 10,000 messages and 100 sequences, it may
be worth swapping.
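As a sketch of the two directions, with made-up structures (nmh's real
per-message sequence storage differs):

```c
/* Hypothetical sketch, not nmh's real data structures: storing, per
   message, a bitmask of the sequences it belongs to means answering
   "which messages are in sequence s?" costs a pass over every message.
   Keeping the inverse map, a per-sequence bitmap of messages, makes
   that query a direct lookup. */
#include <stdint.h>

#define NMSGS 10000           /* e.g. a 10,000-message folder */
#define NSEQS 100             /* e.g. 100 sequences */

/* message -> sequences: a 100-bit set per message, packed into two
   uint64_t words (the current direction) */
static uint64_t msg_seqs[NMSGS][2];

/* sequence -> messages: the inverse map, one bit per message */
static uint64_t seq_msgs[NSEQS][(NMSGS + 63) / 64];

static void add_to_seq(int msg, int seq) {
    msg_seqs[msg][seq / 64] |= (uint64_t)1 << (seq % 64);
    seq_msgs[seq][msg / 64] |= (uint64_t)1 << (msg % 64);
}

/* current direction: find seq's members by checking every message */
static int count_by_scan(int seq) {
    int n = 0;
    for (int m = 0; m < NMSGS; m++)
        if (msg_seqs[m][seq / 64] & ((uint64_t)1 << (seq % 64)))
            n++;
    return n;
}

/* swapped direction: popcount the sequence's own bitmap, touching
   NMSGS/64 words instead of NMSGS entries (GCC/Clang builtin) */
static int count_by_lookup(int seq) {
    int n = 0;
    for (int w = 0; w < (NMSGS + 63) / 64; w++)
        n += __builtin_popcountll(seq_msgs[seq][w]);
    return n;
}
```

The two counts always agree;  only the work done per query differs.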

Users that run git HEAD, please update so you're pounding on this new
code.  It passes the test suite, including with valgrind, but there's
nowt as strange as folk.  :-)

Cheers, Ralph.
