[lmi] Remarkable performance problem

From:	Greg Chicares
Subject:	[lmi] Remarkable performance problem
Date:	Wed, 28 Feb 2018 01:54:15 +0000
User-agent:	Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.5.2

Vadim--Do you have any idea what I might try next?

Today Kim sent me a census file that runs quickly enough for her, but
shockingly slowly for me. (We can't share it because it contains
personal information.) It contains about 1600 cells. We each do
  Census | Run case
and, on the statusbar, observe calculation times of:
   177.967 Kim  (msw-7)
  2073.000 Greg (wine)

The sluggishness was immediately apparent to me: it felt as though I
were running on an old i486. As soon as I loaded this census, every
GUI operation became slow: resizing a child window, editing a cell's
contents, running a single cell--anything at all.

Watching the progress dialog, I saw that some tens of cells would
take about a second each (already an order of magnitude slower than
I'd expect), but then it would stall for a while and take ten or more
seconds to run a single cell. There was no obvious pattern suggesting
a memory-reallocation stall (e.g., pausing after 128, 256, and 512
cells).

I tried editing the case-default cell, hoping to simplify the census,
but lmi became unresponsive when I asked to apply my global changes
to all cells. I took a break for lunch; it was still frozen when I
got back, so I closed it down.

Then I entered a new case. It ran as fast as ever. Reloading the
offending census, it was sluggish again. gnome-system-monitor said
it was using 400 MiB of storage; it reached that total upon loading
the census, and never went beyond.

I closed another 'wine' application that had been running in the
background, and that seemed to make the GUI slightly less sluggish
(or maybe it was wishful thinking), but that didn't make this census
run any faster. I also ran 'wine-reboot' with progressively more
severe options, from '--restart' through '--kill'; that didn't help.

This felt like running an application accidentally built with an
invasive debugging facility like gcc's safe-standard-library macros;
to rule that out, I did 'make clobber', rebuilt, and...no joy.

I tried running the same census with an archived release from last
Halloween, and it was just about as slow as today's HEAD. Kim ran
it with last month's release, and saw about a five percent speed
difference. That rules out any recent change: it's something about
this census, or wine. (Kim's using a ridiculously underpowered laptop
with msw-7.)

It seems that not everything lmi is doing is equally affected. The
statusbar shows interim results as it finds roots of polynomials
(i.e., as it performs "solves"), and that's speedy as always: if I
watch the
  iteration N iterand X ...
text very carefully, I can see N incrementing its count in a blur.

The census-manager screen looks awfully full--lots of rows and lots
of columns--so I manually edited numerous fields in this 20 MB census
to remove many of the differences; no effect. Running 'file', I saw
that there were CRLF line endings, so I removed all CRs; no effect.
I saw a literal '&amp;' in the corporation's name ('od' confirms that
it's escaped in the actual file), so I removed that; no effect.
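
For anyone repeating the line-ending cleanup without an editor, here is a
minimal Python sketch. The helper name strip_carriage_returns is made up for
illustration and is not part of lmi. Unlike the "remove all CRs" step above,
it rewrites only CRLF pairs and leaves any lone CR alone, and it works on
bytes in memory, so reading and writing the file is left to the caller.

```python
from typing import Tuple


def strip_carriage_returns(data: bytes) -> Tuple[bytes, int]:
    """Return (cleaned data, number of CRLF pairs converted to LF).

    Hypothetical helper, not part of lmi. Only CRLF pairs are rewritten;
    a lone CR is left untouched so that nothing else in the data changes.
    """
    count = data.count(b"\r\n")
    return data.replace(b"\r\n", b"\n"), count


# Made-up sample data standing in for the contents of a census file.
sample = b"line one\r\nline two\r\n"
cleaned, converted = strip_carriage_returns(sample)
print(converted, cleaned)
```

Reporting the count makes it easy to confirm that a file really had CRLF
endings before concluding that removing them made no difference.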

Because using the GUI felt like swimming through a tar pit, I thought
I might have found a problem with wx or wine, so I tried the CLI:

  wine ./lmi_cli_shared --ash_nazg --data_path=/opt/lmi/data -a \
    --file=../input/xyzzy-lf-10.cns --emit=emit_composite_only,emit_timings

and felt encouraged when the first few dots showed up quickly; but my
heart sank when I saw intermittent stalls. For the tabulation below,
I used my cleaned-up census (LF-terminated, no '&amp;', etc.), and
reduced the number of cells using vim (more suitable for this than
lmi's GUI). The GUI timing for 1000 cells is even slower than what I
reported above for the original 1600-cell census, so it seems clear
that none of my intermediate steps (removing '&amp;', rebooting wine)
has any positive effect.

Cutting the size of the census is the only thing that seems to help.
Time (in seconds) to run a census, by number of census cells, for
GUI vs. CLI; then both timings divided by number of cells:

  # cells    GUI (s)   CLI (s)   GUI s/cell   CLI s/cell
     1000   2618.712   496.754         2.62         0.50
      500    253.853    53.631         0.51         0.11
      100     10.018     5.579         0.10         0.06
       10      0.532     0.379         0.05         0.04
        1      0.114     0.036         0.11         0.04

Reading up from the bottom, I interpret that as:
  linear, linear, linear, not linear, horrendous
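
One way to put a number on "not linear" is to estimate, between consecutive
rows, the exponent k in time ~ cells**k: k near 1 is linear, and anything
well above 1 is superlinear. The Python sketch below does that with the
timings copied from the table above. The function name scaling_exponents is
made up for illustration, and a two-point slope on single runs is only a
rough indicator, not a careful measurement.

```python
import math

# (cells, GUI seconds, CLI seconds), copied from the table above.
timings = [
    (1, 0.114, 0.036),
    (10, 0.532, 0.379),
    (100, 10.018, 5.579),
    (500, 253.853, 53.631),
    (1000, 2618.712, 496.754),
]


def scaling_exponents(rows, column):
    """Estimate k in time ~ cells**k between consecutive rows.

    column is the index of the timing within each row (1 = GUI, 2 = CLI).
    Returns a list of (smaller size, larger size, k rounded to 2 places).
    """
    result = []
    for r0, r1 in zip(rows, rows[1:]):
        time_ratio = r1[column] / r0[column]
        size_ratio = r1[0] / r0[0]
        k = math.log(time_ratio) / math.log(size_ratio)
        result.append((r0[0], r1[0], round(k, 2)))
    return result


for label, column in (("GUI", 1), ("CLI", 2)):
    for n0, n1, k in scaling_exponents(timings, column):
        print(f"{label} {n0:>4} -> {n1:>4} cells: exponent {k}")
```

For the step from 500 to 1000 cells, both GUI and CLI times grow by more than
a factor of eight, so the estimated exponent there is above 3 in both cases.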

At the end of the 1000-cell GUI run, gnome-system-monitor said that
lmi was using 289 MiB. That may sound like a lot, but within recent
weeks I successfully pasted a 10000-cell census, causing lmi to use
several times as much RAM--more even than 'thunderbird'--so that's not
a problem per se.

The problem isn't simply size: 1600 cells is not a lot. I just loaded a case
with 10477 cells and started "Census | Run case", and it had already
chugged merrily through a thousand cells without stalling when I
cancelled it on the progress dialog. It's using 1.1 GiB of memory, the
same amount it was using while calculating. Because of its size, it
takes a few seconds to do "Census | Varying column width"; but with
the "spooky" census, that same operation took several minutes (I
didn't have the patience to wait for it to finish).

Let's take that 10477-cell census, cut it down to 1000 cells, and time
it as above. (BTW, this is another actual production case, and it uses
the same mce_ill_reg ledger type as the troublesome census).

  # cells    GUI (s)   CLI (s)   GUI s/cell   CLI s/cell
     1000    320.047   173.656         0.32         0.17
      500    119.766    85.034         0.24         0.17
      100     20.389    16.364         0.20         0.16
       10      2.112     1.652         0.21         0.17
        1      0.286     0.172         0.29         0.17

Here, time is approximately linear in number of cells. I'm not sure
why the GUI is always slower, as both GUI and CLI simply use class
ledger_emitter for all this work; perhaps the difference represents
the cost of updating the statusbar and progress dialog. And I'd
expect this to be linear, because the code just processes all cells
seriatim, accumulating totals as it goes.

So I can't explain the very non-linear pattern seen above with the
troublous census. I don't see any quirk in the PlusEq() functions
(which perform the composite accumulation) that could explain that
phenomenon.

It's not the particular (proprietary) product. If I switch the census
last used (10477 cells, cut down to 1000) to the product that the
problem census uses, then 1000 cells take only 107.549 seconds in
the GUI, compared to 320.047 above--so that product is simpler.

This really feels like resource exhaustion to me. The "easy" census
can be loaded and manipulated in the GUI readily enough, even with
all its 10477 original cells; the much smaller problem census brings
the GUI to a crawl--but gnome-system-monitor says it uses less memory,
and its '.cns' file is proportionately smaller. The only striking
difference is that the troublesome census shows many more columns in
the census manager, because its cells are more different from each
other; but that shouldn't make them larger (a census isn't stored in
RAM as deltas vs. the case-default cell, even though that's the way
the census manager depicts it), and it shouldn't make it take longer
to process each cell.

Bad file format? No, I compared the "easy" and "hard" censuses side by
side, and they have the same xml elements in the same order.

And why is this census okay for Kim but not for me?

One last test for the evening: I open the original problem census and
start "Census | Run case". A minute and a half later, the progress
dialog says it has finished thirty cells; I cancel it and close lmi.
Now I edit a copy of that census in vim, delete all cells after the
thirtieth, and save. Running that case in exactly the same way, all
thirty cells are finished in 1.655 seconds: a second and a half, vs.
a minute and a half, to perform exactly the same work. How can that
be possible?
