lmi
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [lmi] Would manual optimization help here?


From: Vadim Zeitlin
Subject: Re: [lmi] Would manual optimization help here?
Date: Sat, 27 Mar 2021 00:01:09 +0100

On Fri, 26 Mar 2021 14:59:46 +0000 Greg Chicares <gchicares@sbcglobal.net> 
wrote:

GC> Vadim--In this century, should we never even contemplate manual
GC> optimizations?

 I always hesitate to give definitive answers to general questions like
this, but if I had to, I would say that you shouldn't without first seeing
profile results indicating that this could be useful.


GC> In particular, here, in a loop that iterates once for every month in
GC> the year, with some values that depend on the year but not the month:
GC> 
GC> AccountValue::TxSetCoiCharge() ...
GC>     NAAR = round_naar().c
GC>         ( DBReflectingCorr * DBDiscountRate[Year]
GC>         - dblize(std::max(C0, TotalAccountValue()))
GC>         );
GC> 
GC> I'm wondering whether I should hoist DBDiscountRate[Year], to
GC> perform the subscripting once instead of twelve times.

 My first problem was that I didn't see any loop here, but I've finally
found it in AccountValue::RunOneCell(), i.e. 4 stack frames above this
function. Knowing this and that the intermediate functions are almost
certainly not inlined, it seems highly improbable that the compiler would
be able to somehow hoist the array lookup outside the loop (at least
without using LTO), so I'm pretty sure it must be performing the vector
indexing operation in each loop iteration and, in 11 out of 12, doing this
unnecessarily.

 OTOH this operation is extremely cheap, while introducing another variable
might have its own penalty, so it's still not obvious at all whether it's
worth doing.

GC> Elsewhere, lmi actually does that--e.g.,
GC> 
GC> 'account_value.hpp': [cached scalar value]
GC>     double       YearsDcvIntRate;
GC> 
GC> 'ihs_acctval.cpp': [performed only once each year]
GC>     YearsDcvIntRate         = i7702_->ic_usual()[Year];
GC> 
GC> 'ihs_avmly.cpp': [performed each month, in each year]
GC>         Dcv += round_interest_credit().c(YearsDcvIntRate * Dcv);
GC> 
GC> and my question is whether I should do something similar, adding
GC> a new 'double YearsDBDiscountRate', setting it once per year
GC> (say, 100 times for 100 years), and then using it 1200 times.

 This looks like a simple change to do, so I'm not sure why not just test
if doing it affects the results of the "selftest"? Answering the desired
question directly seems more epistemologically correct to me than trying to
deduce it indirectly.

GC> Should gdb help me answer my question above?

 I don't think it's the right tool for the job here. Determining
performance from just looking at assembly code is pretty difficult. And if
you still wanted to look at the assembly, it's better to ask the compiler
to generate a somewhat more readable version of it, rather than using gdb
to view it.

 Normally the right tool would be perf, i.e. I'd diff its output before and
after making the change. But in this particular case just using selftest
seems like an even simpler, and hence better, solution.


GC> Here's a gdb session; annotations have '#' in the left margin.
[...]
GC> (gdb) disas /s
GC> 
GC> # ...omit what looks like a prologue
GC> 
GC> 1774        NAAR = round_naar().c
GC> --Type <RET> for more, q to quit, c to continue without paging--
GC> 
GC> # [hit 'c']
GC> 
GC> 1775            ( DBReflectingCorr * DBDiscountRate[Year]
GC>    0x00007ffff7c58010 <+16>:    movslq 0x18c0(%rdi),%rdx
GC>    0x00007ffff7c58017 <+23>:    mov    0x948(%rdi),%rax

 I'd just like to note that the instructions shown here don't necessarily
match the source line shown above. There is simply no one-to-one mapping
between the source statements and the generated machine code when using
optimizations, so sometimes you can find the extra instructions which get
attached to the closest source code line and sometimes the instructions
that do correspond to a source code line get moved somewhere else. And then
there is inlining, of course, which means that you can also get
instructions corresponding to a completely different function at the source
code level.

GC> # ...disassembly of classes currency and round_to, and
GC> # of some libstdc++ stuff...
GC> 
GC> /opt/lmi/src/lmi/ihs_avmly.cpp:
GC> 1777            );
GC>    0x00007ffff7c5806f <+111>:   movsd  %xmm1,-0x18(%rbp)
GC> 
GC> # That doesn't seem to tell me much. It looks like all the real
GC> # work is done in instructions listed under called functions.
GC> # The disassembly listed under calls into the currency and rounding
GC> # functions seems unremarkable, but then I see this, which suggests
GC> # to my unpracticed eye that a lot of real work is being done here,
GC> # and annotating it as "stl_vector.h" doesn't mean that this is
GC> # actually all part of std::vector::operator[]:
GC> 
GC> /usr/include/c++/10/bits/stl_vector.h:
GC> 1061          operator[](size_type __n) const _GLIBCXX_NOEXCEPT
GC>    0x00007ffff7c58176 <+374>:   movslq 0x18c0(%rbx),%rdx
GC>    0x00007ffff7c5817d <+381>:   movsd  0x1900(%rbx),%xmm0

 I still have trouble reading AT&T asm syntax (my own .gdbinit contains
"set disassembly-flavor intel"), but this doesn't look like the code for
DBDiscountRate[Year] to me because nothing is being indexed by "Year". Yet,
it must be somewhere here, of course.

GC>    0x00007ffff7c58185 <+389>:   divsd  0xb6083(%rip),%xmm0        # 
0x7ffff7d0e210
GC>    0x00007ffff7c5818d <+397>:   mov    (%rax),%rax
GC>    0x00007ffff7c58190 <+400>:   fldt   0xc30(%rbx)
GC> 
GC> # Already, something curious is going on. This is a vector of
GC> # double, for which x86_64 pc-linux-gnu wouldn't use FLDT, AIUI,
GC> # because that's an x87 load-tenbyte instruction. No 'long double'
GC> # variable is used except in lmi's round_to class, so that class's
GC> # code has to be what's shown here, AFAICT. It continues (probably
GC> # there's nothing interesting here, but I don't dare snip too much):

 I did snip it because I am just not sure what exactly are we supposed to
conclude from studying all this disassembly.

GC> So...would it be worth hoisting the vector indexing from the
GC> monthly to the yearly level? I can't tell.

 Me neither. I'm tempted to say that it's not going to change anything, but
the only real way of knowing it would be to test.

GC> Is there a better gdb technique, or a better tool, to help
GC> me find the answer to my question? Is gdb telling me, in an
GC> inscrutable way, that the question is silly and gcc will
GC> already optimize better than I could do manually?

 I'm reasonably sure gcc can't optimize away this vector lookup without LTO
(and I'd be impressed if it could do it even with LTO). But I can't answer
the question of whether it's worth avoiding it. I.e. if you really wanted
to have an answer, I'd just benchmark it myself.

 One general thing I can say is that using "Year" as index into several
(many?) different vectors looks very cache-unfriendly. I suspect (but,
again, without being sure at all without testing) that it could be highly
advantageous to have a struct containing all the values for the current
year, which should fit in the cache, that would be filled at the start of
every iteration of the outer loop in RunOneCell() and then use it in the
rest of code. I.e. having individual "caches" for some vector elements
doesn't seem very useful because there are still lookups of the other
vectors, but if we could avoid using "Year" as index anywhere at all, it
should be good for performance.

 I don't know much about optimizing for modern CPUs, but I do know that
data locality and cache-friendliness are much more important than the code
itself (although code cache is also important, i.e. too much code may be
bad even if larger code is faster on paper than the shorter version). So
using vector of structs is, generally speaking, much preferable to using a
struct of vectors, which is what AccountValue is currently. I understand
that changing this is probably not going to be easy, or maybe even
possible, at all, and I don't know this code (nor this data) well enough to
be able to answer this question myself, but I'd at least consider it.

 Sorry for the lack of answer to your question, but I really don't think
it's productive to be trying to do it using gdb.

VZ

Attachment: pgph3MBell1Df.pgp
Description: PGP signature


reply via email to

[Prev in Thread] Current Thread [Next in Thread]