Re: [lmi] Can linux-perf illuminate this problem?
From: Greg Chicares
Subject: Re: [lmi] Can linux-perf illuminate this problem?
Date: Fri, 5 Mar 2021 22:38:38 +0000
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Thunderbird/78.4.0
On 3/5/21 9:46 PM, Vadim Zeitlin wrote:
> On Fri, 5 Mar 2021 20:30:53 +0000 Greg Chicares <gchicares@sbcglobal.net>
> wrote:
[...]
> GC> This command:
> [...]
> GC> ...but that only seems to tell me how much time is spent in each
> GC> function called by AccountValue::DoMonthDR(). However, I suspect
> GC> that I've done something atrocious inside DoMonthDR() itself.
> GC> Can 'perf' help me find that?
>
> If you select the function and press "Enter", you should see the menu with
> several choices, the first of which is "Annotate <function>". Selecting it
> shows the instructions actually being executed, annotated with the source
> lines, and percentage of the execution time for the hot instructions.
Thanks, very helpful. In order to be able to "Annotate" this function,
I seem to need to add '--call-graph=none' (discovered by randomly
permuting flags). Then, ultimately I get here:
         │   round_to<double>::c(double) const:
    0.32 │     fmull  -0x1e0(%rbp)
   31.01 │     fstpl  -0x1e0(%rbp)
    2.00 │     movsd  -0x1e0(%rbp),%xmm0
    0.43 │   → callq  *0xd50(%r13)
         │   * scale_back_cents_
    0.30 │     fldt   0xd40(%r13)
so the FSTPL, with 31% of the samples in this function, is
apparently the problem.
FSTPL? With x86_64? Yes, since 'round_to.hpp' uses type
'long double'. Changing that to 'double' is likely to be a
Really Big Change, which I'm not going to attempt soon.
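Why an x87 instruction shows up in x86_64 code: with GCC on x86_64 Linux, 'long double' is, as far as is known here, the 80-bit x87 extended-precision format, which SSE registers cannot hold. Arithmetic in it therefore runs on the x87 stack, and narrowing the result back to 'double' is an FSTPL ("store as 64-bit double and pop"). The function below is a hypothetical stand-in, not lmi's round_to; it reproduces only that pattern. Compiled with 'g++ -O2 -S', one would expect an FMUL, then an FSTPL to a stack slot and a MOVSD into %xmm0, much like the annotated listing above, though that is an expectation rather than something verified for every compiler version.

```cpp
#include <cassert>
#include <limits>

// Hypothetical stand-in (not lmi's round_to) for a step that scales a
// double in 'long double' and narrows the product back to double.
// With GCC on x86_64 Linux the multiply is expected to compile to an
// x87 FMUL and the narrowing to an FSTPL ("store as 64-bit double, pop").
double scale_in_long_double(double x, long double factor)
{
    long double product = x * factor;     // x87 arithmetic on that target
    return static_cast<double>(product);  // narrowing store to double
}

// 64 mantissa bits indicates the x87 80-bit format; 53 means that
// 'long double' is simply 'double' (as with MSVC, for example).
constexpr int long_double_mantissa_bits()
{
    return std::numeric_limits<long double>::digits;
}
```

The check on mantissa bits is a quick way to tell whether a given toolchain uses the x87 format at all; where it reports 53, no FSTPL would be expected.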
But the real question is why that FSTPL costs so much. While
I can't yet prove it, the reason seems clear: the argument to
round_to<>::c() is very often an extreme value. In HEAD, it's
  std::numeric_limits<double>::max()
and the code seems equally slow if I change that to
  std::numeric_limits<double>::infinity()
(so that's no silver bullet).
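Both sentinels end up as the same stored value once scaled to cents and narrowed to 'double'. DBL_MAX * 100 is still finite in the x87 format, whose exponent range is far wider, but it is out of range for 'double', so narrowing it yields infinity; infinity * 100 is infinity from the start. A minimal sketch with hypothetical helper names, assuming IEEE 754 'double' and, for the finite-in-'long double' check only, a 'long double' with a wider exponent range. It illustrates the values involved, not the timing.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Hypothetical helper: scale a dollar amount to cents in 'long double',
// without narrowing.
long double cents_in_long_double(double dollars)
{
    return dollars * 100.0L;
}

// Hypothetical helper: the same product narrowed back to double.
double cents_as_double(double dollars)
{
    // 'volatile' keeps the conversion at run time instead of letting
    // the compiler fold it for constant arguments.
    volatile long double product = cents_in_long_double(dollars);
    return static_cast<double>(product);
}

// True when 'long double' can represent larger magnitudes than 'double'
// (x87: max_exponent 16384, versus 1024 for double).
constexpr bool long_double_has_wider_range()
{
    return std::numeric_limits<long double>::max_exponent
         > std::numeric_limits<double>::max_exponent;
}
```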
Originally we had
  double limit = SOME_BIGNUM;
  double payment = some_everyday_value;
  double limited_payment = std::min(limit, payment);
and the x87 handled that well. It continued to work well
when I semi-currency-ized it:
  double limit = SOME_BIGNUM;
  currency payment = some_everyday_value;
  currency limited_payment = round_gross_pmt.c(
      std::min(limit, dblize(payment))
  );
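In the semi-currency-ized form the huge limit never reaches the rounding step: std::min picks the everyday payment first, and only that amount is scaled by 100. A minimal sketch, with a hypothetical 'currency' struct and hypothetical stand-ins for round_gross_pmt.c() and dblize(), none of which need match lmi's real ones:

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Hypothetical minimal currency type holding a whole number of cents;
// not lmi's 'currency' class, just enough for the sketch to compile.
struct currency
{
    double cents;
};

// Hypothetical stand-in for round_gross_pmt.c(): scale dollars to cents
// in 'long double', round to the nearest cent, and narrow to double.
currency round_to_cents(double dollars)
{
    return currency{static_cast<double>(std::nearbyint(dollars * 100.0L))};
}

// Hypothetical stand-in for dblize(): cents back to dollars.
double dblize(currency c)
{
    return c.cents / 100.0;
}

// Semi-currency-ized: the huge limit meets only std::min, so the value
// handed to round_to_cents() is an everyday amount and nothing overflows.
currency limited_payment(double limit, currency payment)
{
    return round_to_cents(std::min(limit, dblize(payment)));
}
```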
But when I fully currency-ized it:
  currency limit = round_gross_pmt.c(SOME_BIGNUM);
that's the single statement that slowed the whole program
down painfully, because it multiplies SOME_BIGNUM by 100.0
(probably causing overflow) and then FSTPL's the result.
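The fully currency-ized statement can be reproduced in the same hypothetical terms. Rounding the limit itself scales SOME_BIGNUM (assumed here to be std::numeric_limits<double>::max(), as in HEAD) by 100, and narrowing that product back to 'double' gives infinity. The 'currency' struct and round_to_cents() are hypothetical stand-ins, not lmi's classes, and the sketch shows only the overflow, not its cost.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Hypothetical minimal currency type (not lmi's): a whole number of cents.
struct currency
{
    double cents;
};

// Hypothetical stand-in for round_gross_pmt.c(): scale to cents in
// 'long double', round to the nearest cent, and narrow to double.
currency round_to_cents(double dollars)
{
    // 'volatile' keeps the scaling and narrowing at run time.
    volatile long double scaled = dollars * 100.0L;
    return currency{static_cast<double>(std::nearbyint(scaled))};
}

// Fully currency-ized: the sentinel limit itself is rounded, so it is
// multiplied by 100 and the out-of-range product is narrowed to double.
currency rounded_limit()
{
    return round_to_cents(std::numeric_limits<double>::max());
}
```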
I don't yet know the best way to deal with this, but
I do now know exactly what to investigate.