Re: [lmi] Can linux-perf illuminate this problem?

From:	Greg Chicares
Subject:	Re: [lmi] Can linux-perf illuminate this problem?
Date:	Fri, 5 Mar 2021 22:38:38 +0000
User-agent:	Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Thunderbird/78.4.0

On 3/5/21 9:46 PM, Vadim Zeitlin wrote:
> On Fri, 5 Mar 2021 20:30:53 +0000 Greg Chicares <gchicares@sbcglobal.net> 
> wrote:
[...]
> GC> This command:
> [...]
> GC> ...but that only seems to tell me how much time is spent in each
> GC> function called by AccountValue::DoMonthDR(). However, I suspect
> GC> that I've done something atrocious inside DoMonthDR() itself.
> GC> Can 'perf' help me find that?
> 
>  If you select the function and press "Enter", you should see the menu with
> several choices, the first of which is "Annotate <function>". Selecting it
> shows the instructions actually being executed, annotated with the source
> lines, and percentage of the execution time for the hot instructions.

Thanks, very helpful. In order to be able to "Annotate" this function,
I seem to need to add '--call-graph=none' (discovered by randomly
permuting flags). Then, ultimately I get here:

       │     round_to<double>::c(double) const:
  0.32 │       fmull  -0x1e0(%rbp)
 31.01 │       fstpl  -0x1e0(%rbp)
  2.00 │       movsd  -0x1e0(%rbp),%xmm0
  0.43 │     → callq  *0xd50(%r13)
       │             * scale_back_cents_
  0.30 │       fldt   0xd40(%r13)

so that FSTPL is apparently the problem.

FSTPL? On x86_64? Yes, because 'round_to.hpp' uses type
'long double', which gcc on x86_64 maps to the 80-bit x87
format. Changing that to 'double' is likely to be a
Really Big Change, which I'm not going to attempt soon.

But the real question is why that FSTPL costs so much. While
I can't yet prove it, the reason seems clear: the argument to
round_to<>::c() is very often an extreme value. In HEAD, it's
  std::numeric_limits<double>::max()
and the code seems equally slow if I change that to
  std::numeric_limits<double>::infinity()
(so that's no silver bullet).
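A minimal sketch of why the two extreme inputs behave alike, assuming IEEE 754 doubles and an 80-bit x87 'long double' (as with gcc on x86_64 Linux). The helper names are hypothetical and are not lmi's round_to<> API: the first helper only mimics the annotated sequence, multiplying by 100 in long double and then narrowing the product to a double, which is what an FSTPL store does.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Assumption: IEEE 754 doubles, so an out-of-range result becomes +inf.
static_assert(std::numeric_limits<double>::is_iec559, "IEEE 754 double assumed");

// Hypothetical helper: multiply in long double, then narrow the product
// to a double (the narrowing store is the analogue of FSTPL).
double scale_by_100_then_store(double x)
{
    assert(!std::isnan(x)); // illustrative precondition: extreme values, not NaN
    long double const product = static_cast<long double>(x) * 100.0L;
    return static_cast<double>(product);
}

// True when long double has a wider exponent range than double, as the
// 80-bit x87 format does; false where long double is just double.
bool long_double_is_wider()
{
    return std::numeric_limits<long double>::max_exponent
         > std::numeric_limits<double>::max_exponent;
}
```

For example, scale_by_100_then_store(1.25) returns 125.0, while both std::numeric_limits<double>::max() and infinity come back as +inf. With an x87 long double, the product of DBL_MAX and 100 is still finite in long double; the overflow to +inf happens only at the narrowing store.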

Originally we had
  double limit = SOME_BIGNUM;
  double payment = some_everyday_value;
  double limited_payment = std::min(limit, payment);
and the x87 handled that well. It continued to work well
when I semi-currency-ized it:

  double limit = SOME_BIGNUM;
  currency payment = some_everyday_value;
  currency limited_payment = round_gross_pmt.c(
    std::min(limit, dblize(payment))
    );

But when I fully currency-ized it:

  currency limit = round_gross_pmt.c(SOME_BIGNUM);

that's the single statement that slowed the whole program
down painfully, because it multiplies SOME_BIGNUM by 100.0
(probably causing overflow) and then FSTPL's the result.
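The ordering difference can be sketched with hypothetical stand-ins. Here to_cents_hypothetical() is not lmi's round_to<>::c(), and returning cents as a double is an assumption. Taking std::min in dollars first means only the everyday value is ever scaled. Scaling the limit first pushes the huge sentinel through the long double multiply and narrowing store on every call.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <limits>

// Assumption: IEEE 754 doubles, so an overflowed store yields +inf.
static_assert(std::numeric_limits<double>::is_iec559, "IEEE 754 double assumed");

// Hypothetical stand-in for a round_to<>::c() call: scale dollars to
// cents in long double, round to nearest, then narrow to double.
double to_cents_hypothetical(double dollars)
{
    assert(!std::isnan(dollars)); // illustrative precondition
    long double const scaled = static_cast<long double>(dollars) * 100.0L;
    return static_cast<double>(std::nearbyint(scaled));
}

// Semi-currency-ized ordering: take the minimum in dollars first, so the
// huge limit never reaches the scaling step.
double limited_min_then_scale(double limit, double payment)
{
    return to_cents_hypothetical(std::min(limit, payment));
}

// Fully currency-ized ordering: the limit itself is scaled, so the huge
// sentinel is multiplied by 100 (and overflows on the store) every call.
double limited_scale_then_min(double limit, double payment)
{
    double const limit_cents = to_cents_hypothetical(limit);
    return std::min(limit_cents, to_cents_hypothetical(payment));
}
```

With limit = std::numeric_limits<double>::max() and payment = 12.34, both orderings return 1234.0 cents. They differ only in which values pass through the scaling step, which is where the annotated FSTPL sits.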

I don't yet know the best way to deal with this, but
I do now know exactly what to investigate.
