Re: [lmi] [lmi-commits] master 7dd2680 14/14: Add and use a forward-summation function template
From: Greg Chicares
Subject: Re: [lmi] [lmi-commits] master 7dd2680 14/14: Add and use a forward-summation function template
Date: Sat, 20 Feb 2021 21:38:00 +0000
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Thunderbird/78.4.0
On 2/20/21 2:49 PM, Vadim Zeitlin wrote:
> On Thu, 18 Feb 2021 12:03:42 -0500 (EST) Greg Chicares
> <gchicares@sbcglobal.net> wrote:
>
> GC> branch: master
> GC> commit 7dd2680044d48d794d1e68e087d0795ea70b2525
> GC> Author: Gregory W. Chicares <gchicares@sbcglobal.net>
> GC> Commit: Gregory W. Chicares <gchicares@sbcglobal.net>
> GC>
> GC> Add and use a forward-summation function template
> GC>
> GC> Incidentally, this commit will make it simpler to
> GC> s/partial_sum/inclusive_scan/
> GC> once gcc is upgraded from version 8 for lmi production.
>
> I'm curious if using inclusive_scan() rather than partial_sum() can change
> the results in practice. I know that in theory it could, due to the
> floating point addition operation not being associative, but I haven't
> actually tried confirming this experimentally and so I don't really know if
> there could be a noticeable effect.
Suppose I use a std::inclusive_scan overload that doesn't specify
an execution policy. Is that equivalent to specifying the
"sequenced" policy? I've spent some time looking through the
C++17 standard (N4659) for the answer, but cannot find it.
(I should upgrade my copy to C++20 soon, anyway.)
If (as I infer, but cannot substantiate by citing the standard)
they are equivalent, then inclusive_scan with an elided (default)
execution policy should be equivalent to partial_sum.
And in that case, we should s/partial_sum/inclusive_scan/
everywhere: inclusive_scan is preferable because it offers the
option of using parallelism, which we can later investigate, and
then exploit or not.
> Also, as usual, I don't know what is the typical size of the vectors this
> code works with. If it's smallish, it's probably not even worth bothering
> with checking whether inclusive_scan() could be used here or not as it's
> not going to provide any significant gains over partial_sum() anyhow. But
> if the vectors can be really big, it could be interesting to explore using
> a parallel (unsequenced?) policy.
The relevant vectors in lmi are almost always of length [0,100].
The density function is probably
- virtually zero for [0]
- close to zero for [1,15] and [85,99]
- fairly uniform for [16,84]
- highest at [100], but not by much
For the important special case of length 100, an inclusive scan can be
done in ceil(log2(100)) (i.e., 7) parallel steps, given enough cores.
Even for such a small length, parallelism might provide a
worthwhile speed improvement.