From: Greg Chicares
Subject: Re: [lmi] [lmi-commits] master 7dd2680 14/14: Add and use a forward-summation function template
Date: Sat, 20 Feb 2021 21:38:00 +0000
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Thunderbird/78.4.0

On 2/20/21 2:49 PM, Vadim Zeitlin wrote:
> On Thu, 18 Feb 2021 12:03:42 -0500 (EST) Greg Chicares 
> <gchicares@sbcglobal.net> wrote:
> 
> GC> branch: master
> GC> commit 7dd2680044d48d794d1e68e087d0795ea70b2525
> GC> Author: Gregory W. Chicares <gchicares@sbcglobal.net>
> GC> Commit: Gregory W. Chicares <gchicares@sbcglobal.net>
> GC> 
> GC>     Add and use a forward-summation function template
> GC>     
> GC>     Incidentally, this commit will make it simpler to
> GC>       s/partial_sum/inclusive_scan/
> GC>     once gcc is upgraded from version 8 for lmi production.
> 
>  I'm curious whether using inclusive_scan() rather than partial_sum() can
> change the results in practice. I know that in theory it could, because
> floating-point addition is not associative, but I haven't tried to confirm
> this experimentally, so I don't know whether there would be a noticeable
> effect.

Suppose I use a std::inclusive_scan overload that doesn't specify
an execution policy. Is that equivalent to specifying the
"sequenced" policy, std::execution::seq? I've spent some time
looking through the C++17 standard (N4659) for the answer, but
cannot find it. (I should upgrade my copy to C++20 soon, anyway.)

If (as I infer, but cannot substantiate by citing the standard)
they are equivalent, then inclusive_scan with an elided (default)
execution policy should be equivalent to partial_sum.
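
For concreteness, a minimal sketch of the two calls in question
(both overloads elide the execution policy; the data are made up):

  #include <numeric>
  #include <vector>

  int main()
  {
      std::vector<double> const v {1.0, 2.0, 3.0, 4.0};
      std::vector<double> sum0(v.size());
      std::vector<double> sum1(v.size());
      // C++98 algorithm: applies operator+ strictly left to right.
      std::partial_sum(v.begin(), v.end(), sum0.begin());
      // C++17 algorithm, policy elided: conjecturally equivalent
      // to the above, if the inference here is correct.
      std::inclusive_scan(v.begin(), v.end(), sum1.begin());
      return sum0 == sum1 ? 0 : 1;
  }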

And in that case, we should s/partial_sum/inclusive_scan/
everywhere: inclusive_scan is preferable because it offers the
option of using parallelism, which we can later investigate, and
then exploit or not.
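
A sketch of what the parallel variant might look like, once we
investigate it (par_unseq is just one possible choice, and nothing
has been measured yet):

  #include <execution>
  #include <numeric>
  #include <vector>

  void scan_in_place(std::vector<double>& v)
  {
      // Parallel, possibly vectorized scan. This treats addition
      // as associative, so floating-point sums may be reordered.
      std::inclusive_scan
          (std::execution::par_unseq
          ,v.begin()
          ,v.end()
          ,v.begin() // in-place: the standard permits result == first
          );
  }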

>  Also, as usual, I don't know the typical size of the vectors this code
> works with. If it's smallish, it's probably not even worth checking
> whether inclusive_scan() could be used here, as it's not going to provide
> any significant gain over partial_sum() anyhow. But if the vectors can be
> really big, it could be interesting to explore using a parallel
> (unsequenced?) policy.

The relevant vectors in lmi almost always have lengths in [0,100].
The density function of those lengths is probably
 - virtually zero at 0
 - close to zero on [1,15] and [85,99]
 - fairly uniform on [16,84]
 - highest at 100, but not by much
For the important special case of length 100, an inclusive scan
can be done in ceil(log2(100)), i.e., 7, iterations, given enough
cores. Even for such a small length, parallelism might provide a
worthwhile speed improvement.
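
To make that iteration count concrete, here's a serial sketch of
the classic Hillis-Steele inclusive scan, whose depth is
ceil(log2(n)) passes given one core per element (the function
name is mine, not anything in lmi):

  #include <cstddef>
  #include <vector>

  std::vector<double> hillis_steele_scan(std::vector<double> v)
  {
      // Each pass combines elements one stride apart; strides
      // double each pass, so strides 1,2,4,8,16,32,64 suffice
      // for n = 100: seven passes in all.
      for(std::size_t stride = 1; stride < v.size(); stride *= 2)
      {
          std::vector<double> const prev {v};
          for(std::size_t i = stride; i < v.size(); ++i)
              v[i] = prev[i - stride] + prev[i];
      }
      return v;
  }

In hardware with enough cores, the inner loop is a single parallel
step, which is where the log2 bound comes from.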

