From: Greg Chicares
Subject: Re: [lmi] [lmi-commits] master 7dd2680 14/14: Add and use a forward-summation function template
Date: Sun, 21 Feb 2021 01:15:37 +0000
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Thunderbird/78.4.0

On 2/20/21 11:00 PM, Vadim Zeitlin wrote:
> On Sat, 20 Feb 2021 21:38:00 +0000 Greg Chicares <gchicares@sbcglobal.net> wrote:
> 
> GC> On 2/20/21 2:49 PM, Vadim Zeitlin wrote:
> GC> > On Thu, 18 Feb 2021 12:03:42 -0500 (EST) Greg Chicares <gchicares@sbcglobal.net> wrote:
> GC> > 
> GC> > GC> branch: master
> GC> > GC> commit 7dd2680044d48d794d1e68e087d0795ea70b2525
[...]
> GC> > GC>     Incidentally, this commit will make it simpler to
> GC> > GC>       s/partial_sum/inclusive_scan/
> GC> > GC>     once gcc is upgraded from version 8 for lmi production.
> GC> > 
> GC> >  I'm curious if using inclusive_scan() rather than partial_sum() can
> GC> > change the results in practice. I know that in theory it could, due to
> GC> > the floating point addition operation not being associative, but I
> GC> > haven't actually tried confirming this experimentally and so I don't
> GC> > really know if there could be a noticeable effect.
> GC> 
> GC> Suppose I use a std::inclusive_scan overload that doesn't specify
> GC> an execution policy. Is that equivalent to specifying the
> GC> "sequenced" policy?
> 
>  To take the question literally, the answer is "definitely no": using any
> parallel algorithm is definitely different from using the corresponding
> non-parallel algorithm without any policy, if only because of a drastically
> different approach to exceptions: throwing them during parallel algorithm
> execution results in calling std::terminate(), which is, of course, not the
> case for the non-parallel algorithms.

Oh. I saw something about exceptions, but didn't realize the
consequences were that dire.

The appropriate practice for lmi is therefore never to use these
parallel algorithms, at least not until C++2047 when they don't
call std::terminate.
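
For concreteness, here's a minimal sketch (mine, not from the commit;
the throwing lambda is hypothetical) of that difference: an exception
escaping the binary operation of a parallel-overload algorithm calls
std::terminate(), even under the "seq" policy, whereas the plain
overload propagates it normally.

  #include <execution> // std::execution::seq, for the parallel overload
  #include <numeric>
  #include <stdexcept>
  #include <vector>

  int main()
  {
      std::vector<double> const v {1.0, 2.0, 3.0};
      std::vector<double> out(v.size());

      // Hypothetical operation that throws partway through the scan.
      auto throwing_plus = [](double a, double b) -> double
      {
          if(2.0 == b) throw std::runtime_error("oops");
          return a + b;
      };

      try
      {
          // Non-parallel overload: the exception propagates normally.
          std::inclusive_scan(v.begin(), v.end(), out.begin(), throwing_plus);
      }
      catch(std::runtime_error const&)
      {
          // Reached: catchable, as with any other algorithm.
      }

      // Parallel overload: the same exception escaping 'throwing_plus'
      // would call std::terminate() ([algorithms.parallel.exceptions]),
      // even with std::execution::seq; hence it's left commented out:
      //
      // std::inclusive_scan(std::execution::seq,
      //     v.begin(), v.end(), out.begin(), throwing_plus);
  }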

>  But answering the question you probably wanted to ask, i.e. which
> execution policy is used by the non-parallel algorithm, is more difficult,
> and I couldn't find a definitive answer to it either.

Thanks for taking a look. I suppose that's a defect that they
might notice and remediate sometime within the next few years,
even if only to specify that it's unspecified.

> GC> If (as I infer, but cannot substantiate by citing the standard)
> GC> they are equivalent, then inclusive_scan with an elided (default)
> GC> execution policy should be equivalent to partial_sum.
> 
>  Just to be clear, the theoretical problem I was referring to was the
> non-associativity of floating point addition. I.e. changing the order of
> operations may change the result, although I still don't know if it can
> change it significantly.

Significance is a matter of judgement. For lmi, it might come down
to a question of whether the quotient of certain "commutation function"
polynomials changes. Probably that quotient would change in at least
one plausible case, and probably we'd care enough about that to avoid
the parallel algorithms in the C++ standard.

It's not that there exists One True Order for partial summations, from
which any deviation would constitute a defect. There's probably some
algorithm that performs the summation in some way that maximizes
accuracy, but no matter--what we want is stable reproducibility.
Getting different output for the same input, on two machines that are
identical except for number of cores, would be a Bad Thing for lmi.
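
To make that concrete, here's a minimal sketch (mine, not from lmi's
sources) in which a left-to-right partial sum and one reassociation
that a parallel scan would be permitted to choose give visibly
different answers:

  #include <cstdio>
  #include <numeric>
  #include <vector>

  int main()
  {
      std::vector<double> const v {1.0e16, 1.0, -1.0e16};
      std::vector<double> out(v.size());

      // Sequential left-to-right, as std::partial_sum guarantees:
      //   (1e16 + 1) + -1e16 == 0.0, because 1.0 is absorbed into 1e16.
      std::partial_sum(v.begin(), v.end(), out.begin());
      std::printf("left-to-right total: %.1f\n", out.back()); // 0.0

      // One reassociation a parallel inclusive_scan may choose:
      //   (1e16 + -1e16) + 1 == 1.0, because the large terms cancel first.
      double const reassociated = (v[0] + v[2]) + v[1];
      std::printf("reassociated total:  %.1f\n", reassociated); // 1.0
  }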

> GC> And in that case, we should s/partial_sum/inclusive_scan/
> GC> everywhere: inclusive_scan is preferable because it offers the
> GC> option of using parallelism, which we can later investigate, and
> GC> then exploit or not.

No, exactly the opposite: partial_sum is always preferable for lmi
because it avoids the disadvantages of parallelism (irreproducibility,
uncatchable exceptions); and, as explained below, it's not worth
investigating this further because the benefits for lmi would be
slight at best.

> GC> The relevant vectors in lmi are almost always of length [0,100].
> [...]
> GC> Even for such a small length, parallelism might provide a
> GC> worthwhile speed improvement.
> 
>  Thanks for answering my question about the vector sizes. Now I'm certain
> that using parallel policies is never worthwhile for vectors of such
> size. There might be a tiny benefit from auto-vectorization for them,
> but they're far too small for it to count. IOW I don't think it's
> interesting to explore using inclusive_scan() here or worry about its
> execution policies: this is never going to change anything speed-wise
> in practice (but it might still conceivably change the result...).

Thanks, I'm convinced.

