Re: [lmi] overview of C++ expression template libraries


From: Greg Chicares
Subject: Re: [lmi] overview of C++ expression template libraries
Date: Mon, 22 Mar 2021 02:58:52 +0000
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Thunderbird/78.4.0

On 3/21/21 12:55 AM, Vadim Zeitlin wrote:
> On Thu, 18 Feb 2021 01:27:25 +0000 Greg Chicares <gchicares@sbcglobal.net> wrote:
> 
> GC> On 2/17/21 11:36 AM, Vadim Zeitlin wrote:

[...time travelling...]

> GC> > On Thu, 11 Sep 2008 20:16:08 +0000 Greg Chicares <gchicares@sbcglobal.net> wrote:
> GC> > 
> GC> > GC> On 2007-01-08 14:23Z, Vadim Zeitlin wrote:
> GC> > GC> > 
> GC> > GC> >  I don't see anything against PETE except that it doesn't seem clear
> GC> > GC> > whether it's actively maintained.
> GC> > GC> 
> GC> > GC> There's no recent maintenance activity AFAICT.
> GC> > 
> GC> >  This is still the case today and, I think, it's safe to say there will
> GC> > never be any. Yet, lmi still uses PETE and you've been extending its use
> GC> > recently -- which is the motivation for this post, of course, as I wonder
> GC> > if it could make sense to have another look at the ET libraries using
> GC> > modern C++ in case we can find something "better" than PETE.
> GC> 
> GC> Maybe, though I doubt it.
> 
>  Seeing all the recent work you've been doing on PETE, I guess this is even
> more doubtful now than it was a month ago, isn't it?

Not necessarily. For changes like this (commit 3143a303a6d4)...

-    std::transform(miscellaneous_charges.begin(), miscellaneous_charges.end(), AmortLoad_         .begin(), miscellaneous_charges.begin(), std::plus<double>());
-    std::transform(miscellaneous_charges.begin(), miscellaneous_charges.end(), ExtraSepAcctCharge_.begin(), miscellaneous_charges.begin(), std::plus<double>());

+    miscellaneous_charges += AmortLoad_ + ExtraSepAcctCharge_;

...it doesn't matter whether the (invisible) underlying code is PETE,
or some other expression-template library. What matters most is that
  A += B + C;
be written tersely; the improved efficiency is also nice.

I've spent a little time building the capabilities I want, out of the
raw material PETE provides. If another library already provides them,
then perhaps some of that relatively small effort was wasted. But the
important thing is rewriting nasty old lmi code so that it's cleaner:
that's the lion's share of the effort, and it takes as long with one
expression-template library as with another.
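To make the mechanism concrete, here's a minimal sketch of how any
expression-template library can make `A += B + C;` work for std::vector
without allocating a temporary -- this is only an illustration of the
general idea, not PETE's (far more general) machinery:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Lazy node representing the elementwise sum of two operands.
template<typename L, typename R>
struct sum_expr
{
    L const& lhs;
    R const& rhs;
    double operator[](std::size_t i) const {return lhs[i] + rhs[i];}
    std::size_t size() const {return lhs.size();}
};

// 'B + C' builds a node; no temporary vector is allocated.
inline sum_expr<std::vector<double>,std::vector<double>>
operator+(std::vector<double> const& lhs, std::vector<double> const& rhs)
{
    assert(lhs.size() == rhs.size());
    return {lhs, rhs};
}

// 'A += expr' runs one fused loop over the expression tree.
template<typename L, typename R>
std::vector<double>& operator+=(std::vector<double>& v, sum_expr<L,R> const& e)
{
    assert(v.size() == e.size());
    for(std::size_t i = 0; i < v.size(); ++i)
        {
        v[i] += e[i];
        }
    return v;
}
```

With those two overloads, `miscellaneous_charges += AmortLoad_ +
ExtraSepAcctCharge_;` is a single pass with no heap traffic, whichever
library supplies them.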

> GC> My ambitions are tiny by comparison, but mostly orthogonal to theirs.
> GC> I don't want tensors and all that. I just want to be able to write
> GC> simple operations on scalars and arrays in a simple, natural, and
> GC> (most of all) terse way, e.g.:
> GC> 
> GC>   scalar A, B;
> GC>   vector U, V, W;
> GC>   W = max(0.0, 1.0 + A * U / (B * V - 3.5));
> 
>  FWIW I've found several libraries that allow doing this (and not that much
> more). The problem is that none of them works with std::vector, they all
> define their own bespoke data structures, which makes them unacceptable for
> our needs without any changes.

In lmi's problem domain, vectors are largely sufficient, and arrays
of greater rank are rare. Many libraries target multidimensional
arrays, and std::vector wouldn't work for them.

> GC> I can glance at a terse expression (in my professional domain) and
> GC> see what it does, and (probably) say whether it's right. But endless
> GC> lines of std::transform would puzzle anyone's will, and lambdas with
> GC> ranged-for aren't much more transparent.
> 
>  Unfortunately the new wave in C++(20) is "ranges" and I'm afraid I'm
> almost as unimpressed by them as by std::transform(). Maybe it's an
> acquired taste that will come with time

From what I've seen, I'm not excited.

> GC> I come to expression templates seeking expressiveness, though wanting
> GC> to avoid any serious speed penalty--yet not primarily seeking speed.
> GC> But better speed is nice to have, provided expressiveness is maximized.
> 
>  Ideally, an expression template library would completely decouple the
> syntax from the implementation, allowing the latter to be changed without
> affecting the former. This is important because it would make it easy to
> use AVX instructions, for example, which can result in massive performance
> gains.

And if someone does that right, someday, we can switch from PETE to
their library.

>  From this point of view, Boost.YAP[1] library looks promising, as it seems
> to allow exactly this, and I was going to look at it "soon" but just didn't
> have time to do it yet. But, again, I don't know if it's still worth doing
> this if you've already committed to using PETE for the observable future.
> 
> [1]: https://www.boost.org/doc/libs/master/doc/html/yap.html

Looks like PETE to me. Compare:

// Assigns some expression e to the given vector by evaluating e elementwise,
// to avoid temporaries and allocations.
template <typename T, typename Expr>
std::vector<T> & assign (std::vector<T> & vec, Expr const & e)
{
    decltype(auto) expr = boost::yap::as_expr(e);
    assert(equal_sizes(vec.size(), expr));
    for (std::size_t i = 0, size = vec.size(); i < size; ++i) {
        vec[i] = boost::yap::evaluate(
            boost::yap::transform(boost::yap::as_expr(expr), take_nth{i}));
    }
    return vec;
}

template<typename T, typename Op, typename X>
inline void evaluate(std::vector<T>& t, Op const& op, Expression<X> const& x)
{
    // [replaced informative message with 'assert()' here]
    assert(forEach(x, SizeLeaf(lmi::ssize(t)), AndCombine()));
    for(int i = 0; i < lmi::ssize(t); ++i)
        {
        op(t[i], forEach(x, EvalLeaf1(i), OpCombine()));
        }
}

It's six of one, half a dozen of the other.

> GC> I'd be quickly convinced by a library that would let me replace this:
> GC> 
> GC>     std::reverse(chg_sa.begin(), chg_sa.end());
> GC>     std::partial_sum(chg_sa.begin(), chg_sa.end(), chg_sa.begin());
> GC>     std::reverse(chg_sa.begin(), chg_sa.end());
> GC> 
> GC> with the pellucid equivalent in Iverson notation:
> GC> 
> GC>     chg_sa←⌽+\⌽chg_sa
> GC> 
> GC> but I know that's a lot to ask.

commit 3edb81fbdf2

-    std::reverse(chg_sa.begin(), chg_sa.end());
-    std::partial_sum(chg_sa.begin(), chg_sa.end(), chg_sa.begin());
-    std::reverse(chg_sa.begin(), chg_sa.end());
+    chg_sa = back_sum(chg_sa);

Not as nice as APL, but clear and readable.
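For what it's worth, a sketch of what back_sum might look like (the
actual lmi implementation may differ): back_sum(v)[i] is the suffix sum
v[i] + v[i+1] + ... + v[n-1], exactly equivalent to the
reverse/partial_sum/reverse dance:

```cpp
#include <cstddef>
#include <vector>

// Suffix running sum: one backward pass replaces three algorithm calls.
std::vector<double> back_sum(std::vector<double> const& v)
{
    std::vector<double> r(v.size());
    double running = 0.0;
    for(std::size_t i = v.size(); 0 != i; --i)
        {
        running += v[i - 1];
        r[i - 1] = running;
        }
    return r;
}
```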

>  I did see a library called ra-ra[2] which claims to be inspired by APL
> and, naturally, I've added it to the list of candidates to look at just
> because I thought you would be interested. OTOH I'll probably be rather
> relieved never to know what APL implemented in C++ looks like...
> 
> [2]: https://github.com/lloda/ra-ra

Interesting.

ra-ra's documentation mentions single-argument reductions:
  any, every, amax, amin, sum, prod
which I just implemented for PETE. I wish I knew what
  "the two-argument reductions dot and cdot"
mean, but the manual is very incomplete, with many FIXME's:
  https://lloda.github.io/ra-ra/
although the code apparently dates back to 2011, which leads me to
conjecture that it's not being maintained very actively.
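These single-argument reductions have straightforward std::vector
counterparts. A sketch using ra-ra's names (the PETE-based versions
reduce expression trees rather than concrete vectors, so these
plain-vector forms are only illustrative):

```cpp
#include <algorithm>
#include <functional>
#include <numeric>
#include <vector>

// Single-argument reductions over a concrete vector.
inline double sum (std::vector<double> const& v)
    {return std::accumulate(v.begin(), v.end(), 0.0);}
inline double prod(std::vector<double> const& v)
    {return std::accumulate(v.begin(), v.end(), 1.0, std::multiplies<double>());}
inline double amax(std::vector<double> const& v)
    {return *std::max_element(v.begin(), v.end());}
inline double amin(std::vector<double> const& v)
    {return *std::min_element(v.begin(), v.end());}
inline bool any  (std::vector<bool> const& v)
    {return std::any_of(v.begin(), v.end(), [](bool b) {return b;});}
inline bool every(std::vector<bool> const& v)
    {return std::all_of(v.begin(), v.end(), [](bool b) {return b;});}
```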

>  To summarize, after looking at some more recent libraries I am even more
> convinced that PETE is very far from being the ideal implementation of the
> ET idea in modern C++

It's probably not even ideal for C++98, as this paper:
  https://www10.cs.fau.de/publications/reports/TechRep_2006-07.pdf
discusses
  "Performance Results on a Pentium 4 using the g++-3.3.3 compiler"
showing how to improve on old-school expression templates. See also:
  http://www.sci.utah.edu/~wolters/PaperWolters/2010/HaerdtleinEtAl_CompVisSci_2010.pdf

Here's a library:
  https://github.com/wichtounet/etl
that "makes extensive use of C++17 and some features of C++20".

> and it seems clear that things could be much improved
> simply by using C++17-specific features such as if constexpr and fold
> expressions.

"improved" in what sense? I don't think speed is going to matter much.
Taking the "commit 3143a303a6d4" specimen above as an example, I'd
guess that the old std::transform() stuff was slow, and
    A += B + C;
is faster, but the real point is that the sprawling, unreadable
std::transform() code could have been riddled with errors that we'd
never find, whereas
    A += B + C;
is comprehensible, and any errors in it are easier to see. That's
the most important improvement.

> But so far I haven't been able to find a drop-in replacement
> for it, and I don't think one exists, so migrating from PETE would clearly
> require some work, and it looks like you'd prefer to work on improving
> lmi's version of PETE itself instead.

Most of lmi resists vectorization: it consists of
  for(int year = 0; year < 100 - age; ++year)
    for(int month = 0; month < 12; ++month)
       do_something();
where do_something() involves only simple + - * / arithmetic,
but depends on a vast and dense thicket of conditionals, with
rounding at each step. The rounding means that 100 years or
even 12 months cannot be run in parallel on separate cores.
It's embarrassingly serial.
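A toy illustration of that serial dependency (the rate and rounding rule
are invented for this sketch, not taken from lmi): because each month's
interest is rounded before the next month begins, month m needs month
m-1's *rounded* balance, so the twelve steps cannot be fused into a
closed form or spread across cores.

```cpp
// Round n/d to the nearest integer, ties away from zero.
// Assumes n >= 0 and d > 0.
long long round_div(long long n, long long d)
{
    return (n + d / 2) / d;
}

// Balance in integer cents; monthly interest is cents * num / den,
// rounded to the nearest cent at each step. The loop-carried dependency
// on the rounded intermediate value is what makes it serial.
long long year_end_cents(long long cents, long long num, long long den)
{
    for(int month = 0; month < 12; ++month)
        {
        cents += round_div(cents * num, den);
        }
    return cents;
}
```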

Some smaller though important parts of lmi are vectorizable.
They don't need to be made blindingly fast--just not too slow.
But writing vector code in a scalar-oriented language is
tiresome and error-prone.

For all those various parts of lmi to work together, there
needs to be one shared vector datatype. If that datatype is
to be std::vector, then I don't think we're likely to find
a better library than PETE. We could write one ourselves,
but I don't think we'll find an existing one.

>  This is a pity because it seems unlikely that lmi will ever be able to use
> AVX instructions if we don't use an external library, and I also remain
> certain that this is the real key to qualitative performance gains (of
course, I've been saying this for 15 years, first about SSE, then about SSE 2,
then about AVX and AVX 256, and by now we're at AVX 512 and I still keep saying
> the same thing, so at the very least you will have to grant that I'm
> consistent in my convictions).

If we can make 10% of lmi's code twice as fast, then lmi
runs only 5% faster.

>  Please let me know if you're interested in pursuing this discussion
> further or if we should make another 10+ year break and see how things
> evolve by 2035...

Authors of expression-template libraries seem to want to bring
C++ up to the state of the art for scientific computing that
was set by FORTRAN many decades ago. They'll naturally focus
on all that linear algebra stuff that lmi doesn't need. And
they'll do it with bespoke data structures.

I just want to regain some of the expressiveness of another
language almost as old, APL, where
    A += B + C;
works as expected for std::vectors.


