Re: [lmi] overview of C++ expression template libraries


From: Greg Chicares
Subject: Re: [lmi] overview of C++ expression template libraries
Date: Mon, 22 Mar 2021 20:12:59 +0000
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Thunderbird/78.4.0

On 3/22/21 11:39 AM, Vadim Zeitlin wrote:
> On Mon, 22 Mar 2021 02:58:52 +0000 Greg Chicares <gchicares@sbcglobal.net> 
> wrote:
> 
> GC> In lmi's problem domain, vectors are largely sufficient, and arrays
> GC> of greater rank are rare. Many libraries target multidimensional
> GC> arrays, and std::vector wouldn't work for them.
> 
>  Yes, this is definitely a good point. It still seems like it ought to be
> possible to make the same logic work for std::vector<>, but somehow nobody
> seems to have ever been interested in this.

Isn't that what the "views" concept is supposed to do? I haven't studied
it in great depth, but here's my impression. Suppose we want to sort all
but the last five elements of a vector, backwards. The old way is:
  std::sort(v.rbegin() + 5, v.rend());
but with views the new way is:
  std::ranges::sort(std::views::drop(std::views::reverse(v), 5));
Thus, views mediate between container and algorithm: the old way is:
  algorithm(container.iterator0(), container.iterator1());
while the new way is:
  view(container.iterator0(), container.iterator1());
  algorithm(view);
Having a separate View object means we can adjust its meaning, e.g. with
drop() and reverse() above, and apply more than one algorithm to it:
"all but the last five, backwards" is a coherent and useful abstraction.

And, notionally at least, "views" are just sets of pointers (iterators):
they're inexpensive and lightweight, and they don't own the data they
point to. Thus, wouldn't they work just as well for one Container type
as another? And if they serve to decouple a set of indices from data
storage, wouldn't a View-based expression-template library work
naturally and automatically with any Container? That is, even if we
have an NxM BespokeArrayOfDouble, then we ought to be able to form a
"view" to its Nx1 kth column and "materialize" that as a std::vector;
and if the BespokeThing is a temporary, we ought to be able to move
its contents into a std::vector quite cheaply.

> GC> >  From this point of view, Boost.YAP[1] library looks promising,
> GC> > as it seems to allow exactly this, and I was going to look at it
> GC> > "soon" but just didn't have time to do it yet. But, again, I don't
> GC> > know if it's still worth doing this if you've already committed to
> GC> > using PETE for the observable future.
> GC> >
> GC> > [1]: https://www.boost.org/doc/libs/master/doc/html/yap.html
> GC> 
> GC> Looks like PETE to me. Compare:
> GC> 
> GC> // Assigns some expression e to the given vector by evaluating e
> GC> // elementwise, to avoid temporaries and allocations.
> GC> template <typename T, typename Expr>
> GC> std::vector<T> & assign (std::vector<T> & vec, Expr const & e)
> GC> {
> GC>     decltype(auto) expr = boost::yap::as_expr(e);
> GC>     assert(equal_sizes(vec.size(), expr));
> GC>     for (std::size_t i = 0, size = vec.size(); i < size; ++i) {
> GC>         vec[i] = boost::yap::evaluate(
> GC>             boost::yap::transform(boost::yap::as_expr(expr), take_nth{i}));
> GC>     }
> GC>     return vec;
> GC> }
> 
>  But this is part of the _implementation_, not how you use the library.
> I.e. you're supposed to write this once

Just as with PETE, where you're supposed to write evaluate() once:

/opt/lmi/src/lmi/tools/pete-2.1.1/html/tut-2.html
| Most of the definitions required to integrate vector<> with PETE
| are generated automatically [...] The file Eval.h contains the
| few extra definitions that must be written by hand. Of these,
| the most important is the function evaluate()

> but then, as the example from which
> the above was taken shows, you can just do
> 
>       std::vector<int> a,b,c,d;
>       std::vector<double> e(n);
> 
>       [...]
> 
>       // After this point, no allocations occur.
> 
>       assign(b, 2);
>       assign(d, a + b * c);

But isn't that just the same as PETE?

>       a += if_else(d < 30, b, c);

I suppose that's similar to PETE's where().

>       assign(e, c);
>       e += e - 4 / (c + 1);
>  
>  I'm not sure why we need to use assign() instead of just the assignment
> operator, but this still looks reasonably clear to me.

I think it's because an assignment operator has to be a member,
and boost::yap can't add such an operator to std::vector. They
could have used something debatable like overloading operator<<=().

>  But of course, it's similar to PETE conceptually, but there is nothing
> really wrong with this, is there?

Nothing wrong with it. Conceptually.

But it's part of Boost, and my experience with Boost is that it's
a tightly integrated, monolithic set of libraries of varying
quality--either we take the whole thing, with its weird build
system, or we just say no. As a standalone library, 'yap' might
be all right.

> GC> Here's a library:
> GC>   https://github.com/wichtounet/etl
> GC> that "makes extensive use of C++17 and some features of C++20".
> GC> 
> GC> > and it seems clear that things could be much improved
> GC> > simply by using C++17-specific features such as if constexpr and fold
> GC> > expressions.
> GC> 
> GC> "improved" in what sense?
> 
>  In the sense of making the library implementation simpler and more
> efficient. In fact, the author of the ETL, linked above, wrote a couple
> of posts explaining the advantages of migrating his library to C++17. I
> admit I haven't followed all the details, but I assume he knows what he's
> talking about...

But what measurable benefit can we observe, to offset the cost of
switching? I think the only difference will be the speed of the
relatively small part of lmi that's parallelizable.

> GC> Most of lmi resists vectorization: it consists of
> GC>   for(int year = 0; year < 100 - age; ++year)
> GC>     for(int month = 0; month < 12; ++month)
> GC>        do_something();
> GC> where do_something() involves only simple + - * / arithmetic,
> GC> but depends on a vast and dense thicket of conditionals, with
> GC> rounding at each step. The rounding means that 100 years or
> GC> even 12 months cannot be run in parallel on separate cores.
> GC> It's embarrassingly serial.
> 
>  But couldn't we still process 12 months in a single loop iteration,
> operating on all 12 months in parallel? This should be possible with AVX
> instructions.

No, we can't. Consider this extremely simplified example:

  double v = 0.03;
  double z = 12345.67;
  for(int j = 0; j < 12; ++j)
    {z += round_to_nearest_hundredth(z * v);}

Rounding at each step prevents parallelization.

> GC> For all those various parts of lmi to work together, there
> GC> needs to be one shared vector datatype. If that datatype is
> GC> to be std::vector, then I don't think we're likely to find
> GC> a better library than PETE. We could write one ourselves,
> GC> but I don't think we'll find one.
> 
>  As I wrote, I tend to agree, with Boost.YAP being the only library I've
> found that really decouples the ET logic from the actual data
> representation.

Ah, the Views concept, IIUC.

> This comes at the cost of needing to write things like
> assign() above ourselves, but, in principle, this might not be too bad.

Agreed.

> GC> If we can make 10% of lmi's code twice as fast, then lmi
> GC> runs only 5% faster.
> 
>  I do agree with the idea, but I'd still like to say that the gains from
> using AVX may be significantly higher than "only" 100%.

If we can make 10% of lmi's code run infinitely faster,
then lmi runs only 10% faster.
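That's just Amdahl's law: with parallelizable fraction p sped up by a
factor s, the overall speedup is

```latex
S(p, s) = \frac{1}{(1 - p) + p/s},
\qquad
\lim_{s \to \infty} S(0.1, s) = \frac{1}{0.9} \approx 1.11,
```

so an unbounded speedup of a 10% slice gains about 11% overall, and
doubling that slice's speed gains S(0.1, 2) = 1/(0.9 + 0.05) ≈ 1.05.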

If that means we have to be saddled with boost forever,
and subject to all its problems, then I'm pretty much
convinced that the benefit cannot be worth that cost.

> GC> I just want to regain some of the expressiveness of another
> GC> language almost as old, APL, where
> GC>     A += B + C;
> GC> works as expected for std::vectors.
> 
>  Yes, and it really surprises me that there is no existing library that
> allows this. Perhaps I'm missing something here and it's not as simple (or
> as useful?) as I think it would be to write one.

It surprises me, too. How can this be?

It's not that it's difficult. PETE is actually quite simple; it's
all covered in a turn-of-the-century DDJ article:
  https://www.drdobbs.com/cpp/pete-the-portable-expression-template-en/184411075
and requires only C++98.

It's not that no one considers it useful: there are plenty of
expression-template libraries, and std::valarray doesn't make
much sense without expression templates. But its champions are
few, and their interest is finite. PETE's authors published their
work in DDJ, and then presumably turned their attention back to
physics. Veldhuizen did a very impressive job with blitz++, and
his documentation was extensive and polished, but then he walked
away. Legend has it that whoever began the development of valarray
left before the work was finished.

And the ISO committees work only on what interests them. It's too
bad they spent time on, say, exported templates, and on modules,
instead of standardizing some expression-template library. I get
the impression that the python community welcomed numerical work
more than the C++ committees.

Meanwhile, scientific programmers can still use FORTRAN.

And the C++ masses lack array-consciousness. Anyone who's written
APL sees the limitations, but that's not many of us.

