Re: [lmi] Parallel blues

lmi

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [lmi] Parallel blues

From:	Greg Chicares
Subject:	Re: [lmi] Parallel blues
Date:	Tue, 20 Jun 2017 23:10:03 +0000
User-agent:	Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.8.0

On 2017-06-20 17:09, Vadim Zeitlin wrote:
> On Tue, 20 Jun 2017 16:01:08 +0000 Greg Chicares <address@hidden> wrote:
> 
> GC> On 2016-07-18 19:55, Vadim Zeitlin wrote:
> GC> > 
> GC> >  So I tried to parallelize test_coding_rules. As expected, doing it 
> turned
> GC> > out to be very simple and it took me about 30 minutes to write the code
> GC> 
> GC> I'd like to do something about calendar_date_test, because it takes
> GC> eleven seconds to run,
> 
>  This surprised me, as I've never noticed it taking so long. So I've just
> rerun it on my (6+ years old) i7-2600 Linux box and it took 3.4 seconds
> here. I might have an answer explaining why is it so much slower for you,
> even though your machine is supposed to be much faster than mine, below...

At first I thought you must be using a different compiler. But I
happen to have a native ELF version already built:

file ../build/lmi/Linux/gcc/native/calendar_date_test   
../build/lmi/Linux/gcc/native/calendar_date_test: ELF 64-bit LSB executable, 
x86-64, version 1 (GNU/Linux), dynamically linked, interpreter 
/lib64/ld-linux-x86-64.so.2, for GNU/Linux 2.6.32, 
BuildID[sha1]=49585e26624344a5864797b78faa16f9edbd9e91, not stripped

time ../build/lmi/Linux/gcc/native/calendar_date_test -a
...
.... 109927 tests succeeded

!!!! no errors detected
../build/lmi/Linux/gcc/native/calendar_date_test -a  3.96s user 0.02s system 
99% cpu 3.985 total

...so I guess the timing you report might be for gcc after all.

I'm not sure my machine is at all faster. I have two "E5-2630 v3"
CPUs, but for this measurement it wouldn't matter if I had 2^10
CPUs because we're comparing single-core performance--for which
this page:
  http://cpuboss.com/cpus/Intel-Xeon-E5-2630-v3-vs-Intel-Core-i7-2600
rates your machine 9.6, and mine 6.7 .

> GC> while no other i686 lmi unit test takes more
> GC> than six seconds...and, since I run them in parallel, if I could run
> GC> this one test in half the time, then I could run the whole unit-test
> GC> suite in half the time. I'm considering two approaches, and it seems
> GC> like one must be far better than the other, but I'm not sure which
> GC> is which, and hope you have a clearer opinion than I.
> 
>  If there is one thing I've never lacked, it's clear, strong (and some
> would say mistaken) opinions about a lot of things. And in this case I
> definitely have one: I prefer (1). But see below.

Thanks. Option (2) seemed so 1990s.

> GC> This would be my first attempt at writing multithreaded code
> GC> myself, so it has great educational value.
> 
>  I started writing that I was afraid it was too simple for this and even
> wrote some trivial code parallelizing this test, here it is with "-w" diff
> option to suppress changes in indentation that would normally go with it:

[...quoting just one part to highlight how easy it is...]

> +        std::thread t2([]()
> +            {
>              TestBirthdateLimitsExhaustively(oe_age_last_birthday);
> +            });

>  As you can see, the code is indeed trivial and it even compiled from the
> first try.

That's very cool. Thanks.

>  Unfortunately, it doesn't work and it took me slightly longer to find out
> why does it crash when I run it. It turns out, that test code calls
> {min,max}imum_birthdate(), which calls birthdate_limit::operator(), which
> calls decimal_root() which outputs some (debugging?) messages to the
> provided iteration_stream which is by default just null_stream(). And
> null_stream() returns a reference to a static object, which ends up being
> used from multiple threads at once -- hence the crash. I'm not sure what
> would be the best way to fix it properly, for now I've simply commented out
> the use of this stream. And I also had to apply another fix to suppress the
> last error from thread sanitizer (how did we live without it before?):
> ---------------------------------- >8 --------------------------------------
> diff --git a/test_main.cpp b/test_main.cpp
> index dca5894b3..79e4cc5b4 100644
> --- a/test_main.cpp
> +++ b/test_main.cpp
> @@ -63,6 +63,7 @@
>  #include "miscellany.hpp"               // stifle_warning_for_unused_value()
>  #include "test_tools.hpp"
> 
> +#include <atomic>
>  #include <iostream>
>  #include <ostream>
>  #include <regex>
> @@ -75,8 +76,8 @@
>  {
>    namespace test
>    {
> -    int test_tools_errors = 0;  // Count of errors detected.
> -    int test_tools_successes = 0;  // Count of successful tests.
> +    std::atomic<int> test_tools_errors{0};  // Count of errors detected.
> +    std::atomic<int> test_tools_successes{0};  // Count of successful tests.
> 
>      class test_tools_exception : public std::runtime_error
>      {
> ---------------------------------- >8 --------------------------------------
> which, I think, is pretty self-explanatory: if we run tests from more than
> one thread, we can't increment test_tools_xxx variables if they're plain
> ints, so we need to either use a mutex around them or, simpler and more
> efficiently, make them atomic.

[std::atomic discussed below.] You've diagnosed the real problem:

>  After making all these changes, I could finally run the test and it did
> run much faster: 0.7 seconds only. However, it was a bit too early to
> declare victory because I didn't forget that I disabled the use of
> iteration_stream. So I've retried running the test *without* threading, but
> also without iteration_stream and the result was 0.9 seconds. Hence it
> seems that most of the time spent by this test is actually wasted writing
> to this useless stream and, perhaps, it's even worse with MinGW standard
> library implementation, which could explain why it takes so long for you.

gcc-4.9.x, i686, HEAD:

time make $coefficiency unit_tests 2>&1 | tee >(grep '\*\*\*') >(grep '????') 
>(grep '!!!!' --count | xargs printf "%d tests succeeded\n") >../log
63 tests succeeded
make $coefficiency unit_tests 2>&1  34.57s user 5.76s system 357% cpu 11.293 
total
[command repeated]
make $coefficiency unit_tests 2>&1  34.59s user 5.75s system 356% cpu 11.300 
total

same, but with iteration_stream #ifdef'd out in 'calendar_date_test.cpp', three 
runs:

make $coefficiency unit_tests 2>&1  27.88s user 5.84s system 484% cpu 6.956 
total
make $coefficiency unit_tests 2>&1  27.19s user 5.41s system 529% cpu 6.159 
total
make $coefficiency unit_tests 2>&1  26.89s user 5.79s system 507% cpu 6.444 
total

...and now each TestBirthdateLimitsExhaustively() call takes about
one-sixth of a second instead of two and a half seconds, so that
function runs fifteen times faster.

>  Finally, my recommendation would be to test whether the test runs
> acceptably fast for you without tracing. Maybe it does, and then you don't
> need to bother with threading at all. Also, if null_stream() is really the
> reason for the problem, it would seem better to get rid of it entirely: it
> could be a relatively nice abstraction, but its cost is too high compared
> to the low-tech solution of using "std::ostream* s = nullptr".

I really do want to preserve the possibility of writing "idiosyncrasyT"
in the "Comments" input field in order to see decimal_root()'s iterands.
But I don't want to pay this much for it. I'll find a way that doesn't
involve pointers.

>  To summarize, this was quite educational, after all, although for
> completely different reasons -- and I don't think you can extract any more
> educational value from it from the point of view of writing MT code in
> C++11. What would still be very educational and also useful, would be to
> have a testing framework that would run different tests in parallel
> automatically. Unfortunately, neither (the real) Boost.Test nor my personal
> favourite CATCH support this

With a name like that, I thought I'd never find it, but here it is:
  https://github.com/philsquared/Catch
At a casual glance, it looks good, though I'm not looking for another
unit-testing framework at the moment.

> and lmi testing framework is too threadbare to
> have any chance of implementing it, because it doesn't even support
> registering tests automatically. If we were serious about speeding up the
> tests, I'd strongly consider implementing test registration and running
> them in parallel using a thread pool. But it would be a big project with
> relatively modest benefits, so I don't think we're actually going to do
> this in the observable future.

Agreed.

> GC> (2) Split calendar_date_test into several separate tests
[...]
>  To return to my initially held opinion: (2) is ugly and I'd avoid it if
> possible. (1) is not ideal, but perfectly acceptable, IMO, and fixing
> MT-safety problems uncovered by it is good hygiene anyhow.

Okay, thanks, adding std::atomic to 'test_main.cpp' can't hurt; and,
even if it did, it wouldn't affect the production system.

[Prev in Thread]

Current Thread

[Next in Thread]

Re: [lmi] Parallel blues, Greg Chicares, 2017/06/20
- Re: [lmi] Parallel blues, Vadim Zeitlin, 2017/06/20
  - Re: [lmi] Parallel blues, Greg Chicares <=
    - Re: [lmi] Parallel blues, Vadim Zeitlin, 2017/06/20
    - Re: [lmi] Parallel blues, Greg Chicares, 2017/06/21
    - Re: [lmi] Parallel blues, Vadim Zeitlin, 2017/06/21
    - Re: [lmi] Parallel blues, Greg Chicares, 2017/06/22
    - Re: [lmi] Parallel blues, Vadim Zeitlin, 2017/06/22
    - Re: [lmi] Parallel blues, Greg Chicares, 2017/06/27
    - Re: [lmi] Parallel blues, Vadim Zeitlin, 2017/06/27

Prev by Date: Re: [lmi] Parallel blues
Next by Date: Re: [lmi] Parallel blues
Previous by thread: Re: [lmi] Parallel blues
Next by thread: Re: [lmi] Parallel blues
Index(es):
- Date
- Thread