Re: [lmi] Parallel blues

lmi

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [lmi] Parallel blues

From:	Vadim Zeitlin
Subject:	Re: [lmi] Parallel blues
Date:	Tue, 20 Jun 2017 19:09:21 +0200

On Tue, 20 Jun 2017 16:01:08 +0000 Greg Chicares <address@hidden> wrote:

GC> On 2016-07-18 19:55, Vadim Zeitlin wrote:
GC> > 
GC> >  So I tried to parallelize test_coding_rules. As expected, doing it turned
GC> > out to be very simple and it took me about 30 minutes to write the code
GC> 
GC> I'd like to do something about calendar_date_test, because it takes
GC> eleven seconds to run,

 This surprised me, as I've never noticed it taking so long. So I've just
rerun it on my (6+ years old) i7-2600 Linux box and it took 3.4 seconds
here. I might have an answer explaining why is it so much slower for you,
even though your machine is supposed to be much faster than mine, below...


GC> while no other i686 lmi unit test takes more
GC> than six seconds...and, since I run them in parallel, if I could run
GC> this one test in half the time, then I could run the whole unit-test
GC> suite in half the time. I'm considering two approaches, and it seems
GC> like one must be far better than the other, but I'm not sure which
GC> is which, and hope you have a clearer opinion than I.

 If there is one thing I've never lacked, it's clear, strong (and some
would say mistaken) opinions about a lot of things. And in this case I
definitely have one: I prefer (1). But see below.

GC> (1) Parallelize it. TestBirthdateLimitsExhaustively() is called three
GC> times, with different arguments, and each of those three runs takes
GC> two and a half seconds, so it seems natural to run each in its own
GC> thread.

 This is a simple and natural solution. It's a pity this can't be done
automatically (more about this further down), but doing it manually is
simple enough and you could either run each test in its own thread in
CalendarDateTest::Test() or, maybe, use 4 threads: one for each of
TestBirthdateLimitsExhaustively() calls and another one for all the other
threads, as this should be slightly more efficient if we know that the
other tests don't take a long time to run and so will finish before the
exhaustive tests anyhow.

GC> This would be my first attempt at writing multithreaded code
GC> myself, so it has great educational value.

 I started writing that I was afraid it was too simple for this and even
wrote some trivial code parallelizing this test, here it is with "-w" diff
option to suppress changes in indentation that would normally go with it:
---------------------------------- >8 --------------------------------------
diff --git a/calendar_date_test.cpp b/calendar_date_test.cpp
index 02d5dd63e..fb1cec86d 100644
--- a/calendar_date_test.cpp
+++ b/calendar_date_test.cpp
@@ -29,6 +29,7 @@
 #include "timer.hpp"

 #include <sstream>
+#include <thread>

 // Function TestDateConversions() in 'wx_utility.cpp' validates the
 // mapping between {year, month, day} triplets and JDN against the
@@ -39,6 +40,8 @@ struct CalendarDateTest
 {
     static void Test()
         {
+        std::thread t1([]()
+            {
             TestFundamentals();
             TestAlgorithm199Bounds();
             TestYMDBounds();
@@ -49,11 +52,26 @@ struct CalendarDateTest
             TestIntegralDuration();
             TestYearAndMonthDifferenceExhaustively();
             TestBirthdateLimits();
+            TestIo();
+            TestSpeed();
+            });
+        std::thread t2([]()
+            {
             TestBirthdateLimitsExhaustively(oe_age_last_birthday);
+            });
+        std::thread t3([]()
+            {
             
TestBirthdateLimitsExhaustively(oe_age_nearest_birthday_ties_younger);
+            });
+        std::thread t4([]()
+            {
             
TestBirthdateLimitsExhaustively(oe_age_nearest_birthday_ties_older);
-        TestIo();
-        TestSpeed();
+            });
+
+        t1.join();
+        t2.join();
+        t3.join();
+        t4.join();
         }

     static void TestFundamentals();
---------------------------------- >8 --------------------------------------

 As you can see, the code is indeed trivial and it even compiled from the
first try.

 Unfortunately, it doesn't work and it took me slightly longer to find out
why does it crash when I run it. It turns out, that test code calls
{min,max}imum_birthdate(), which calls birthdate_limit::operator(), which
calls decimal_root() which outputs some (debugging?) messages to the
provided iteration_stream which is by default just null_stream(). And
null_stream() returns a reference to a static object, which ends up being
used from multiple threads at once -- hence the crash. I'm not sure what
would be the best way to fix it properly, for now I've simply commented out
the use of this stream. And I also had to apply another fix to suppress the
last error from thread sanitizer (how did we live without it before?):
---------------------------------- >8 --------------------------------------
diff --git a/test_main.cpp b/test_main.cpp
index dca5894b3..79e4cc5b4 100644
--- a/test_main.cpp
+++ b/test_main.cpp
@@ -63,6 +63,7 @@
 #include "miscellany.hpp"               // stifle_warning_for_unused_value()
 #include "test_tools.hpp"

+#include <atomic>
 #include <iostream>
 #include <ostream>
 #include <regex>
@@ -75,8 +76,8 @@
 {
   namespace test
   {
-    int test_tools_errors = 0;  // Count of errors detected.
-    int test_tools_successes = 0;  // Count of successful tests.
+    std::atomic<int> test_tools_errors{0};  // Count of errors detected.
+    std::atomic<int> test_tools_successes{0};  // Count of successful tests.

     class test_tools_exception : public std::runtime_error
     {
---------------------------------- >8 --------------------------------------
which, I think, is pretty self-explanatory: if we run tests from more than
one thread, we can't increment test_tools_xxx variables if they're plain
ints, so we need to either use a mutex around them or, simpler and more
efficiently, make them atomic.

 After making all these changes, I could finally run the test and it did
run much faster: 0.7 seconds only. However, it was a bit too early to
declare victory because I didn't forget that I disabled the use of
iteration_stream. So I've retried running the test *without* threading, but
also without iteration_stream and the result was 0.9 seconds. Hence it
seems that most of the time spent by this test is actually wasted writing
to this useless stream and, perhaps, it's even worse with MinGW standard
library implementation, which could explain why it takes so long for you.

 Finally, my recommendation would be to test whether the test runs
acceptably fast for you without tracing. Maybe it does, and then you don't
need to bother with threading at all. Also, if null_stream() is really the
reason for the problem, it would seem better to get rid of it entirely: it
could be a relatively nice abstraction, but its cost is too high compared
to the low-tech solution of using "std::ostream* s = nullptr".


 To summarize, this was quite educational, after all, although for
completely different reasons -- and I don't think you can extract any more
educational value from it from the point of view of writing MT code in
C++11. What would still be very educational and also useful, would be to
have a testing framework that would run different tests in parallel
automatically. Unfortunately, neither (the real) Boost.Test nor my personal
favourite CATCH support this and lmi testing framework is too threadbare to
have any chance of implementing it, because it doesn't even support
registering tests automatically. If we were serious about speeding up the
tests, I'd strongly consider implementing test registration and running
them in parallel using a thread pool. But it would be a big project with
relatively modest benefits, so I don't think we're actually going to do
this in the observable future.


GC> However, I'd be running sixty-three unit tests via parallel make, of
GC> which one would be running in parallel via multithreading, and is that
GC> combination somehow unholy?

 No, not at all. It may be slightly suboptimal, but if this tests runs much
longer than the others, it would still be a gain.


GC> (2) Split calendar_date_test into several separate tests, and let
GC> make handle all the parallelism. This wouldn't be educational, but
GC> I'd be more likely to get it right the first time, with less effort.
GC> However, it seems unnatural to split a unit test into pieces.

 Yes, absolutely. I think tests should be organized logically and not
depending on the most efficient way to run them (unless the inefficient way
is really unacceptable in practice and there is no other choice to speed it
up). In particular, if I make any changes to calendar_date class, I expect
to be able to run (only) calendar_date_test to confirm that I didn't break
anything, having to run multiple tests would be unexpected.

GC> Is one way clearly better?

 To return to my initially held opinion: (2) is ugly and I'd avoid it if
possible. (1) is not ideal, but perfectly acceptable, IMO, and fixing
MT-safety problems uncovered by it is good hygiene anyhow.

 Regards,
VZ

[Prev in Thread]

Current Thread

[Next in Thread]

Re: [lmi] Parallel blues, Greg Chicares, 2017/06/20
- Re: [lmi] Parallel blues, Vadim Zeitlin <=
  - Re: [lmi] Parallel blues, Greg Chicares, 2017/06/20
    - Re: [lmi] Parallel blues, Vadim Zeitlin, 2017/06/20
    - Re: [lmi] Parallel blues, Greg Chicares, 2017/06/21
    - Re: [lmi] Parallel blues, Vadim Zeitlin, 2017/06/21
    - Re: [lmi] Parallel blues, Greg Chicares, 2017/06/22
    - Re: [lmi] Parallel blues, Vadim Zeitlin, 2017/06/22
    - Re: [lmi] Parallel blues, Greg Chicares, 2017/06/27
    - Re: [lmi] Parallel blues, Vadim Zeitlin, 2017/06/27

Prev by Date: Re: [lmi] Parallel blues
Next by Date: Re: [lmi] Parallel blues
Previous by thread: Re: [lmi] Parallel blues
Next by thread: Re: [lmi] Parallel blues
Index(es):
- Date
- Thread