Re: Test-lock hang (not 100% reproducible) on GNU/Linux
From: Bruno Haible
Subject: Re: Test-lock hang (not 100% reproducible) on GNU/Linux
Date: Sat, 24 Dec 2016 18:52:07 +0100 (CET)
Hi Pádraig,
> Wow that's much better on a 40 core system:
>
> Before your patch:
> =================
> $ time ./test-lock
> Starting test_lock ... OK
> Starting test_rwlock ... OK
> Starting test_recursive_lock ... OK
> Starting test_once ... OK
>
> real 1m32.547s
> user 1m32.455s
> sys 13m21.532s
>
> After your patch:
> =================
> $ time ./test-lock
> Starting test_lock ... OK
> Starting test_rwlock ... OK
> Starting test_recursive_lock ... OK
> Starting test_once ... OK
>
> real 0m3.364s
> user 0m3.087s
> sys 0m25.477s
Wow, a roughly 30x speedup by using a lock instead of 'volatile'!
Thanks for the testing. I cleaned up the patch to reduce
code duplication and pushed it.
Still, I wonder about the cause of this speed difference.
It must be the reads from the 'volatile' variable that are problematic,
because the program writes to the 'volatile' variable only 6 times in total.
What happens when a program reads from a 'volatile' variable
at address xy in a multi-processor system? It must do a broadcast
to all other CPUs "please flush your internal write caches", wait
for these flushes to be completed, and then do a read at address xy.
But the same procedure must also happen when taking a lock at
address xy. So, where does the speed difference come from?
The 'volatile' handling must be implemented in a terrible way:
either GCC generates inefficient instructions, or the hardware
executes those instructions in a horribly slow way?
What is the hardware of your 40-core machine (just for reference)?
Bruno
0001-lock-test-Fix-performance-problem-on-multi-core-mach.patch
Description: Binary data