Re: Test-lock hang (not 100% reproducible) on GNU/Linux
From: Bruno Haible
Date: Wed, 04 Jan 2017 15:17:01 +0100
User-agent: KMail/4.8.5 (Linux/3.8.0-44-generic; KDE/4.8.5; x86_64; ; )

Pádraig Brady:
> Now that test-lock.c is relatively fast on numa/multicore systems,
> it seems like it would be useful to first alarm(30) or something
> to protect against infinite hangs?
If we could not pinpoint the origin of the problem, I agree, an alarm(30)
would be the right thing to prevent an infinite hang.
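
For reference, the alarm(30) safeguard discussed here could look roughly like the following. This is only a sketch: the helper name arm_hang_watchdog is hypothetical and does not come from test-lock.c; only the POSIX alarm() call itself is real.

```c
#include <unistd.h>

/* Hypothetical helper (not part of test-lock.c): schedule SIGALRM after
   SECONDS seconds.  With the default signal disposition, SIGALRM
   terminates the process, so a hung test dies instead of blocking forever.
   Passing 0 cancels a pending alarm.
   Returns the number of seconds that were left on a previously
   scheduled alarm, or 0 if there was none.  */
unsigned int
arm_hang_watchdog (unsigned int seconds)
{
  return alarm (seconds);
}
```

A test program would call arm_hang_watchdog (30) as the first statement of main(), before creating any threads.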
But by now, we know:
  1) It's a glibc bug: The test [6] fails even after it has set the
     policies that POSIX expects for the "writers get the rwlock in
     preference to readers" guarantee.
  2) Without this guarantee, a reader function that repeatedly spends
       I milliseconds in a section protected by the rwlock,
       O milliseconds without the rwlock being held,
     in a system with N reader threads in parallel
     will lead to
       - a successful termination if N * I / (I + O) < 1.0
       - an infinite hang if N * I / (I + O) > 1.0
     (There is actually no discontinuity at 1.0; a more detailed
     analysis would need probability calculus.)
So, in order to make test_rwlock hang-free, I would need to introduce
a sleep() without the rwlock being held, and the duration of this sleep
would be at least (N - 1) * I, since N * I / (I + O) < 1.0 is
equivalent to O > (N - 1) * I.
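
The arithmetic above can be written down as two small functions. This is a sketch of the simplified model only (it ignores the probabilistic effects mentioned earlier), and the function names are made up for illustration.

```c
/* Load that N reader threads put on the rwlock when each one repeatedly
   holds it for INSIDE_MS milliseconds and then runs for OUTSIDE_MS
   milliseconds without it: N * I / (I + O).
   In the simplified model, a value above 1.0 means that a writer
   without preference never gets the lock.  */
double
rwlock_reader_load (int n_readers, double inside_ms, double outside_ms)
{
  return n_readers * inside_ms / (inside_ms + outside_ms);
}

/* Sleep duration O at which the load reaches exactly 1.0, obtained by
   solving N * I / (I + O) = 1.0 for O.  The sleep outside the lock
   has to be at least this long: (N - 1) * I.  */
double
min_outside_ms (int n_readers, double inside_ms)
{
  return (n_readers - 1) * inside_ms;
}
```

For example, with N = 4 readers and I = 1 ms inside the lock, O = 1 ms gives a load of 2.0, and the sleep would have to grow to 3 ms to bring the load down to 1.0.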
Now, asking an application writer to add sleep()s in his code, with
a duration that depends both on the number of threads and on the time
spent in specific portions of the code, is outrageous.
So, as it stands, POSIX rwlock without a "writers get preference" guarantee
is unusable.
I propose to do what we usually do in gnulib, to work around bugs and unusable
APIs:
- Write a configure test for the guarantee, based on [6].
- Modify the 'lock' module to use its own implementation of rwlock.
- Add a unit test to verify the guarantee (so that we can also detect
if the same problem occurs in pth or Solaris), again based on [6].
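
To illustrate what an rwlock with the "writers get preference" guarantee can look like, here is a minimal sketch built from a mutex and two condition variables. It is not the gnulib implementation and not the patch mentioned below; the type wp_rwlock_t and all function names are hypothetical, and error handling is reduced to the bare minimum.

```c
#include <stddef.h>
#include <pthread.h>

/* Hypothetical writer-preferring rwlock, for illustration only.  */
typedef struct
{
  pthread_mutex_t guard;       /* protects the counters below */
  pthread_cond_t readers_ok;   /* signalled when readers may enter */
  pthread_cond_t writer_ok;    /* signalled when a writer may enter */
  int active_readers;          /* readers currently holding the lock */
  int active_writer;           /* 1 if a writer holds the lock, else 0 */
  int waiting_writers;         /* writers blocked in wp_rwlock_wrlock */
} wp_rwlock_t;

int
wp_rwlock_init (wp_rwlock_t *l)
{
  int err = pthread_mutex_init (&l->guard, NULL);
  if (err == 0)
    err = pthread_cond_init (&l->readers_ok, NULL);
  if (err == 0)
    err = pthread_cond_init (&l->writer_ok, NULL);
  l->active_readers = 0;
  l->active_writer = 0;
  l->waiting_writers = 0;
  return err;
}

void
wp_rwlock_rdlock (wp_rwlock_t *l)
{
  pthread_mutex_lock (&l->guard);
  /* Writer preference: a new reader also waits while a writer is
     merely waiting, not only while one is active.  */
  while (l->active_writer || l->waiting_writers > 0)
    pthread_cond_wait (&l->readers_ok, &l->guard);
  l->active_readers++;
  pthread_mutex_unlock (&l->guard);
}

void
wp_rwlock_wrlock (wp_rwlock_t *l)
{
  pthread_mutex_lock (&l->guard);
  l->waiting_writers++;
  while (l->active_writer || l->active_readers > 0)
    pthread_cond_wait (&l->writer_ok, &l->guard);
  l->waiting_writers--;
  l->active_writer = 1;
  pthread_mutex_unlock (&l->guard);
}

void
wp_rwlock_unlock (wp_rwlock_t *l)
{
  pthread_mutex_lock (&l->guard);
  if (l->active_writer)
    l->active_writer = 0;
  else
    l->active_readers--;
  if (l->waiting_writers > 0)
    {
      /* Hand over to a writer as soon as the last reader has left.  */
      if (l->active_readers == 0)
        pthread_cond_signal (&l->writer_ok);
    }
  else
    pthread_cond_broadcast (&l->readers_ok);
  pthread_mutex_unlock (&l->guard);
}
```

With this scheme, readers that arrive while a writer is waiting cannot keep the lock occupied indefinitely, so no sleep() outside the lock is needed. The price is that a steady stream of writers can starve readers.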
Patch in preparation...
Bruno
[6] https://github.com/linux-test-project/ltp/blob/master/testcases/open_posix_testsuite/conformance/interfaces/pthread_rwlock_rdlock/2-2.c
Attachment: rwlock-draft.diff