bug-gnulib

Re: Test-lock hang (not 100% reproducible) on GNU/Linux


From: Bruno Haible
Subject: Re: Test-lock hang (not 100% reproducible) on GNU/Linux
Date: Wed, 04 Jan 2017 15:17:01 +0100
User-agent: KMail/4.8.5 (Linux/3.8.0-44-generic; KDE/4.8.5; x86_64; ; )

Pádraig Brady:
> Now that test-lock.c is relatively fast on numa/multicore systems,
> it seems like it would be useful to first alarm(30) or something
> to protect against infinite hangs?

If we had not been able to pinpoint the origin of the problem, I would agree
that an alarm(30) is the right way to prevent an infinite hang.

But by now, we know two things:

1) It's a glibc bug: the test [6] fails even after it has set the
   scheduling policies that POSIX specifies for the "writers get the rwlock
   in preference to readers" guarantee.

2) Without this guarantee, a reader function that repeatedly spends
     I milliseconds in a section protected by the rwlock,
     O milliseconds without the rwlock being held,
   in a system with N reader threads in parallel
   will lead to
     - a successful termination if   N * I / (I + O) < 1.0
     - an infinite hang if           N * I / (I + O) > 1.0
   (In practice there is no sharp discontinuity at 1.0; a more detailed
   analysis would need probability theory.)
   So, to make test_rwlock hang-free, I would need to introduce a sleep()
   during which the rwlock is not held, and since N * I / (I + O) <= 1.0
   is equivalent to O >= (N - 1) * I, the duration of this sleep would
   have to be at least (N - 1) * I.

   Now, asking an application writer to add sleep()s in his code, with
   a duration that depends both on the number of threads and on the time
   spent in specific portions of the code, is outrageous.

   So, as it stands, POSIX rwlock without a "writers get preference" guarantee
   is unusable.
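The arithmetic in point 2 can be checked with a few lines of C. This is a back-of-envelope sketch, not part of gnulib; the function names reader_load, writer_may_starve and min_unlocked_ms are hypothetical, and the real behaviour near 1.0 is probabilistic rather than a sharp cut-off.

```c
#include <assert.h>
#include <stdbool.h>

/* Average number of readers inside the lock at any instant: each of the
   N readers holds it for a fraction I / (I + O) of the time, so the sum
   is N * I / (I + O).  Above 1.0 the read lock is practically never free. */
static double reader_load (unsigned int n, double i_ms, double o_ms)
{
  assert (i_ms + o_ms > 0.0);
  return n * i_ms / (i_ms + o_ms);
}

/* Crude prediction (hypothetical helper): without a "writers get
   preference" guarantee, a waiting writer is expected to starve once
   the load exceeds 1.0.  */
static bool writer_may_starve (unsigned int n, double i_ms, double o_ms)
{
  return reader_load (n, i_ms, o_ms) > 1.0;
}

/* Smallest unlocked time per iteration that keeps the load at or below
   1.0:  N * I / (I + O) <= 1  <=>  O >= (N - 1) * I.  */
static double min_unlocked_ms (unsigned int n, double i_ms)
{
  assert (n >= 1);
  return (n - 1) * i_ms;
}
```

For example, 4 readers that each hold the lock 1 ms out of every 4 ms give a load of exactly 1.0, right at the boundary; shortening the unlocked time to 2 ms pushes the load to 4/3 and predicts a hang.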

I propose to do what we usually do in gnulib to work around bugs and
unusable APIs:
  - Write a configure test for the guarantee, based on [6].
  - Modify the 'lock' module to use its own implementation of rwlock.
  - Add a unit test to verify the guarantee (so that we can also detect
    if the same problem occurs in pth or Solaris), again based on [6].
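To illustrate what "its own implementation of rwlock" could look like, here is a minimal sketch of a writer-preferring rwlock built on a mutex and two condition variables. It is an assumption for illustration only, not the patch in preparation or gnulib's actual code; the type wp_rwlock and the functions wp_init, wp_rdlock, wp_wrlock and wp_unlock are hypothetical names.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical writer-preferring rwlock (illustration only).  */
typedef struct
{
  pthread_mutex_t lock;
  pthread_cond_t readers_ok;   /* broadcast when readers may proceed */
  pthread_cond_t writers_ok;   /* signalled when one writer may proceed */
  int active_readers;          /* readers currently holding the lock */
  int active_writer;           /* 1 while a writer holds the lock */
  int waiting_writers;         /* writers blocked in wp_wrlock */
} wp_rwlock;

static void wp_init (wp_rwlock *l)
{
  pthread_mutex_init (&l->lock, NULL);
  pthread_cond_init (&l->readers_ok, NULL);
  pthread_cond_init (&l->writers_ok, NULL);
  l->active_readers = 0;
  l->active_writer = 0;
  l->waiting_writers = 0;
}

static void wp_rdlock (wp_rwlock *l)
{
  pthread_mutex_lock (&l->lock);
  /* A new reader also waits while any writer is queued: this is the
     "writers get preference" rule that prevents writer starvation.  */
  while (l->active_writer || l->waiting_writers > 0)
    pthread_cond_wait (&l->readers_ok, &l->lock);
  l->active_readers++;
  pthread_mutex_unlock (&l->lock);
}

static void wp_wrlock (wp_rwlock *l)
{
  pthread_mutex_lock (&l->lock);
  l->waiting_writers++;   /* from now on, new readers hold back */
  while (l->active_writer || l->active_readers > 0)
    pthread_cond_wait (&l->writers_ok, &l->lock);
  l->waiting_writers--;
  l->active_writer = 1;
  pthread_mutex_unlock (&l->lock);
}

static void wp_unlock (wp_rwlock *l)
{
  pthread_mutex_lock (&l->lock);
  if (l->active_writer)
    l->active_writer = 0;
  else
    {
      assert (l->active_readers > 0);
      l->active_readers--;
    }
  if (l->active_readers == 0 && l->waiting_writers > 0)
    pthread_cond_signal (&l->writers_ok);     /* hand off to one writer */
  else if (l->waiting_writers == 0)
    pthread_cond_broadcast (&l->readers_ok);  /* no writer queued */
  pthread_mutex_unlock (&l->lock);
}
```

The design choice is that a new reader blocks while any writer is queued, so a stream of overlapping readers can no longer starve a writer; the trade-off is that a steady stream of writers can now starve readers. Error checking of the pthread calls is omitted for brevity.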

Patch in preparation...

Bruno

[6] https://github.com/linux-test-project/ltp/blob/master/testcases/open_posix_testsuite/conformance/interfaces/pthread_rwlock_rdlock/2-2.c

Attachment: rwlock-draft.diff
Description: Text Data
