Re: caching deadlock solved

From: boud
Subject: Re: caching deadlock solved
Date: Fri, 11 May 2007 01:11:06 +0200 (CEST)

hi Dmitry, all,

On Wed, 9 May 2007, Dmitry Borodaenko wrote:

On 5/8/07, Dmitry Borodaenko <address@hidden> wrote:
Yes, it seems to be more reliable now, but in a way, I was right about
the cache deadlock: there is a race condition there, and the more we
rely on cache, the more likely it is to occur. I've spent the better part
of last weekend trying to get rid of it, but it's not that easy to do
that and at the same time keep Cache#fetch_or_add re-entrant. The
deadlock point is the downgrade from global lock to entry lock in
fetch_or_add: if you release the global lock before locking the entry, you
get cache corruption; the other way around, you get deadlock. Good news is
that I found a way to reproduce this bug more or less consistently:
just run two functional tests (test/tc_robot.rb) in parallel. Bad news
is that I haven't yet found a way to resolve this, could use some help.
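
To make the lock downgrade described above concrete, here is a minimal
sketch of a two-level cache lock. It is not Samizdat's Cache class and
not the fix that went into CVS: the class name SketchCache, the Entry
struct and the use of Monitor are all assumptions made for illustration.
It releases the global lock before taking the entry lock, and it avoids
corruption only because it never evicts or replaces entries. Nested
fetches that take entry locks in opposite order from two threads could
still deadlock.

```ruby
require 'monitor'

# Hypothetical cache used only to illustrate the global-to-entry lock downgrade.
class SketchCache
  # Each entry carries its own re-entrant lock, its value, and a "filled" flag.
  Entry = Struct.new(:lock, :value, :filled)

  def initialize
    @global  = Mutex.new   # protects the @entries hash itself
    @entries = {}
  end

  def fetch_or_add(key)
    # Step 1: hold the global lock only long enough to find or create the entry.
    entry = @global.synchronize do
      @entries[key] ||= Entry.new(Monitor.new, nil, false)
    end

    # Step 2: the global lock is already released here, so the block may
    # call fetch_or_add again (re-entrancy) without blocking on it.
    entry.lock.synchronize do
      unless entry.filled
        entry.value  = yield
        entry.filled = true
      end
      entry.value
    end
  end
end

cache = SketchCache.new
# A nested call inside the block exercises the re-entrant path.
cache.fetch_or_add(:outer) { cache.fetch_or_add(:inner) { 1 } + 1 }
```

A real cache that evicts entries would have to handle the window between
step 1 and step 2, which is exactly where the thread locates the problem.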

Good news: deadlock resolved, see CVS or wait for the version in
Debian/experimental. Bad news: it won't work unless you fix a bug in
sync.rb from the standard Ruby library (in /usr/lib/ruby/1.8/ on Debian):

Congratulations :).

About the "bad news": isn't it possible to override methods defined
elsewhere? E.g. as a temporary measure until the bug fix makes it into
the next Debian stable, add your patches as methods to engine/exceptions.rb,
or put them in a file engine/sync_bugfix20070509.rb and require it in
exceptions.rb?

i know this would be rather inelegant, and the risk is that it could
override any future fixes to sync.rb for people who update standard Ruby
from e.g. CVS instead of stable sources.

Maybe do this in debian unstable/testing, but not in experimental?
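
A rough sketch of the override idea, in case it helps the discussion.
Ruby lets you reopen a module and redefine a method after the original
has been loaded. The module DemoSync and class DemoLock below are
made-up stand-ins, not the real sync.rb, and the method bodies are
simplified so that only the "argument is ignored" bug from the patch is
modelled.

```ruby
# Stand-in for the library code (hypothetical, NOT the real sync.rb).
module DemoSync
  UN = :UN   # unlocked
  EX = :EX   # exclusive

  attr_accessor :sync_mode

  # Modelled bug: the +mode+ argument is ignored and the object's
  # current sync_mode is consulted instead.
  def sync_try_lock(mode = EX)
    return :unlocked if sync_mode == UN
    self.sync_mode = sync_mode
    true
  end
end

class DemoLock
  include DemoSync

  def initialize
    @sync_mode = UN
  end
end

before = DemoLock.new.sync_try_lock(DemoSync::EX)   # buggy result: :unlocked

# What a file like engine/sync_bugfix20070509.rb might contain:
# reopen the module and redefine the method using the argument.
module DemoSync
  def sync_try_lock(mode = EX)
    return :unlocked if mode == UN
    self.sync_mode = mode
    true
  end
end

lock  = DemoLock.new
after = lock.sync_try_lock(DemoSync::EX)            # patched result: true
```

Because method lookup is dynamic, classes that already included the
module pick up the redefinition, which is also why such an override
would shadow any later upstream fix loaded before it.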

This patch was submitted to ruby-core:

Until this is included upstream, or at least in the Debian package of
Ruby, we'll have to rely on people looking into README.Debian when
their samizdat-drb-server fails. Yet another reason to write a debconf
warning screen soon...

Or would a bug fix like this go from upstream through to debian stable?

i'm still hoping we'll sometime get to the stage where a sysadmin can type

 aptitude install samizdat

and it will work "out of the box" like apache...

Anyway, the bug has been found and squashed, which is a Good Thing :).


--- sync.rb.dpkg-dist   2007-05-08 23:35:15.000000000 +0100
+++ sync.rb     2007-05-09 01:04:37.000000000 +0100
@@ -54,6 +54,7 @@
 # exceptions
 class Err < StandardError
   def Err.Fail(*opt)
+      Thread.critical = false
     fail self, sprintf(self::Message, *opt)

@@ -129,10 +130,10 @@

 # locking methods.
 def sync_try_lock(mode = EX)
-    return unlock if sync_mode == UN
+    return unlock if mode == UN

   Thread.critical = true
-    ret = sync_try_lock_sub(sync_mode)
+    ret = sync_try_lock_sub(mode)
   Thread.critical = false
