Re: caching deadlock solved
Fri, 11 May 2007 01:11:06 +0200 (CEST)
hi Dmitry, all,
On Wed, 9 May 2007, Dmitry Borodaenko wrote:
On 5/8/07, Dmitry Borodaenko <address@hidden> wrote:
Yes, it seems to be more reliable now, but in a way, I was right about
the cache deadlock: there is a race condition there, and the more we
rely on the cache, the more likely it is to occur. I've spent the better part
of last weekend trying to get rid of it, but it's not easy to do
that and at the same time keep Cache#fetch_or_add re-entrant. The
deadlock point is the downgrade from the global lock to the entry lock in
fetch_or_add: if you release the global lock before locking the entry, you
get cache corruption; the other way around, you get a deadlock. The good news is
that I found a way to reproduce this bug more or less consistently:
just run two functional tests (test/tc_robot.rb) in parallel. The bad news
is that I haven't yet found a way to resolve this; I could use some help here.
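For readers following the lock-downgrade problem, here is a minimal sketch of one common way around it, assuming a simple cache with no eviction. It is not Samizdat's Cache class; `SketchCache`, `Entry` and the field names are hypothetical. The idea is to create or look up the per-entry lock object under the global lock, release the global lock, and only then wait on the entry lock, re-checking the value once inside it.

```ruby
# Hypothetical cache sketching one way out of the global-to-entry lock
# downgrade. This is NOT Samizdat's Cache class; all names are invented.
class SketchCache
  Entry = Struct.new(:lock, :value, :filled)

  def initialize
    @global = Mutex.new # guards only the @entries hash
    @entries = {}
  end

  # Returns the cached value for key, computing it with the block on a miss.
  def fetch_or_add(key)
    # Step 1: find or create the entry under the global lock. Entries are
    # never replaced or evicted in this sketch, so the reference stays
    # valid after the global lock is released.
    entry = @global.synchronize do
      @entries[key] ||= Entry.new(Mutex.new, nil, false)
    end

    # Step 2: wait on the entry lock only after the global lock is released.
    # The global lock is therefore never held while waiting for an entry
    # lock, so a block that calls fetch_or_add for another key can always
    # get the global lock. The value is re-checked under the entry lock, so
    # it is computed at most once per key even when several threads miss.
    entry.lock.synchronize do
      unless entry.filled
        entry.value = yield
        entry.filled = true
      end
      entry.value
    end
  end
end
```

In this sketch, nested calls for different keys work (e.g. `cache.fetch_or_add(:outer) { cache.fetch_or_add(:inner) { 1 } + 1 }`), but two computations that depend on each other cyclically can still deadlock on the entry locks, and adding eviction would need extra care so an entry is never dropped while another thread is waiting on its lock.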
Good news: deadlock resolved, see CVS or wait for version
0.6.0.20070509-1 in Debian/experimental. Bad news: it won't work
unless you fix a bug in sync.rb from the standard Ruby library (in
/usr/lib/ruby/1.8/ on Debian):
About the "bad news": isn't it possible to override methods defined
elsewhere? E.g., as a temporary measure until the bug fix makes it into
the next Debian stable, add your patches as methods to engine/exceptions.rb,
or put them into a file
engine/sync_bugfix20070509.rb and require it in exceptions.rb?
I know this would be rather inelegant, and the risk is that it could override
any future fixes to sync.rb for people who update their standard Ruby library
from e.g. CVS instead of stable sources.
Maybe do this in Debian unstable/testing, but not in experimental?
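The override suggested above relies on Ruby's open modules: reopening a module and redefining a method replaces it for every class that includes the module, even classes that included it earlier. A minimal sketch, assuming a made-up `LegacyLock` module as a stand-in for the library code (it is not the real Sync_m from sync.rb) and a hypothetical bugfix file:

```ruby
# Hypothetical stand-in for a library module whose method has a bug.
# (Not the real Sync_m from sync.rb; names are invented for illustration.)
module LegacyLock
  def try_lock(mode = :EX)
    lock_sub(current_mode) # bug: ignores the requested mode
  end
end

# A class that mixed the module in before the fix was loaded.
class Demo
  include LegacyLock

  def current_mode
    :UN
  end

  # Echoes the mode it was asked for, so the fix is observable.
  def lock_sub(mode)
    mode
  end
end

# --- contents of a hypothetical engine/sync_bugfix.rb, loaded via require ---
# Reopening the module replaces the method for every includer, including
# classes like Demo that included it earlier.
module LegacyLock
  def try_lock(mode = :EX)
    lock_sub(mode) # fixed: use the requested mode
  end
end

p Demo.new.try_lock(:SH) # prints :SH
```

As noted above, such a patch applies unconditionally, so it would also shadow a later upstream fix; guarding it with a check on the installed Ruby version is one way to limit that risk.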
This patch was submitted to ruby-core:
Until this is included upstream, or at least in the Debian package of
Ruby, we'll have to rely on people looking into README.Debian when
their samizdat-drb-server fails. Yet another reason to write a debconf
warning screen soon...
Or would a bug fix like this go from upstream through to Debian stable?
I'm still hoping we'll someday get to the stage where a sysadmin can type
and it will work "out of the box" like Apache...
Anyway, the bug has been found and squashed, which is a Good Thing :).
--- sync.rb.dpkg-dist 2007-05-08 23:35:15.000000000 +0100
+++ sync.rb 2007-05-09 01:04:37.000000000 +0100
@@ -54,6 +54,7 @@
class Err < StandardError
+ Thread.critical = false
fail self, sprintf(self::Message, *opt)
@@ -129,10 +130,10 @@
# locking methods.
def sync_try_lock(mode = EX)
- return unlock if sync_mode == UN
+ return unlock if mode == UN
Thread.critical = true
- ret = sync_try_lock_sub(sync_mode)
+ ret = sync_try_lock_sub(mode)
Thread.critical = false
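The first hunk resets Thread.critical before Err.Fail raises: an exception that escapes while the interpreter is still in critical mode leaves the other threads unable to run. Thread.critical no longer exists (it was removed in Ruby 1.9), so this sketch uses a Mutex as a stand-in to show the same rule, releasing the critical section on every exit path, including exceptions. `LOCK`, `buggy_check` and `safe_check` are hypothetical names.

```ruby
# Mutex used as a modern stand-in for Thread.critical.
LOCK = Mutex.new

# Buggy pattern: the lock is taken by hand and an exception escapes before
# unlock, so any other thread waiting on LOCK blocks forever.
def buggy_check(value)
  LOCK.lock
  raise ArgumentError, "bad value: #{value}" if value < 0
  LOCK.unlock
  value
end

# Fixed pattern: ensure releases the lock whether the body returns or
# raises (LOCK.synchronize { ... } does the same thing for you).
def safe_check(value)
  LOCK.lock
  begin
    raise ArgumentError, "bad value: #{value}" if value < 0
    value
  ensure
    LOCK.unlock
  end
end
```

The second hunk fixes a different bug: sync_try_lock tested and passed sync_mode (the lock's current mode) instead of the mode argument the caller asked for, so a try-lock request did not lock in the requested mode.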