Re: [Sks-devel] Corrupt PTree (was: Extended Downtime key.ip6.li)


From: Jeffrey Johnson
Subject: Re: [Sks-devel] Corrupt PTree (was: Extended Downtime key.ip6.li)
Date: Fri, 27 Jan 2012 12:38:20 -0500

On Jan 27, 2012, at 7:27 AM, Javier Henderson wrote:

> 
> On Jan 27, 2012, at 2:29 AM, Christian Felsing wrote:
> 
>> 2012-01-27 08:19:28 Raising Sys.Break -- PTree may be corrupted: 
>> Failure("remove_from_node: attempt to delete non-existant element
>> from prefix tree")
> 
> I see this about once a month. What causes it? Is there some tweaking that 
> prevents it?
> 

In my experience, the cause is a self-deadlock while updating
the information in a PTree store.

(There is also a separate, configuration-related issue if you do not
have a DB_CONFIG in place to specify the locking resources; my
comments here are about a suspected deeper design flaw.)

When the "corruption" happens, a postmortem is possible:
        cd .../PTree
        db_stat -CA
(-Cl is all that is really needed; -CA supplies all of the information,
but what needs to be analyzed is the locks that are held.)

What needs to be verified (for the deeper design flaw) is whether
a deadlock exists. The information about which locks are held is
in the db_stat output.
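
The lock information is gone once the daemon is restarted, so it can
help to save the db_stat output first. A minimal sketch, assuming
db_stat is on the PATH under that name (some distributions install it
with a version in the name) and that the PTree directory is the
Berkeley DB environment home; the function names and the output file
name are hypothetical:

```python
import subprocess
from datetime import datetime
from pathlib import Path


def lock_report_command(ptree_dir):
    """Build the db_stat invocation that dumps the lock table (-Cl)."""
    # -h names the Berkeley DB environment home; -Cl prints lock information.
    return ["db_stat", "-h", str(ptree_dir), "-Cl"]


def save_lock_report(ptree_dir, out_dir, now=None):
    """Run db_stat and save its output to a timestamped file for later analysis."""
    now = now or datetime.now()
    # Hypothetical file name; any unique name would do.
    out_path = Path(out_dir) / f"ptree-locks-{now:%Y%m%d-%H%M%S}.txt"
    result = subprocess.run(
        lock_report_command(ptree_dir),
        capture_output=True, text=True, check=False,
    )
    # Keep stderr too, since db_stat reports environment problems there.
    out_path.write_text(result.stdout + result.stderr)
    return out_path
```

Calling save_lock_report() with the PTree directory and a scratch
directory, before restarting anything, leaves a file that can be read
at leisure or posted to the list.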

What I am calling a "suspected deeper design flaw" is this:

Locking in Berkeley DB is per-page, not per-record.

The PTree store is recursive (afaik) and so the recursion
will occasionally find itself re-accessing an already locked
page that contains two keys needed for the recursion.
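
A toy sketch of the suspected mechanism, not SKS or Berkeley DB code.
It assumes one non-reentrant lock per page and that the second access
is made as if by a different locker; whether the PTree code actually
does that is exactly what has not been verified. PAGE_SIZE, the class
and the function names are all hypothetical:

```python
import threading

PAGE_SIZE = 4  # hypothetical: number of keys stored per page in this toy model


class PageLockedStore:
    """Toy store with one non-reentrant lock per page."""

    def __init__(self):
        self.locks = {}

    def page_of(self, key):
        return key // PAGE_SIZE

    def try_lock(self, key):
        """Try to lock the page holding `key`; return False instead of blocking."""
        lock = self.locks.setdefault(self.page_of(key), threading.Lock())
        return lock.acquire(blocking=False)

    def unlock(self, key):
        self.locks[self.page_of(key)].release()


def update_two_keys(store, parent_key, child_key):
    """Hold the parent's page lock while locking the child's page."""
    if not store.try_lock(parent_key):
        raise RuntimeError("parent page already locked")
    try:
        if not store.try_lock(child_key):
            # A blocking acquire would wait here forever on a lock we hold.
            return "self-deadlock"
        store.unlock(child_key)
        return "ok"
    finally:
        store.unlock(parent_key)
```

With keys 1 and 9 on different pages the update goes through; with
keys 1 and 2 on the same page the second lock request cannot be
granted while the first is held.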

The incidence is quite low because in most cases the keys
needed in the PTree recursion are spread across multiple pages.

I predict that the incidence of the deadlock would change
if the pagesize is increased (that prediction is also
consistent with some performance tuning observations
that should be in the list archives from a couple of months ago).
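
A back-of-the-envelope illustration of why pagesize should matter. It
assumes, purely for the arithmetic, that the two keys land on pages
independently and uniformly; real PTree keys are unlikely to be that
well behaved, so only the direction of the effect is meaningful. The
function name and the store size are hypothetical:

```python
def same_page_probability(store_bytes, page_size):
    """Chance that two independently, uniformly placed keys share a page."""
    pages = max(1, store_bytes // page_size)
    return 1 / pages


# Hypothetical 1 MiB store: doubling the page size halves the number of
# pages, which doubles the chance that two keys share one.
small_pages = same_page_probability(1 << 20, 4096)
large_pages = same_page_probability(1 << 20, 8192)
```

Under that assumption, larger pages mean fewer pages, and so more
chances for two keys needed by the recursion to sit on the same one.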

(all of the above is just hypothesis based on experience with
Berkeley DB. I do not have enough experience with the PTree
store to fully identify what I believe is the mechanism).

Meanwhile the "work around" isn't too painful, and an incidence
of "about once a month" can be handled with a cron script
or a watchdog.
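
A minimal watchdog sketch, assuming the failure always logs the
"PTree may be corrupted" text quoted earlier in the thread. The log
path and the recovery command are site-specific and must be supplied
by the operator; nothing here knows how to repair a PTree, and the
function names are hypothetical:

```python
import subprocess

MARKER = "PTree may be corrupted"  # text from the log line quoted in this thread


def log_shows_corruption(lines):
    """Return True if any log line contains the corruption marker."""
    return any(MARKER in line for line in lines)


def check_and_recover(log_path, recover_cmd):
    """Run recover_cmd (an argument list chosen by the operator) if the log shows the marker."""
    with open(log_path, errors="replace") as fh:
        if not log_shows_corruption(fh):
            return False
    subprocess.run(recover_cmd, check=True)
    return True
```

Run from cron, this needs the log to be rotated or truncated as part
of the recovery; otherwise the same old line will trigger it again on
the next run.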

hth

73 de Jeff
> -jav
> 
> 