RE: Reliability of RPC services

From: Christopher Nelson
Subject: RE: Reliability of RPC services
Date: Tue, 25 Apr 2006 09:57:22 -0600

> On Tue, Apr 25, 2006 at 10:03:56AM -0400, Jonathan S. Shapiro wrote:
> > I agree. Also, there is something else that we all agree on: if one
> > mechanism can handle two problems with acceptable efficiency, it is a
> > mistake to introduce a second mechanism for the second problem.
> > 
> > So I pose the following test case:
> > ...
> > If we conclude that we need watchdogs for this (or for something 
> > else), then I suggest that kernel-supported capability death notice 
> > (any kind) is unnecessary and should not be implemented.
> I disagree.  Although it seems likely that a watchdog 
> (possibly in the form of the user himself) is needed for 
> servers entering infinite loops, I don't think this is an 
> adequate solution.  

I think that any design which expects the user to notice that something
is wrong *and* to know how to fix it is fundamentally flawed.  Take any
Linux, Windows, or *BSD expert, put them in front of a frozen Hurd
system, and ask them to find the stuck process.  They will be highly
amused.  Extend this to any normal user, and they will either unplug
the thing or just throw it out the window.

In order for watchdogs (a nice euphemism for timeouts) to be effective,
the designer of the software has to be able to decide on a good metric
for when something is taking too long.  If the designer cannot come up
with a good metric, then that recovery mechanism is not appropriate for
that use case.  It is *never* appropriate to expect an end user to be
the watchdog.
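
To make "watchdog as timeout" concrete, here is a rough sketch in Python
of the pattern I mean.  It is only an illustration, not Hurd code: the
Watchdog class, its kick() and stop() methods, and the timeout value are
all made up for this example.  The point is that the timeout is a number
the designer has to pick up front.

```python
import threading


class Watchdog:
    """Hypothetical watchdog: calls on_timeout unless kick() is called
    again within `timeout` seconds. All names here are illustrative."""

    def __init__(self, timeout, on_timeout):
        self.timeout = timeout          # the designer's "too long" metric
        self.on_timeout = on_timeout    # recovery action, e.g. restart a server
        self._timer = None
        self._lock = threading.Lock()

    def kick(self):
        """Signal progress: cancel any pending timer and start a new one."""
        with self._lock:
            if self._timer is not None:
                self._timer.cancel()
            self._timer = threading.Timer(self.timeout, self.on_timeout)
            self._timer.daemon = True
            self._timer.start()

    def stop(self):
        """Disarm the watchdog, e.g. after the request completed normally."""
        with self._lock:
            if self._timer is not None:
                self._timer.cancel()
                self._timer = None
```

A client would call kick() before sending a request and stop() when the
reply arrives.  If nobody can say what `timeout` should be for a given
service, that is the case where this mechanism does not fit.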
