Re: Reliability of RPC services
From: Marcus Brinkmann
Subject: Re: Reliability of RPC services
Date: Tue, 25 Apr 2006 20:07:19 +0200
User-agent: Wanderlust/2.14.0 (Africa) SEMI/1.14.6 (Maruoka) FLIM/1.14.7 (Sanjō) APEL/10.6 Emacs/21.4 (i486-pc-linux-gnu) MULE/5.0 (SAKAKI)
At Tue, 25 Apr 2006 11:45:18 -0600,
"Christopher Nelson" <address@hidden> wrote:
> I like hard real-time systems. I have thought a lot about the recovery
> aspect of system design. To me it seems like you have two situations:
Can you give us some references to the prior work on this topic that
is most relevant here? Papers, theses, etc.
> This might be extended to IPC by doing something similar. It may not
> ever be necessary to know "when" to stop retrying. It may be possible
> to indicate to a user that the requested operation is taking longer than
> expected, and to give the user the opportunity to cancel the request.
> Other servers (such as a mail server) may have a settings file which
> dictates how "long" it should keep retrying an operation.
>
> In these situations, the metric for timing out may not be some
> compile-time constant, but can be dependent on what the user has said
> should happen. (In the case of a settings file, it is probably a
> "knowledgeable" user, since all servers should come set with reasonable
> defaults.)
>
> One other idea that may not be feasible is in regards to timeouts being
> flaky in the case of heavy load. Perhaps it would be better to
> stipulate that the watchdog should keep track of how many requests have
> been processed, and how many are pending. Over time this indicates an
> "average load". If this number starts to rise sharply, the watchdog may
> assume that it is now under a heavier load, and can use some metric to
> back off on its abort policy. Think about how Ethernet cards use
> binary exponential backoff to make sure only one system is transmitting
> at once, without any explicit session policy.
>
> Essentially, apps and servers need to be smarter and need to expect
> things to go wrong.
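To make the quoted proposal concrete, here is a minimal sketch of a watchdog that tracks pending requests and relaxes its abort timeout when load rises sharply. Everything in it is an assumption made for illustration: the class name `LoadAwareWatchdog`, the moving-average smoothing, the "twice the average" spike threshold, and the doubling and halving policy are not taken from any existing system.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class LoadAwareWatchdog:
    """Hypothetical watchdog that stretches its abort timeout under load."""

    base_timeout: float = 1.0   # seconds allowed per request when idle (assumed)
    max_timeout: float = 64.0   # cap so the timeout cannot grow without bound
    smoothing: float = 0.2      # weight of the newest sample in the average
    spike_factor: float = 2.0   # "rises sharply" means pending > 2x the average
    avg_pending: Optional[float] = field(default=None, init=False)
    timeout: float = field(default=0.0, init=False)

    def __post_init__(self) -> None:
        self.timeout = self.base_timeout

    def observe(self, pending: int) -> float:
        """Record the number of pending requests; return the current timeout."""
        if self.avg_pending is None:
            # First sample seeds the average, so start-up is not seen as a spike.
            self.avg_pending = float(pending)
            return self.timeout
        if pending > self.spike_factor * self.avg_pending:
            # Load rose sharply: back off by doubling the abort timeout.
            self.timeout = min(self.timeout * 2, self.max_timeout)
        elif pending <= self.avg_pending:
            # Load is steady or falling: relax towards the base timeout.
            self.timeout = max(self.timeout / 2, self.base_timeout)
        # Exponentially weighted moving average of the pending-request count.
        self.avg_pending += self.smoothing * (pending - self.avg_pending)
        return self.timeout
```

Even this small policy has several interacting parameters, and its behaviour depends on the history of samples rather than on the current load alone.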
My concern is that in a system with such complex dynamics, there may
be emergent behaviour that is totally different from what you actually
want. Your binary exponential backoff is a very good example: as
originally designed, it led to starvation (the Ethernet capture effect).
Jeff Mogul calls this "emergent misbehaviour"; see:
http://www.cs.kuleuven.ac.be/conference/EuroSys2006/papers/p293-mogul.pdf
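As a rough illustration of the capture effect, the sketch below models two stations that keep colliding. It assumes the usual truncated binary exponential backoff (a uniform choice from 2^min(n, 10) slots after n consecutive collisions) and a deliberately simplified two-station model in which the station that last transmitted has reset its collision counter. The function names are hypothetical, and ties (which cause another collision) are simply counted as "not a win".

```python
import random

MAX_EXPONENT = 10  # the backoff window stops growing after 10 collisions


def backoff_window(collisions: int) -> int:
    """Number of slots to choose from after `collisions` consecutive collisions."""
    return 2 ** min(collisions, MAX_EXPONENT)


def pick_backoff(collisions: int, rng: random.Random) -> int:
    """Pick a delay in slots, uniformly from [0, window - 1]."""
    return rng.randrange(backoff_window(collisions))


def chance_fresh_station_wins(loser_collisions: int) -> float:
    """Exact probability that a station with one collision picks a strictly
    earlier slot than a station with `loser_collisions` collisions."""
    fresh = range(backoff_window(1))
    stale = range(backoff_window(loser_collisions))
    wins = sum(1 for a in fresh for b in stale if a < b)
    return wins / (len(fresh) * len(stale))


if __name__ == "__main__":
    for n in range(1, 6):
        print(n, chance_fresh_station_wins(n))
```

In this model the station that just won picks from a window of two slots, while the loser's window doubles with each loss, so the winner's advantage grows with every round. That feedback loop is the starvation referred to above.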
I really hope that we find a simpler solution, potentially by reducing
the requirements.
Thanks,
Marcus