Thu, 12 Jan 2006 14:18:16 +0100
Gnus/5.110004 (No Gnus v0.4) Emacs/21.4 (gnu/linux)
Bas Wijnen <address@hidden> writes:
> I am indeed. :-) In some of my limited free time I'm currently trying to
> write a library for persistent applications, so I get a better feeling for
> what persistence is. The idea is to create applications which checkpoint
> themselves every now and then, can be aborted at any time (or made to
> checkpoint-and-abort as an atomic operation), and when restarted continue from
> their last checkpoint. I'm trying to design the database in a way that it is
> possible to "restart" the program with a new version of it, possibly keeping
> open the file descriptors so it doesn't even suffer from closed connections.
Did you look at libckpt and similar libraries (ad: I once wrote
pego which produces /portable/ checkpoints, unlike libckpt, but I'm
not sure it's interesting in this case ;-))?
File descriptors and capabilities are the main issue since they are
bound to state that is /external/ to the application (be it in the
kernel or in a server). The Fluke authors argue that kernel
state should be exportable to allow for the implementation of user-level
checkpointing. However, in a multi-server system, application state is
spread across a bunch of servers which would all have to make their
state exportable. But from the server viewpoint, restoring complex
state from an untrusted source is not a reasonable thing to do.
Furthermore, a protected capability system does not allow the disclosure
of the "bit representation" of capabilities, so checkpointing
capabilities themselves in a meaningful way is not something
applications can do on their own.
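
To make the file descriptor problem concrete, here is a minimal sketch in
Python (chosen only for brevity; it uses neither libckpt nor pego). The
`Reader` class and `demo` function are hypothetical names. The sketch shows
that an open file handle cannot be serialized directly, so a user-level
checkpoint can only save a name and an offset and then re-acquire the
resource on restore. That workaround assumes the resource can be looked up
again by name, which is exactly what a protected capability does not offer.

```python
import os
import pickle
import tempfile


class Reader:
    """Hypothetical application object that owns an open file handle."""

    def __init__(self, path, offset=0):
        self.path = path
        self.handle = open(path, "rb")
        self.handle.seek(offset)

    def __getstate__(self):
        # Save only what the application itself owns: a name and a position.
        return {"path": self.path, "offset": self.handle.tell()}

    def __setstate__(self, state):
        # Re-acquire the external resource. This works only because a path
        # can be resolved again; an opaque capability offers no such name.
        self.path = state["path"]
        self.handle = open(self.path, "rb")
        self.handle.seek(state["offset"])


def demo():
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.write(b"hello world")
        path = f.name
    try:
        # A raw handle is bound to external state and refuses to be pickled.
        raw = open(path, "rb")
        try:
            pickle.dumps(raw)
            raw_ok = True
        except TypeError:
            raw_ok = False
        finally:
            raw.close()

        # Checkpoint after reading part of the file, then restore.
        reader = Reader(path)
        first = reader.handle.read(6)
        blob = pickle.dumps(reader)
        reader.handle.close()

        restored = pickle.loads(blob)
        rest = restored.handle.read()
        restored.handle.close()
        return raw_ok, first, rest
    finally:
        os.unlink(path)
```

Calling `demo()` checkpoints the reader in the middle of the file and
resumes reading from the saved offset in the restored copy.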
In EROS, the whole system (kernel - drivers + all the processes) is
persistent, so there is, I think, no such problem: each checkpoint
contains everything that's needed to restore the whole thing. The issue
of restoring capabilities and their associated state arises when trying
to make only part of the processes persistent.
One solution would be to log all the interactions between the
persistent world and the non-persistent world in order to replay them
upon recovery, but that's quite ugly IMO. Instead, maybe special
support from the capability system could solve that.
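
The logging approach can be sketched roughly as follows, again in Python.
All names here (`VolatileServer`, `LoggingProxy`, `recover`) are
hypothetical and only illustrate the idea: every call that crosses from the
persistent world into a non-persistent server is recorded, and after a
restart the log is replayed against a fresh server to rebuild its state.
The sketch assumes the server is deterministic and that replaying a call
has no other side effects.

```python
class VolatileServer:
    """Stands in for a non-persistent server; its state is lost on restart."""

    def __init__(self):
        self.total = 0

    def add(self, n):
        self.total += n
        return self.total


class LoggingProxy:
    """Records each call crossing the persistent/non-persistent boundary."""

    def __init__(self, server, log=None):
        self.server = server
        # The log is the part that would be saved with the checkpoint.
        self.log = list(log or [])

    def call(self, method, *args):
        self.log.append((method, args))
        return getattr(self.server, method)(*args)


def recover(log):
    """Rebuild the server's state by replaying the saved log in order."""
    server = VolatileServer()
    for method, args in log:
        getattr(server, method)(*args)
    return LoggingProxy(server, log)
```

For example, after `proxy.call("add", 2)` and `proxy.call("add", 3)`,
passing a saved copy of `proxy.log` to `recover` yields a proxy whose fresh
server has been brought back to the same total. The sketch also hints at
the costs: the log grows with every interaction, and replay is only correct
if the server behaves identically the second time.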
Good luck, and best wishes! ;-)