
Re: [Monotone-devel] newbie question - SHA1 vs serials


From: Nathaniel Smith
Subject: Re: [Monotone-devel] newbie question - SHA1 vs serials
Date: Thu, 21 Apr 2005 03:36:16 -0700
User-agent: Mutt/1.5.9i

On Wed, Apr 20, 2005 at 12:30:11PM -0700, K. Richard Pixley wrote:
>    It's in the scope of CSM.  In the largest organizations, CSM and IT are
>    done by different people and it's not entirely clear to me where the
>    handoff is here.  That is, where do my responsibilities as a CSM
>    professional and a developer end and where do IT's begin.  Certainly, if I
>    want them to do backups, I need to be able to tell them when the info on
>    disk is stable and recoverable.
> 
>    Snapshotting file systems were expensive last time I looked and I don't
>    know any that run on linux native.

There are several options for snapshotting built into modern Linux
kernels and supported across most (all?) filesystems -- search for
LVM or EVMS.

>    But the real problem isn't snapshotting the file system - it's making sure
>    that the database which is stored by a snapshot can be
>    restored/recovered/reused at a later time.  Between memory caches of file
>    systems, index caches, and the order of writes, we have vulnerabilities
>    here due both to the file system and also to the database system.

While I have not tested this, SQLite should handle it perfectly.
Monotone does not always write atomically to the working copy (doing
so is very hard), so an extremely unlucky power failure or snapshot
start could possibly leave you with a somewhat scrambled working copy;
but the database itself should always restart fine, roll back to a
consistent state, and continue onward.
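
As a rough illustration of the rollback behaviour, here is a minimal
sketch using Python's standard sqlite3 module.  It is not monotone's
own code, and the table name and revision ids are made up.  It only
simulates an abandoned transaction (a connection closed without a
COMMIT), not a real power failure, so treat it as a picture of the
idea rather than a test of journal recovery.

```python
import os
import sqlite3
import tempfile


def committed_ids_after_abandoned_transaction():
    """Return the revision ids visible after one transaction is abandoned."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.db")

        # isolation_level=None: we issue BEGIN/COMMIT ourselves.
        conn = sqlite3.connect(path, isolation_level=None)
        conn.execute("CREATE TABLE revisions (id TEXT PRIMARY KEY)")

        # First transaction: committed, so it must survive.
        conn.execute("BEGIN")
        conn.execute("INSERT INTO revisions VALUES ('r1')")
        conn.execute("COMMIT")

        # Second transaction: never committed (hypothetical "crash").
        conn.execute("BEGIN")
        conn.execute("INSERT INTO revisions VALUES ('r2')")
        conn.close()  # no COMMIT, so SQLite discards the pending change

        # Reopen and look at what is actually in the database.
        conn = sqlite3.connect(path)
        ids = [row[0] for row in
               conn.execute("SELECT id FROM revisions ORDER BY id")]
        conn.close()
        return ids


print(committed_ids_after_abandoned_transaction())
```

Only the committed row should come back; the half-finished insert is
gone, which is the "roll back to a consistent state" part.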

>    The standard initial answer is simply to shut down the CSM and database,
>    make them unavailable for a while, block access to the file system, sync
>    it, then back it up.  For small groups, this is usually acceptable as
>    there's usually some hour of the day when everyone can be reliably
>    predicted to be asleep.  This doesn't scale well.  The "shut everything
>    down to single use mode" solution is easy.  But it means your repository
>    isn't available for some period of time.  For active repositories, this
>    isn't acceptable.
> 
>    The modern answer is to construct your database and file systems in such a
>    way that either data hits disk in an order which is always recoverable,
>    (in which case any snapshot is recoverable), or such that the system can
>    be forced to flush all caches and create a consistent disk state, even if
>    it's only briefly.  Clearcase uses the latter at this point.

As above, this is what SQLite is supposed to do.  You do need some
sort of snapshotting, because the database file and its journal have
to be read together atomically, but as mentioned, that's not hard
these days.

That's for traditional backup methods.  Monotone also has a
"post-modern" answer to backups: the basic communication operation in
monotone is essentially "make the other guy's database a complete
backup of mine".  Every developer, naturally, has a backup of history.
Communication is idempotent, so if any database is ever lost or
corrupted, one can simply recreate it empty, run a pull, and be good
to go.
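
To make "idempotent" concrete, here is a toy model in Python.  It is
an assumption-laden sketch, not the netsync protocol: a database is
just a dict keyed by the SHA1 of each entry's content, and `pull` and
`rev_id` are hypothetical helper names invented for the example.

```python
import hashlib


def rev_id(content):
    """Identify an entry by the SHA1 of its content (toy stand-in)."""
    return hashlib.sha1(content).hexdigest()


def pull(local, remote):
    """Copy into `local` every entry of `remote` it lacks.

    Returns the number of entries copied.  Nothing is ever removed or
    overwritten, so repeating the call changes nothing.
    """
    missing = {k: v for k, v in remote.items() if k not in local}
    local.update(missing)
    return len(missing)


server = {rev_id(c): c
          for c in (b"first change", b"second change", b"third change")}

fresh = {}                      # a lost database, recreated empty
first = pull(fresh, server)     # copies everything
second = pull(fresh, server)    # copies nothing: already up to date
```

After the first pull the fresh database holds the same entries as the
server, and the second pull is a no-op.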

For greater assurance, it is also easy to run multiple netsync
servers, all synchronized with each other at regular intervals by an
automated process.  Not only are these servers all backups of each
other (with a backup interval measured in minutes, rather than
"nightly" or the like), but they are _hot_ backups -- if a server goes
down, everyone can simply switch to another with no downtime.  They
can also be used for load balancing; it doesn't matter which server I
talk to, since any changes I push to it will end up in the other
servers as well within minutes.

If a server _does_ suddenly lose a disk, then even the few minutes of
commits that might not have been replicated yet are not lost; the next
time the developer who made the changes (or any developer who pulled
from the server immediately before it crashed) connects to a server,
their monotone will automatically notice the missing revisions, and
re-transmit them.
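
The lost-disk scenario can be sketched the same way.  Again this is
only a toy model under stated assumptions: each database is a plain
Python set of revision ids, and `sync` is a hypothetical two-way
exchange, not monotone's actual wire protocol.

```python
def sync(a, b):
    """Two-way exchange: each side receives whatever it lacks.

    Returns (sent_to_a, sent_to_b) so we can see what was transmitted.
    """
    to_a = b - a
    to_b = a - b
    a |= to_a
    b |= to_b
    return to_a, to_b


server1 = {"r1", "r2"}
server2 = {"r1", "r2"}
dev = {"r1", "r2"}

dev.add("r3")            # developer commits locally
sync(dev, server1)       # ... and pushes to server1

server1 = None           # server1 loses its disk before replicating

# Next time the developer talks to any surviving server, the missing
# revision is noticed and sent again.
sent_to_dev, sent_to_server2 = sync(dev, server2)
```

Here the second sync transmits "r3" to server2, so the commit survives
even though the server that first received it is gone.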

It is very difficult to lose data in Monotone; you have to try very,
very hard, and even then the smallest mistake will cause you to fail.

-- Nathaniel

-- 
"But suppose I am not willing to claim that.  For in fact pianos
are heavy, and very few persons can carry a piano all by themselves."



