From: Zbynek Winkler
Subject: Re: [Monotone-devel] announcing preliminary "dumb server" support for monotone
Date: Thu, 13 Oct 2005 00:15:50 +0200
User-agent: Debian Thunderbird 1.0.2 (X11/20050602)

Nathaniel Smith wrote:

> On Wed, Oct 12, 2005 at 09:34:26PM +0200, Zbynek Winkler wrote:
>> Hmm. I didn't quite get the picture until I tried it ;). I didn't have
>> the patience to wait for the local do_export to finish on the monotone
>> database... But the speed seems to be (unfortunately) comparable to the
>> verification of the incoming changesets when doing a regular pull. BTW:
> No, different problem entirely -- do_export is currently quadratic in
> the history length, mostly because it uses a separate invocation of
> monotone to request each manifest delta, and since monotone still does
> unbounded delta chaining, it takes linear time to retrieve an
> arbitrary manifest.  (This also applies to files, but files tend to
> have much shallower histories than the tree as a whole, so it doesn't
> matter there as much.)
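
If I try to spell out the access pattern you are describing, I get
something like the sketch below (hypothetical helper names, not the
actual dumb.py code -- and 'mdelta' being the packet command involved
is only my assumption):

    import subprocess

    def get_manifest_delta(db, old_id, new_id):
        # One fresh monotone process per delta request.  With unbounded
        # delta chaining, reconstructing the base manifest inside that
        # process can itself walk a chain whose length grows with the
        # history.
        return subprocess.run(
            ["monotone", "--db", db, "mdelta", old_id, new_id],
            capture_output=True, check=True).stdout

    def export_all(db, revision_pairs):
        # revision_pairs: (old_manifest_id, new_manifest_id) tuples in
        # topological order -- n process spawns, each up to O(n) inside,
        # which would be where the quadratic behaviour comes from.
        return [get_manifest_delta(db, o, n) for o, n in revision_pairs]
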
I am not that familiar with the monotone codebase, so please forgive the question - but why does this cause do_export to be quadratic? Does it actually retrieve arbitrary manifests? Shouldn't the request for a manifest delta be constant-time if that is the way it is stored in the db?

I've gone over merkle_dir.py and I believe it provides an append-only data structure (a file called DATA) for arbitrary chunks identified by id (mostly the hash of the chunk). I see some logic in do_export that checks whether old_something is already in the merkle_dir; if not, the whole thing is put in - otherwise a delta of old_something and new_something is requested. Is this true? Doesn't this requested delta correspond directly to the delta stored in the monotone db? [BTW: For some reason I thought monotone used reverse deltas...?] What I do not understand is how on earth we can have a 'thing' that we do not have the corresponding 'old_thing' for. And where does the linear search come from when getting the deltas?
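
To be explicit, my mental model of merkle_dir.py is roughly the
following (heavily simplified; the in-memory index here is made up for
illustration):

    import os

    class ChunkStore:
        """Toy model of the DATA file: chunks keyed by id, append-only."""
        def __init__(self, path):
            self.data_path = os.path.join(path, "DATA")
            open(self.data_path, "ab").close()   # make sure it exists
            self.index = {}                      # id -> (offset, length)

        def __contains__(self, chunk_id):
            return chunk_id in self.index

        def add(self, chunk_id, payload):
            # Chunks are only ever appended, never rewritten in place.
            with open(self.data_path, "ab") as f:
                offset = f.seek(0, 2)            # current end of file
                f.write(payload)
            self.index[chunk_id] = (offset, len(payload))

    def export_one(store, old_id, new_id, full_text, get_delta):
        # The do_export-style decision as I read it: store the full text
        # if we lack the base, a delta against old_id otherwise.
        if old_id is not None and old_id in store:
            store.add(new_id, get_delta(old_id, new_id))
        else:
            store.add(new_id, full_text)

The append-only property seems to be the key point, I suppose: a dumb
HTTP server then only ever has to serve pieces of a growing file.
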

> One thing that would help a lot would be to move the packet commands
> to automate (where they probably should be anyway), and then to teach
> monotone.py to use 'automate stdio'.  That way we're using a
> persistent monotone process, and the db layer's internal caching
> should be able to turn this back into a linear operation.
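
That sounds doable from the Python side -- something like this untested
sketch, I suppose (the wire framing below is from my reading of the
'automate stdio' description and may well be off in the details):

    import subprocess

    class Monotone:
        """One long-lived 'automate stdio' process, so the db layer's
        caches survive across requests instead of dying with each
        monotone invocation."""

        def __init__(self, db):
            self.proc = subprocess.Popen(
                ["monotone", "--db", db, "automate", "stdio"],
                stdin=subprocess.PIPE, stdout=subprocess.PIPE)

        def _read_field(self):
            # Reply header fields are colon-terminated.
            buf = b""
            while not buf.endswith(b":"):
                buf += self.proc.stdout.read(1)
            return buf[:-1]

        def automate(self, *args):
            # A command is encoded as l<len>:<arg><len>:<arg>...e
            cmd = "l" + "".join("%d:%s" % (len(a), a) for a in args) + "e"
            self.proc.stdin.write(cmd.encode())
            self.proc.stdin.flush()
            chunks = []
            while True:
                self._read_field()                 # command number
                err = int(self._read_field())      # error code
                last = self._read_field() == b"l"  # 'l' last, 'm' more
                size = int(self._read_field())
                # (A real version would loop until all size bytes arrive.)
                chunks.append(self.proc.stdout.read(size))
                if last:
                    if err:
                        raise RuntimeError(b"".join(chunks).decode())
                    return b"".join(chunks)

Then monotone.py could issue e.g. automate("leaves") repeatedly against
the one process -- and the same for the packet commands, once they are
available through automate.
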
Hmm. I have yet to find a way (or a machine) to compile monotone in a reasonable amount of time :(

>> What is the limiting factor of the verification step? Does it do some sorting?
> No, it's doing a bunch of really torturous checks of different sorts
> of data inconsistencies each revision might have.  "Torturous" because
> our data structures were not well chosen (because when we were first
> inventing this stuff, we didn't know as much as we do now).  The
> rosters code replaces this stuff entirely, and shouldn't suffer from
> the same problems.  (Instead, it will suffer from new, different
> problems!  Hopefully less severe, though :-).)
Then I am looking forward to the rosters! :-)

>>> OTOH, it supports monotone's full sync semantics (multiple people can
>>> push to the same "repo", you can have backup "repos" and sync with
>>> them indiscriminately, etc.), and should be reasonably efficient (it
>>> uses merkle tries to do low-overhead set synchronization).  Won't be
>>> as fast as netsync, or as flexible (whoever puts up the repo gets to
>>> choose what branches are included, you don't get to pick on the fly
>>> like for netsync), but might be handy for some people...
>> How do you pick what branches are included? Doesn't it always export
>> the whole database? I was surprised to find out that it does not
>> differentiate between files, changesets, etc.
> It always exports the whole database.  This could be made smarter, I
> guess, but it certainly didn't seem worth the effort for the first
> pass.  One of many possible improvements for someone to make, if they
> want :-).
:-) Anyway, does it have to export the database at all? Maybe it could build the MerkleDir directly from the database...? But then maybe we would be reimplementing netsync in Python... ;)
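
On the merkle tries, by the way: my mental model of the low-overhead
set synchronization is something like this toy version (certainly not
merkle_dir.py's actual layout -- ids grouped by hex prefix, and only
subtrees whose hashes disagree get descended into):

    import hashlib

    def node_hash(ids):
        # Hash of a whole subtree's membership, order-independent.
        h = hashlib.sha1()
        for i in sorted(ids):
            h.update(i.encode())
        return h.hexdigest()

    def missing_ids(mine, theirs, prefix=""):
        """Ids 'theirs' has that 'mine' lacks, touching only subtrees
        where the two sides' node hashes disagree."""
        my_ids = [i for i in mine if i.startswith(prefix)]
        their_ids = [i for i in theirs if i.startswith(prefix)]
        if node_hash(my_ids) == node_hash(their_ids):
            return set()                  # identical subtree: skip it
        if len(prefix) == 4:              # leaf depth in this toy version
            return set(their_ids) - set(my_ids)
        found = set()
        for c in "0123456789abcdef":
            found |= missing_ids(mine, theirs, prefix + c)
        return found

Over a network only the node hashes of the differing subtrees would
have to travel, which I take to be where the low overhead comes from --
assuming I have the idea right.
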

Zbynek

--
http://zw.matfyz.cz/     http://robotika.cz/
Faculty of Mathematics and Physics, Charles University, Prague, Czech Republic




