monotone-devel

Re: [Monotone-devel] user expirience (speed issue)


From: Nathaniel Smith
Subject: Re: [Monotone-devel] user expirience (speed issue)
Date: Tue, 6 Dec 2005 13:45:44 -0800
User-agent: Mutt/1.5.9i

On Tue, Dec 06, 2005 at 11:58:49AM +0100, Markus Schiltknecht wrote:
> Hello monotone hackers,
> 
> I'm trying to switch to monotone for my projects, since I really love
> its concepts. I'm hacking on PostgreSQL, so that whole project was
> imported from the CVS repository.

Glad to hear you like it!

> On my laptop, the cvs_import took way too long, so I aborted it. Then
> I reran the same on the server. I don't remember exactly, but monotone
> cvs_import ran for about 4-6 days.

Yeah; this is a, err... well-known problem :-).  Monotone does some
checking to make sure that the stuff it's storing is coherent -- this
protects against bugs or malicious users putting broken or nonsensical
stuff into history that you then base work on and can't get rid of.
One way or another, monotone always guarantees that its data is
valid -- it always checks for disk corruption before outputting stored
data, it never stores anything that doesn't make sense, etc.
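
To make the "checks for disk corruption before outputting stored
data" part concrete, here is a minimal Python sketch.  It is not
monotone's actual code: the CheckedStore class, its method names, and
the in-memory dict are all made up for illustration.  The only thing
it assumes is that content is identified by its SHA-1 hash, so that
re-hashing on read reveals any corruption.

```python
import hashlib


class CorruptionError(Exception):
    """Raised when stored bytes no longer match their recorded SHA-1 id."""


class CheckedStore:
    """Hypothetical toy store: every blob is keyed by its own SHA-1 hash."""

    def __init__(self):
        self._blobs = {}  # hex SHA-1 id -> raw bytes

    def put(self, data: bytes) -> str:
        # The id is derived from the content, so it doubles as a checksum.
        ident = hashlib.sha1(data).hexdigest()
        self._blobs[ident] = data
        return ident

    def get(self, ident: str) -> bytes:
        data = self._blobs[ident]
        # Re-hash on every read so corruption is caught before output.
        if hashlib.sha1(data).hexdigest() != ident:
            raise CorruptionError(ident)
        return data
```

Every read pays for a hash of the data it returns, which gives a
feel for why this kind of checking adds up over a whole history,
though the sketch says nothing about where monotone's time actually
goes.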

However, this checking is, ATM, really really slow :-(.  This is
definitely the single biggest problem people run into with monotone.
Unfortunately, the problem has turned out to be non-trivial to solve.
We don't want to just disable the checking; we want to make it fast
enough that everyone can have speed _and_ safety.  But that's turned out
to require some major rethinking of what sort of data structures and
algorithms we use.

> Of course I didn't want to code on the server, so I needed to sync with
> the laptop. No problem, I thought, and gave monotone pull a shot. But
> that took several hours again... I got impatient and tried another
> thing: I simply downloaded the whole database from the server and used
> it on the laptop. That worked just fine after importing private keys by
> hand. Even updating on server or laptop and then syncing was no problem.

Right; the checking affects both cvs_import and pull equally, because
they both involve putting a whole lot of revisions into a db that
didn't have them before.  Downloading the initial db by hand is the
workaround that the larger monotone-using projects have been using
for now.

> Now, if you want to try, you might pull my 'org.postgresql' branch from
> my server at 213.133.111.57. But this takes terribly long (more than a
> few hours). Other branches like org.postgresql.REL8_0_STABLE or so take
> just some minutes, that's fine.

Right.

(Actually, this sort of points to a bug in cvs_import; normally pulling
a branch would involve pulling all of its history too, so pulling
REL8_0_STABLE would require pulling most of mainline.  But cvs_import
currently doesn't knit together branches; it treats each branch as a
new from-scratch import.  The problem is that in general, CVS repos
can be incoherent, nonsensical tangles of revisions, so branch
reconstruction is in principle impossible; but we do plan to add some
sort of best-effort attempt at it, that will work okay for reasonable
CVS repos.)

> A subsequent pull takes less than five minutes and gives:
> monotone: connecting to 213.133.111.57
> monotone: finding items to synchronize:
> monotone:   certs |    keys | revisions
> monotone:  71,061 |       1 |    23,663
> monotone: bytes in | bytes out | certs in | revs in | revs written
> monotone:      196 |     1,057 |        0 |       0 |            0
> monotone: successful exchange with 213.133.111.57

5 minutes?  Ouch, I assume that's all in the "finding items to
synchronize" part (the first row of tickers)?  There's no real reason
why doing a pull with everything already up to date should take any
time at all.  Guess that's something else to optimize, once we've
removed the current round of bottlenecks :-).

> Why is a monotone pull taking so much longer than a plain download?
> What's the difference to downloading the database (other than being able
> to pull only specific branch(es)?) What can be done to make a pull
> faster?

Think I basically answered these up above.  Some possible solutions:
 -- There's a development branch called "net.venge.monotone.cvssync"
    that essentially adds cvs client capability to monotone; it's
    designed for the use case where an upstream project is still in
    CVS, but an individual developer (like you) wants to use monotone
    for their own work.  It can do a _partial_ import from a remote
    server (so, like, just the last few revisions), and then pull new
    changes in and push old changes out directly.

    If you just want to use monotone to get some work done, without
    converting the whole project over, it might be useful.  (Do beware
    that it's still somewhat experimental, though; some people use it
    all the time with no problems, but keep in mind that because it's
    talking directly to your real CVS server, you don't get monotone's
    normal guarantee that if something goes really wrong due to a bug
    or user error, you can just throw out your local db :-).)
 -- Just use the initial-pull-over-http, later-pulls-over-netsync
    workaround.  Of course, this is unsatisfactory in the long run,
    so...
 -- Monotone's main development effort is currently focused on the
    "rosters" branch; you might have heard mentions of this if you've
    been looking at the list or IRC :-).  This is, we hope, the
    long-term solution to the speed problems you've seen.  It's been
    quite some time in the making, because it involved figuring out a
    whole new strategy for handling revisions, ties into a fancy new
    merge algorithm, etc., but it will be merged to mainline RSN and
    preliminary measurements show it ~2x faster at doing initial
    pulls.  (On trees the size of monotone, anyway; that factor will
    vary depending on all sorts of things.  I'd _hope_ it does even
    better on larger trees, but I don't know.)  More importantly,
    it'll make future optimizations a lot easier.  You can read more
    about them:
      http://article.gmane.org/gmane.comp.version-control.monotone.devel/5359
    or watch our TODO-before-merge list shrink:
      http://venge.net/monotone/wiki/RostersTodo

Hope that helps!  Let us know if you have any other questions...

-- Nathaniel

-- 
"The problem...is that sets have a very limited range of
activities -- they can't carry pianos, for example, nor drink
beer."



