Re: [Monotone-devel] cvs import


From: Markus Schiltknecht
Subject: Re: [Monotone-devel] cvs import
Date: Thu, 14 Sep 2006 10:05:42 +0200
User-agent: Thunderbird 1.5.0.5 (X11/20060812)

Hi,

Nathaniel Smith wrote:
> Regarding the basic dependency-based algorithm, the approach of
> throwing everything into blobs and then trying to tease them apart
> again seems backwards.  What I'm thinking is, first we go through and
> build the history graph for each file.  Now, advance a frontier across
> all of these graphs simultaneously.  Your frontier is basically a
> map <filename -> CVS revision> that represents a tree snapshot.
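
If I understand it right, that frontier step would look roughly like this -- a sketch only, every name in it is invented:

    // The frontier maps filename -> CVS revision and thus denotes one
    // tree snapshot; advancing it by one grouped set of per-file
    // revision bumps yields the next synthesized changeset.
    #include <iostream>
    #include <map>
    #include <string>

    using frontier = std::map<std::string, std::string>; // file -> CVS rev

    // One candidate step: per-file successor revisions that share a
    // changelog/author and lie close together in time (the usual CVS
    // grouping heuristics would be applied when building this).
    struct candidate {
        std::map<std::string, std::string> bumps; // file -> next rev
    };

    frontier advance(frontier f, candidate const & c)
    {
        for (auto const & [file, rev] : c.bumps)
            f[file] = rev; // move this file's pointer forward
        return f;          // the new frontier = the next tree snapshot
    }

    int main()
    {
        frontier f = {{"foo.c", "1.1"}, {"bar.c", "1.3"}};
        candidate c = {{{"foo.c", "1.2"}, {"bar.c", "1.4"}}};
        f = advance(f, c);
        for (auto const & [file, rev] : f)
            std::cout << file << " @ " << rev << "\n";
    }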

Hm... weren't you the one saying we should benefit from the experience of cvs2svn? Another question I'm asking myself: if it were that easy to write a sane CVS importer, why didn't cvs2svn do something like that?

Anyway, I didn't want to get into discussing more algorithms here. And the discussion is already way too noisy for my taste. I want to write code, not emails :-)

> Regarding storing things on disk vs. in memory: we always used to
> stress-test monotone's cvs importer with the gcc history; just a few
> weeks ago someone did a test import of NetBSD's src repo (~180k
> commits) on a desktop with 2 gigs of RAM.  It takes a pretty big
> history to really require disk (and for that matter, people with
> histories that big likely have a big enough organization that they can
> get access to some big iron to run the conversion on -- and probably
> will want to anyway, to make it run in reasonable time).

Full ack.

> Probably the biggest technical advantage of having the converter built
> into monotone is that it makes it easy to import the file contents.
> Since this data is huge (100x the repo size, maybe?), and the naive
> algorithm for reconstructing takes time that is quadratic in the depth
> of history, this is very valuable.  I'm not sure what sort of dump
> format one could come up with that would avoid making this step very
> expensive.
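
That quadratic behavior comes from RCS itself: the trunk head is stored in full and older revisions as reverse deltas, so rebuilding revision i from scratch means applying i deltas. Done independently for all N revisions, that is N(N-1)/2 applications; reusing the text you just reconstructed brings it down to N-1. Roughly (apply_delta and everything around it are stand-ins, not monotone's real interfaces):

    #include <cstddef>
    #include <string>
    #include <vector>

    // Stand-in; a real implementation would patch `text` with `delta`.
    static std::string apply_delta(std::string const & text,
                                   std::string const & delta)
    {
        return text + delta;
    }

    // Naive: restart from the head for every revision -> O(N^2) total.
    std::vector<std::string> naive(std::string const & head,
                                   std::vector<std::string> const & deltas)
    {
        std::vector<std::string> texts;
        for (std::size_t i = 0; i < deltas.size(); ++i) {
            std::string t = head;              // start over each time...
            for (std::size_t j = 0; j <= i; ++j)
                t = apply_delta(t, deltas[j]); // ...re-applying the chain
            texts.push_back(t);
        }
        return texts;
    }

    // Incremental: keep the last reconstruction -> O(N) total.
    std::vector<std::string> incremental(std::string const & head,
                                         std::vector<std::string> const & deltas)
    {
        std::vector<std::string> texts;
        std::string t = head;
        for (auto const & d : deltas) {
            t = apply_delta(t, d);             // one step per revision
            texts.push_back(t);
        }
        return texts;
    }

    int main()
    {
        std::vector<std::string> deltas = {"+a", "+b", "+c"};
        // Both give identical texts; only the total cost differs.
        return naive("head", deltas) == incremental("head", deltas) ? 0 : 1;
    }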

I can imagine a dump format that is only loosely coupled to the file data and deltas. But it seems like a lot of work to write a generic format which performs well for all VCSes.
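
Just to illustrate what I mean by loosely coupled (all of this is made up): the metadata stream could name file contents only by hash, and leave the encoding -- full texts, forward or reverse deltas -- entirely to a separate store:

    #include <map>
    #include <string>

    struct dump_entry {                          // one changeset's metadata
        std::string author, changelog, date;
        std::map<std::string, std::string> tree; // path -> content hash
    };

    struct content_store {                       // shipped separately; how
        std::map<std::string, std::string> by_hash; // it encodes contents
        std::string const & get(std::string const & h) const
        {
            return by_hash.at(h);                // is its own business
        }
    };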

> I also suspect that SVN's dump format is suboptimal at the metadata
> level -- we would essentially have to run a lot of branch/tag
> inferencing logic _again_ to go from SVN-style "one giant tree with
> branches described as copies, and multiple copies allowed for
> branches/tags that are built up over time", to monotone-style
> "DAG of tree snapshots".  This would be substantially less annoying
> inferencing logic than that needed to decipher CVS in the first place,
> granted, and it's stuff we want to write at some point anyway to allow
> SVN importing, but it adds another step where information could be
> lost.  I may be biased because I grok monotone better, but I suspect
> it would be much easier to losslessly convert a monotone-style history
> to an svn-style history than vice versa; possibly a generic dumping
> tool would want to generate output that looks more like monotone's
> model?
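
To put the two models side by side (a sketch, not either tool's real data structures):

    #include <map>
    #include <string>
    #include <vector>

    struct svn_copy {              // a branch/tag is built from one or
        std::string from_path;     // more of these, e.g. "/trunk/lib"
        long        from_rev;      // the revision copied from
        std::string to_path;       // e.g. "/branches/foo/lib"
    };

    struct mtn_revision {          // one node in the snapshot DAG
        std::vector<std::string> parents;        // ids; merges have two
        std::map<std::string, std::string> tree; // path -> content id
        std::string branch;        // branch is a cert, not a copy
    };

Flattening the snapshot DAG into copy records is mostly mechanical; recovering clean snapshots from a pile of accumulated copies is exactly the inferencing described above.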

Yeah, and the GIT people want the generic dump to look more like GIT. And then there are darcs, mercurial, etc...

> Even if we _do_ end up writing two implementations of the algorithm,
> we should share a test suite.

Sure, but since cvs2svn is under a different license, I can't just copy its tests over :-( I will write some tests, but if I write them in our monotone-lua testsuite, I'm sure nobody else is going to use them.

Regards

Markus




