From: Asger Kunuk Ottar Alstrup
Subject: [Monotone-devel] RE: Support for binary files, scalability and Windows port
Date: Mon, 19 Jan 2004 14:34:23 +0100

I understand that you work on this only in your free time, and that you
might disagree with me, so please take this discussion in the spirit in
which it is intended: I appreciate what monotone can do, and I think you have made a
really nice system. It is the best free distributed version control I
have found, and I have looked at all of those I could find.

I'm just tossing up some ideas in the hope that they might be useful to
extend the scope of monotone to support really big files efficiently. My
background is once again that I have a need to version control both
source code, and big binary files in a distributed environment, and I'm
investigating the options for how to get that. The best candidate is
monotone, and based on what the detailed analysis shows, I basically
have a number of options on a scale covering:

1) Live with CVS, and wait until something better comes up.
2) Work with you to improve monotone, and use it for source code only,
as it is primarily intended.
3) Work with you, and try to convince you to accept some patches to
support big files even though they are intrusive.
  b) If you need to change the main data structure, try to advocate
incorporating support for big files as part of that.
4) Fork monotone to a new system which supports big files the way I feel
it should be done.

Right now, I'm working on 3b, but I'm not expecting you to use a lot of
your free time to allow us to meet the wishes we have. And I'm not
proposing to fork monotone at all: I think we have a common interest in
unifying the efforts, and I'm glad that you are willing to consider
patches that would extend the scope to cover big files.

So, in summary: I appreciate the time you are spending on monotone, and
I appreciate the friendly and open tone with which you approach this
discussion.

With that out of the way, back to the concrete discussion:

> what's good about identifying things by content hash is that you and I
> will always construct the *same* content hash for a given object, even
> when we are not explicitly communicating.

I understand that hashes will help in the situation where you and I
independently make the same change - in this scenario, we do not need to
merge, because the files are identical. So, this property is what I
propose to give up: In this situation, the users have to merge the two
files explicitly, and say that they are in fact the same version.

This is because I do not think this use case has priority over the use
case where I revert a change, but another party does not.
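To make the revert case concrete, here is a minimal sketch of one reading of it. The helper names (content_id, unique_id) and the use of random UUIDs are assumptions made for illustration; nothing here is monotone's actual code. It shows that a content hash gives the reverted file the same identifier as the original, while a per-version identifier keeps the two apart:

```python
import hashlib
import uuid


def content_id(data: bytes) -> str:
    """Identify a version purely by the SHA1 of its contents."""
    return hashlib.sha1(data).hexdigest()


def unique_id() -> str:
    """Identify a version by a fresh random identifier (hypothetical scheme)."""
    return uuid.uuid4().hex


# Three successive versions of one file: original, changed, reverted.
original = b"int x = 1;\n"
changed = b"int x = 2;\n"
reverted = original  # the revert restores the exact original bytes

history = (original, changed, reverted)
hash_ids = [content_id(v) for v in history]
unique_ids = [unique_id() for _ in history]

# With content hashes, the revert is indistinguishable from the original.
same_by_hash = hash_ids[0] == hash_ids[2]

# With per-version identifiers, the revert is a distinct node in the history.
same_by_unique = unique_ids[0] == unique_ids[2]
```

The price, as stated above, is that two people who independently make the same change get two different identifiers and have to merge explicitly.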

So, my suggestion is to separate concerns: Identify each version of a
file with a truly unique identifier in the version DAG, and then have a
separate scheme for representing each version of a file in a compact and
efficient way. The representation part can have different strategies:
The default one is based on the SHA1 256-way tree and blocks as you
propose, but another could be based on another scheme that does not rely
on hashes. Each strategy will have different strengths and weaknesses,
and thus serve different purposes.
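A rough sketch of that separation follows, under stated assumptions: all class and function names are hypothetical, and the hash-based strategy is a flat list of SHA1-keyed blocks standing in for the 256-way tree, not an implementation of it. A version node carries only an identifier plus a pointer into whichever representation strategy stored its contents:

```python
import hashlib
import uuid
import zlib
from dataclasses import dataclass


class HashedBlocks:
    """Hash-based strategy: fixed-size blocks keyed by SHA1 (simplified)."""

    def __init__(self, block_size: int = 4):
        self.block_size = block_size
        self.blocks = {}     # block hash -> block bytes
        self.manifests = {}  # manifest hash -> ordered list of block hashes

    def store(self, data: bytes) -> str:
        keys = []
        for i in range(0, len(data), self.block_size):
            block = data[i:i + self.block_size]
            key = hashlib.sha1(block).hexdigest()
            self.blocks[key] = block  # identical blocks are stored once
            keys.append(key)
        manifest_key = hashlib.sha1("".join(keys).encode()).hexdigest()
        self.manifests[manifest_key] = keys
        return manifest_key

    def load(self, key: str) -> bytes:
        return b"".join(self.blocks[k] for k in self.manifests[key])


class CounterStore:
    """Non-hash strategy: compressed blobs under sequential keys."""

    def __init__(self):
        self.items = {}
        self.next_index = 0

    def store(self, data: bytes) -> str:
        key = f"blob-{self.next_index}"
        self.next_index += 1
        self.items[key] = zlib.compress(data)
        return key

    def load(self, key: str) -> bytes:
        return zlib.decompress(self.items[key])


@dataclass(frozen=True)
class Version:
    version_id: str  # unique identity used in the version DAG
    scheme: str      # which representation strategy holds the contents
    key: str         # key understood only by that strategy


def commit(data: bytes, scheme: str, schemes: dict) -> Version:
    """Create a new version node; identity is independent of contents."""
    return Version(uuid.uuid4().hex, scheme, schemes[scheme].store(data))


def checkout(version: Version, schemes: dict) -> bytes:
    """Recover the contents through the strategy named by the node."""
    return schemes[version.scheme].load(version.key)


schemes = {"sha1-blocks": HashedBlocks(), "counter": CounterStore()}
v1 = commit(b"hello world", "sha1-blocks", schemes)
v2 = commit(b"hello world", "counter", schemes)
```

Here v1 and v2 hold identical contents in different representations, yet remain distinct versions as far as the DAG is concerned.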

To represent the version DAG and exchange this, you need to exchange the
nodes and edges of the graph to make sure that each party has the same
nodes and edges. Each node refers to a version of a file, which can be
represented by a SHA1 tree or another scheme. Notice that with this
structure, you have the very nice property that you also only *add*
edges and nodes to the version graph: You never have to change or erase
things from the graph.

That should make synchronising that data structure pretty simple. The
question of synchronising the representation of a file has to be worked
out separately, but the work is proportional to how many representation
schemes you are aiming for.
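Because the graph only ever grows, synchronising it amounts to a set union. The sketch below assumes nodes are plain identifier strings and edges are (parent, child) pairs; the VersionGraph class and sync function are hypothetical names, and transport of file contents is deliberately left out:

```python
from dataclasses import dataclass, field


@dataclass
class VersionGraph:
    """Append-only version graph: nodes and edges are only ever added."""

    nodes: set = field(default_factory=set)
    edges: set = field(default_factory=set)  # (parent_id, child_id) pairs

    def add_version(self, node: str, parents=()) -> None:
        self.nodes.add(node)
        for parent in parents:
            self.edges.add((parent, node))

    def missing_from(self, other: "VersionGraph"):
        """Nodes and edges that `other` has and this graph lacks."""
        return other.nodes - self.nodes, other.edges - self.edges

    def merge_from(self, other: "VersionGraph") -> None:
        """Take everything `other` knows; nothing is changed or erased."""
        new_nodes, new_edges = self.missing_from(other)
        self.nodes |= new_nodes
        self.edges |= new_edges


def sync(a: VersionGraph, b: VersionGraph) -> None:
    """Two-way sync: afterwards both sides hold the union of both graphs."""
    a.merge_from(b)
    b.merge_from(a)


# Two sites start from a shared root and diverge.
site_a = VersionGraph()
site_a.add_version("root")
site_a.add_version("a1", parents=["root"])

site_b = VersionGraph()
site_b.add_version("root")
site_b.add_version("b1", parents=["root"])

sync(site_a, site_b)
```

Since union is commutative and idempotent, the order and number of sync rounds do not matter, which is what makes this part simple compared with synchronising the file representations.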

In other words, I feel that it is beneficial to split the task of
representing the dynamic development of versions across multiple sites
from the task of representing the contents of the versions themselves.
Such a split would also open the door to versioning things other than
files, including the meta-data itself.

Best regards,
Asger




