
From: graydon hoare
Subject: [Monotone-devel] Re: (long) some thoughts on monotone; still unable to post on monotone-devel.
Date: Tue, 03 Feb 2004 11:18:16 -0500
User-agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.6b) Gecko/20031205 Thunderbird/0.4

Klaus Robert Suetterlin wrote:

Thanks for looking into the problem with my submits on monotone-devel.
I still seem to be unable to post there, so I will just send
everything to You.  If You think my stuff valuable for discussion,
You can crosspost to monotone-devel, if You like.  I do receive
mail from monotone-devel ok.

ok, I am reposting to monotone-devel for posterity's sake.

This is the first round of objections, I collected so far.  Please
understand that I formulate what I do not like or understand.  All
the rest is fine with me :), I guess.  I searched for a CVS replacement
quite some time and have high hopes on monotone.  My questions are
more related to the design not the implementation.

ok. you seem to have many questions along the same lines; I will try to answer clearly, but that makes this email very long (!)

My opinions on monotone are based on the public documentation and
monotone-devel threads.  I did not read a single line of code.

1) Is the info file correct at what it states?

I already guessed that it is not complete.  But are all the facts
stated in the info file correct (as defined by the design)?

I hope that it is correct, but I am human and make mistakes. can you point out a specific fact which you think is incorrect or misleading?

Monotone seems to distinguish between two kinds of stuff in the
repository.  Stuff that has a SHA1 and stuff that doesn't.

2) What do You call things that are referred to by SHA1 hash keys?

I call them either "files" or "manifests". the distinction is made only to make index lookups in the database go faster, and distinguish between them at a UI level (for instance when the SHA1-completion code runs).

soon I will transition to referring to keys and certs partly by SHA1 as well, for purposes of synchronization. but I will maintain the ability to refer to them by their "friendlier" names too (user email address for keys, id/name/value triple for certs)

I will call them VO (versioned object) from now on.

So far I see three basic concepts in monotone: content, context,
certificate.  The content can be referred to by SHA1.  The context
and certificate overlap a little.  But basically context is collected
in manifests and .mt-attrs files, which are versioned just like
content.  And the certs allow one to express something beyond the context
of the source tree.  Like who is the author?  Why did she do some
things whichever way, and when?  Could the source tree be built
using the standard build procedure for the project?  Certificates
are also used to express heritage.

3) Why is there so little information (context) in the manifest?

there is enough information required to define the directory entries and file contents which make up the tree of files, and no more. I have a design preference for simple data formats.
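
As a rough illustration of how little such a manifest needs to carry, here is a minimal Python sketch. The line layout ("<sha1>  <path>"), the sort order, and the function names are assumptions made for the example; they are not taken from monotone's actual on-disk format.

```python
import hashlib


def sha1_hex(data: bytes) -> str:
    """Return the SHA1 of some bytes as a hex string."""
    return hashlib.sha1(data).hexdigest()


def build_manifest(files: dict) -> str:
    """Map each path to the SHA1 of its content, one entry per line.

    The layout is hypothetical: "<sha1>  <path>", sorted by path.
    """
    return "".join(
        f"{sha1_hex(content)}  {path}\n" for path, content in sorted(files.items())
    )


def manifest_id(manifest_text: str) -> str:
    """A manifest is itself identified by the SHA1 of its text."""
    return sha1_hex(manifest_text.encode("utf-8"))


tree = {"README": b"hello\n", "src/main.c": b"int main(void) { return 0; }\n"}
manifest = build_manifest(tree)
print(manifest, end="")
print("manifest id:", manifest_id(manifest))
```

Because the manifest holds nothing but paths and content hashes, two identical trees always produce the same manifest id, and changing any file changes it.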

In my opinion this will lead to a lot of problems and workarounds.
For example I cannot see how the contents of the MT directory and
the context of the sources should be recreated by the content of
the current manifest.

the MT directory is a control directory explicitly *not* held in version control. it is not necessary or desirable for the manifest to describe the contents of the MT directory. versioned metadata goes either in certs (if it's about a particular version) or .mt-attrs (if it's about a particular pathname).

There are already .mt-attrs files to describe
more context and duplicate file pathnames.  How are You going to
record changes to the tree structure?

a change is recorded by an ancestry cert, which relates one tree state to another, and by saving deltas to the storage system between all changed files and manifests. if more aspects of the change need to be recorded (say, an edit+rename operation) they are recorded as rename certs.

Also inside the .mt-attrs
file?  For example how can I version things that are not regular
files?  What if I want to move a link, but keep its heritage, ...?

moving a file can often be recognized implicitly by noticing that one file with a given SHA1 disappeared in the source of an ancestry edge, and one file with the same SHA1 appeared in the destination of the ancestry edge. if the file changes concurrently with being moved, you need to explicitly tell monotone that it is the "same" file (issue a rename command). this generates a rename cert.
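
The implicit case can be sketched in a few lines of Python. This is only a model: manifests are represented as plain path-to-SHA1 dictionaries, the function name is hypothetical, and skipping ambiguous matches (several files with the same SHA1) is a choice made for the example rather than documented monotone behaviour.

```python
from collections import defaultdict


def implicit_renames(old: dict, new: dict) -> dict:
    """Guess renames across one ancestry edge from content hashes alone.

    old/new map path -> SHA1 for the source and destination manifests.
    A path that vanished and a path that appeared with the same SHA1 are
    treated as a rename, but only when the match is unambiguous.
    """
    removed = {path: sha for path, sha in old.items() if path not in new}
    added = {path: sha for path, sha in new.items() if path not in old}

    removed_by_sha = defaultdict(list)
    for path, sha in removed.items():
        removed_by_sha[sha].append(path)
    added_by_sha = defaultdict(list)
    for path, sha in added.items():
        added_by_sha[sha].append(path)

    renames = {}
    for sha, old_paths in removed_by_sha.items():
        new_paths = added_by_sha.get(sha, [])
        if len(old_paths) == 1 and len(new_paths) == 1:
            renames[old_paths[0]] = new_paths[0]
    return renames


old = {"a.c": "1111", "b.c": "2222", "c.c": "3333"}
new = {"src/a.c": "1111", "b.c": "2222", "lib/c.c": "4444"}
print(implicit_renames(old, new))
```

Here a.c is recognized as moved to src/a.c, while c.c, which was edited and moved in the same step, is not matched; that is the case where an explicit rename command is needed.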

the fact that a file is a link is a platform-specific issue. you can encode that in .mt-attrs if you like.

I would like manifests to be more like normalised PRCS project
files.  These could replace MT directories and .mt-attrs files in
external representation, and could help to simplify the user interface.
And they would allow all context to be stored in a nicely versioned
way.

I do not want to change the format of manifest files. they are simple and suit my preferences. if you would like to write a patch, or provide more detail about why a change to the manifest format would make things simpler or improve the user interface, I am happy to continue discussing it.

4) As far as I can tell monotone allows for certs to refer to all
kinds of VOs.  I cannot see what the point of referring a cert to a
specific file version SHA1(file) is.

certain facts are intrinsic to a file. suppose a version is known to not compile, or to contain a security hole. suppose I have made some code review comments on a specific file version. my initial (idealistic) thinking on the matter was that those certs have meaning which is mostly independent of the version of the tree a file shows up in.

you are right that in some cases it makes more sense to attach a "file" cert to a particular file in a particular tree. we don't have a cert type which does that now. perhaps we could add one.

speaking practically rather than idealistically: so far I have found no *practical* use for file certs at all. I am considering dropping them altogether.

Certs should only have meaning
when they refer to context or context+content.  I will give some
examples below.  Please forgive my nonstandard certificate
representation, I think it is quite nice.

For example if I want to state that some file version is the largest,
i.e. has the most bytes of all file versions in the tree (repository?).

wrong:
    [cert]
    SHA1(file):boast:"This is the largest file":signed-by-me
    [end]

current:
    [cert]
    SHA1(manifest):boast:"SHA1(file) is the largest file":signed-by-me
    [end]

my proposal (either like above or):
    [cert]
    SHA1(manifest).SHA1(file):boast:"This is the largest file":signed-by-me
    [end]


If two files have the same SHA1, how can I attach a cert only to
one of them?  For example to better describe heritage.  As SHA1(file)
only describes the content, there are several ways (heritage) to
get to a specific SHA1(file) and several ways to fork from there,
which should be distinguished in merging.  Examples are the empty
file, the temporarily removed file, or a file imported from a
different branch.

Just take the case where two different files reach the same content
in completely different contexts:
Content=Content1=Content2="YES"
Context1="Are You small?"
Context2="Are You shy?".
If heritage is tied to content only this will result in a mess.  If
heritage is related to context or context+content, everything should
work out nicely.

[cert]
SHA1(Context1).SHA1(Content):parent:SHA1(Context1.old).SHA1(Content1.old):sig
[end]

hm. well, this is conceivable, if you want your cert to refer to not just a particular file in a particular tree, but a particular file *edge* on a particular tree *edge* in the ancestry graph.

but this is all very theoretical. so far the only practical use I have made of edge-like certs is to describe ancestry, between manifests, and the only practical use I have made of non-edge-like certs is to describe the usual changelog stuff: log message, author, date-and-time, branch name, tag.

Please forgive me but I strongly feel ``ancestor'' is the wrong name for the 
heritage cert.  It should be ``parent'' really.

hmm. consider this graph:

A -> B -> C

I take "parent" to mean "immediate parent" ([B,A] or [C,B]), which are the sort of ancestor certs issued when you commit a version. but monotone accepts the situation where someone puts in a cert [C,A] as well, expressing the abbreviated ancestry relationship between C and A. this has to work because there is no "global" view of ancestry; I can learn of intermediate versions on an existing edge, or new forks and merges, any time I communicate with a 3rd party.
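
A small Python sketch of why "ancestor" edges compose where "parent" edges would not. Certs are modelled as (child, ancestor) pairs, following the [B,A] notation above; the function name and the representation are assumptions for the example, not monotone internals.

```python
def is_ancestor(certs, ancestor, descendant):
    """Return True if `ancestor` is reachable from `descendant`.

    certs is an iterable of (child, ancestor) pairs. Edges may be
    immediate ([B,A]) or abbreviated ([C,A]); both are followed alike.
    """
    ancestors_of = {}
    for child, older in certs:
        ancestors_of.setdefault(child, set()).add(older)

    seen = set()
    stack = [descendant]
    while stack:
        node = stack.pop()
        for older in ancestors_of.get(node, ()):
            if older == ancestor:
                return True
            if older not in seen:
                seen.add(older)
                stack.append(older)
    return False


immediate = {("B", "A"), ("C", "B")}
abbreviated = {("C", "A")}

print(is_ancestor(immediate, "A", "C"))
print(is_ancestor(abbreviated, "A", "C"))
print(is_ancestor(immediate | abbreviated, "A", "C"))
```

All three calls give the same answer: a database that only holds the abbreviated edge agrees with one that later learns about the intermediate version B.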

6) BTW I do not see the need for certificates.  Why are they needed?  Are they
just a simple-to-extend way of adding state(ments) to a version without
changing the version number (SHA1(context))?

yes. they are an "open" metadata vocabulary. more importantly, they are a vocabulary which you can express partial trust relationships about. I can say that I will listen to certs you write in one context, but I need further affirmation by someone else whom I trust more in another, more sensitive context.

7) I do not see the fundamental meaning of branches in monotone
(i.e. the branch cert).  Are they just a convenience (to support
some special kind of merging) or a part of the design?  I will call
the monotone branch ``btag'' to avoid collisions with other concepts
of branching in version control.

a branch is a merge obligation. if two heads have the same branch cert, monotone will try to merge them. if they do not, it will not.

branches have nothing whatsoever to do with ancestry. I can take one copy of the linux source tree, commit it. take one copy of the freebsd source tree, commit it. mark both as belonging to the SuperUnix branch. monotone will then try (not very successfully) to merge them.
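
The "merge obligation" reading can be sketched in Python as follows. The data model (a dict of branch names per version, plus (child, parent) ancestry pairs) and the function name are hypothetical, and the head rule used here only looks at direct ancestry edges inside the branch, which is a simplification.

```python
def branch_heads(branch_certs, ancestry, branch):
    """Return the versions carrying `branch` that have no child in it.

    branch_certs: dict mapping version -> set of branch names.
    ancestry: iterable of (child, parent) pairs.
    """
    members = {v for v, names in branch_certs.items() if branch in names}
    superseded = {
        parent
        for child, parent in ancestry
        if child in members and parent in members
    }
    return members - superseded


# Two unrelated trees, both marked as belonging to the same branch.
certs = {"linux": {"SuperUnix"}, "freebsd": {"SuperUnix"}}
heads = branch_heads(certs, set(), "SuperUnix")
print(sorted(heads))  # more than one head means a merge is called for

# After a (hypothetical) merge version is committed to the branch:
certs["merged"] = {"SuperUnix"}
edges = {("merged", "linux"), ("merged", "freebsd")}
print(sorted(branch_heads(certs, edges, "SuperUnix")))
```

Note that no common ancestor is involved anywhere: the two heads are paired up for merging purely because they share a branch cert.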

As far as I can see, a fork in monotone is what other versioning
systems call a branch.

most other versioning systems have no concept of a fork which is not a branch. but monotone's notion of a fork is weaker than a branch in other VC systems. notably: forks do not have names.

A btag states the intention to merge
the tagged version primarily with other versions having the same
btag.

yes.

Unlike branches, btags are not ``_forks_ that should _not_ be
merged'' but on the contrary ``independent _versions_ whose current
heads should primarily merge with each other''.  Versions belonging
to the same btag need not fork from some common ancestor.  Btags
are just free form certs.  They do not certify anything.  Not even
that merging tagged heads is sensible.

correct.

I see more problems with merging in large projects (i.e. those with
lots of merge conflicts).  How can developers even hope to synchronise
their versions?

8) I understand that --- under the reasonable assumption of unique
SHA1 --- two developers notice when they have the same version.
But unfortunately merging conflicts can be quite difficult to resolve
twice in exactly the same way.

Two developers cannot merge packages independently, as monotone
repositories remember the method and result of file merges (see 9 below,
too).  Exchanging all the change sets is not the problem.  But all
merging would have to be done by one person and the result redistributed
to all others for integration.  This prevents the original author
of a change from doing the merging.

that is not completely true. first, realize that many changes *can* merge automatically, if the merge algorithm is sufficiently clever. then notice what this means: when I merge 2 trees, suppose (pessimistically) 80% of the changes merge cleanly, with 20% conflicting. suppose you also merge the same tree, with 20% conflicts. we will each resolve some conflicts the same way, some differently. suppose half the ways we choose to resolve the conflicts are the same; we have now produced 2 new trees -- just as many as we had going into the merge -- but with 90% less difference.
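
The arithmetic behind that estimate, as a tiny Python sketch. It assumes that "difference" is measured as the fraction of changes on which the two merged trees disagree, and that the two input trees count as 100% different; both are readings of the paragraph above, not definitions from monotone.

```python
from fractions import Fraction


def remaining_difference(conflicting, resolved_identically):
    """Fraction of changes on which two independent merges still differ.

    conflicting: fraction of changes that did not merge cleanly.
    resolved_identically: fraction of those conflicts that both
    developers happened to resolve the same way.
    """
    return conflicting * (1 - resolved_identically)


diff = remaining_difference(Fraction(20, 100), Fraction(1, 2))
print(diff)      # 1/10
print(1 - diff)  # 9/10
```

With 20% conflicts and half of them resolved identically, the two results disagree on 10% of the changes, which is where the "90% less difference" figure comes from.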

it is not necessary -- nor possible with monotone's design -- that the branches seen by all users on all hosts ever assume a *unique* head; it is necessary merely that each communicate-merge-update cycle bring the multiple heads which *do* exist closer together. it is like CVS in the sense that at any time there are multiple slightly-different lines of development happening in parallel; the difference is that in CVS merging between lines happens only before a commit, and in monotone it can happen before *or after* a commit. there is no "race to commit" like with CVS.

``The other''(TM) distributed versioning tool solves this by cloning
and keeping track of unique repository ids, and doing merges in
staging areas.  First push wins.  Is there an easy solution in
monotone that I just do not notice?

you can construct both the CVS model and the bitkeeper model on top of monotone.

if you want to construct the CVS model, you have a centralized "signing robot" which is a tie-breaker. it signs everything it sees with a timestamp, and everyone agrees to add a merge hook which favours earlier robot-signed versions over later ones, in any merge. this makes your non-mergeable changes bounce off "earlier" signed ones, so you will always update before merging, to try to make sure there are no "earlier" versions you might bounce off.
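
The tie-breaking rule could look roughly like this. monotone's own hooks are written in Lua; the Python below is only a model of the policy, and the `Candidate` type, its fields, and the `tie_break` function are hypothetical names, not part of any monotone hook API.

```python
from typing import NamedTuple, Optional


class Candidate(NamedTuple):
    version: str                    # SHA1 of the competing version
    robot_timestamp: Optional[int]  # time the robot signed it, if it did


def tie_break(left: Candidate, right: Candidate) -> Optional[Candidate]:
    """Pick the winner of a non-mergeable conflict.

    The earlier robot-signed version wins. A version the robot has not
    signed loses to one it has; if neither is signed, no winner is
    chosen and the conflict is left for a human.
    """
    signed = [c for c in (left, right) if c.robot_timestamp is not None]
    if not signed:
        return None
    return min(signed, key=lambda c: c.robot_timestamp)


mine = Candidate("aaaa", robot_timestamp=1075825096)
theirs = Candidate("bbbb", robot_timestamp=1075820000)
print(tie_break(mine, theirs).version)
```

In this example the other party's version was signed first, so the local change "bounces off" it, which reproduces the update-before-commit discipline of CVS.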

if you want to construct the bitkeeper model (which is quite nice), you just make sub-branches and propagate between them. so for example we have an i18n branch, net.venge.monotone.i18n. if we had a translation team, I would ask them to commit to that branch primarily. every now and then I will do a propagate from net.venge.monotone.i18n to net.venge.monotone, and I will resolve the conflicts; i18n was essentially a staging area. every now and then (when they're interested in trying out new features) the i18n developers will do a propagate from net.venge.monotone to net.venge.monotone.i18n.

if they want to make my life even easier, the i18n guys can build a tie-breaker robot or build a sub-staging area, so that I always have something unique to propagate *from* when I'm absorbing a change from them. or I can say that I trust one of their team-members to construct merges and propagate *to* the net.venge.monotone branch, and delegate responsibility for it completely. there are lots of variations which make sense.

9) When two file versions (SHA1(content)) are merged, monotone uses the
result of that merge for all future merges of these two file versions,
even though they might happen in completely different context.  Why?

it seemed like a cheap heuristic to throw in, since it's very unlikely that a particular merge problem would happen in a different context. but that's a bit of a silent error waiting to happen, I guess. maybe it's no good, and should be removed. do you think so?

10) The (usefulness of the) ability to sort / select / prioritize packages
by certs when updating is unclear to me.  What exactly happens
there?  What is updating in terms of merging?  The working copy is
a valid version, too.  So what is so special about updating?  Is
this due to the notion of monotone branches?

this section is being clarified in the current round of development, as our thoughts on the matter have become more clear. I hope the next version will have a clearer organization to it, which is based on a simple accept/reject distinction, rather than arbitrary sorting.

the new rules follow:

- before an ancestry cert is acted upon -- during merge, update, or otherwise -- it is passed to a trust-evaluation hook. this hook is given *all* the keys which have produced valid signatures on the particular manifest ancestor->child assertion of the cert. the hook returns whether or not to trust the ancestry relationship implied by the cert. the "monotone approve <id1> <id2>" command adds ancestry certs to the database. the "monotone disapprove <id1> <id2>" command adds an ancestry-like "disapproval" cert, any of which are also passed to the trust evaluation hook. the purpose here is to permit per-edge code review.

- similarly, before an update or checkout happens -- though not a merge -- monotone checks for "testresult" certs on the predecessor of the update and the selected update target. these two sets of testresults are passed to a different trust evaluation hook, which decides if the proposed update/checkout is sufficiently safe. the purpose here is to permit "checking out / updating a working copy", as defined by some autotester. obviously if you want the "most recent possible" version (tested or otherwise) you can just leave the hook undefined.
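
The two accept/reject decisions above can be modelled in Python as follows. monotone's real hooks are Lua functions; everything here (the function names, the reviewer list, and the particular policies) is a hypothetical illustration of the kind of rule a user might plug in, not the hook signatures themselves.

```python
# Hypothetical set of keys whose review this user chooses to trust.
TRUSTED_REVIEWERS = {"alice@example.com", "bob@example.com"}


def trust_ancestry(approvers, disapprovers, required=1):
    """Decide whether to believe one ancestor->child edge.

    approvers / disapprovers are the sets of key names with valid
    signatures on approval and disapproval certs for that edge.
    Policy: any trusted disapproval vetoes the edge; otherwise at
    least `required` trusted approvals are needed.
    """
    if disapprovers & TRUSTED_REVIEWERS:
        return False
    return len(approvers & TRUSTED_REVIEWERS) >= required


def accept_update(current_results, target_results):
    """Decide whether an update or checkout target is safe enough.

    Both arguments map test name -> passed (bool), taken from
    "testresult" certs. Policy: the target must pass every test that
    the current version passes, i.e. no regressions.
    """
    return all(
        target_results.get(test, False)
        for test, passed in current_results.items()
        if passed
    )


print(trust_ancestry({"alice@example.com"}, set()))
print(trust_ancestry({"alice@example.com"}, {"bob@example.com"}))
print(accept_update({"unit": True}, {"unit": True, "perf": False}))
print(accept_update({"unit": True}, {"unit": False}))
```

An `accept_update` that always returns True corresponds to leaving the hook undefined and taking the most recent version, tested or not.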

I'm running out of steam.

as am I. time to go to work. I hope I've clarified some things. try to keep in mind that in absence of "simple" answers, I base design decisions on practical ergonomic and social needs -- observed in the line of real work -- rather than ideals about the perfect VC system. if you can show important operational failures, or ways in which it makes a real person feel uncomfortable, I am always interested in seeing them.

please ask further if you have further questions.

-graydon



