[Monotone-devel] Re: (long) some thoughts on monotone; still unable to p

From:	graydon hoare
Subject:	[Monotone-devel] Re: (long) some thoughts on monotone; still unable to post on monotone-devel.
Date:	Tue, 03 Feb 2004 11:18:16 -0500
User-agent:	Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.6b) Gecko/20031205 Thunderbird/0.4

Klaus Robert Suetterlin wrote:

Thanks for looking into the problem with my submits on monotone-devel.
I still seem to be unable to post there, so I will just send
everything to You.  If You think my stuff valuable for discussion,
You can crosspost to monotone-devel, if You like.  I do receive
mail from monotone-devel ok.


ok, I am reposting to monotone-devel for posterity's sake.

This is the first round of objections, I collected so far.  Please
understand that I formulate what I do not like or understand.  All
the rest is fine with me :), I guess.  I searched for a CVS replacement
quite some time and have high hopes on monotone.  My questions are
more related to the design not the implementation.

ok. you seem to have many questions along the same lines; I will try to answer clearly, but that makes this email very long (!)

My opinions on monotone are based on the public documentation and
monotone-devel threads.  I did not read a single line of code.

1) Is the info file correct at what it states?

I already guessed that it is not complete.  But are all the facts
stated in the info file correct (as defined by the design)?

I hope that it is correct, but I am human and make mistakes. can you point out a fact in specific, which you think is incorrect or misleading?

Monotone seems to distinguish between two kinds of stuff in the
repository.  Stuff that has a SHA1 and stuff that doesn't.

2) What do You call things that are referred to by SHA1 hash keys?

I call them either "files" or "manifests". the distinction is made only to make index lookups in the database go faster, and distinguish between them at a UI level (for instance when the SHA1-completion code runs).

soon I will transition to referring to keys and certs partly by SHA1 as well, for purposes of synchronization. but I will maintain the ability to refer to them by their "friendlier" names too (user email address for keys, id/name/value triple for certs)

I will call them VO (versioned object) from now on.

So far I see three basic concepts in monotone: content, context,
certificate.  The content can be referred to by SHA1.  The context
and certificate overlap a little.  But basically context is collected
in manifests and .mt-attrs files, which are versioned just like
content.  And the certs allow to express something beyond the context
of the source tree.  Like who is the author?  Why did she do some
things whichever way, and when?  Could the source tree be built
using the standard build procedure for the project?  Certificates
are also used to express heritage.

3) Why is there so little information (context) in the manifest?

there is enough information required to define the directory entries and file contents which make up the tree of files, and no more. I have a design preference for simple data formats.
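As a rough illustration of how little a manifest needs to hold, here is a minimal sketch. The line layout ("<sha1>, two spaces, <path>") and the function names are assumptions made for this example, not a statement of monotone's actual on-disk format.

```python
import hashlib


def file_id(content):
    """Return the SHA1 hex digest identifying one file version."""
    return hashlib.sha1(content).hexdigest()


def build_manifest(tree):
    """Render one '<sha1>  <path>' line per file, sorted by path.

    `tree` maps pathname -> file content (bytes). The exact line layout
    is an assumption for illustration only.
    """
    lines = ["%s  %s\n" % (file_id(data), path)
             for path, data in sorted(tree.items())]
    return "".join(lines)


def manifest_id(manifest_text):
    """A manifest is itself a versioned object, named by its own SHA1."""
    return hashlib.sha1(manifest_text.encode("utf-8")).hexdigest()
```

Nothing but pathnames and content hashes appears in the result; everything else (author, date, ancestry) would live outside it.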

In my opinion this will lead to a lot of problems and workarounds.
For example I cannot see how the contents of the MT directory and
the context of the sources should be recreated by the content of
the current manifest.

the MT directory is a control directory explicitly *not* held in version control. it is not necessary or desirable for the manifest to describe the contents of the MT directory. versioned metadata goes either in certs (if it's about a particular version) or .mt-attrs (if it's about a particular pathname).

There are already .mt-attrs files to describe
more context and duplicate file pathnames.  How are You going to
record changes to the tree structure?

a change is recorded by an ancestry cert, which relates one tree state to another, and by saving deltas to the storage system between all changed files and manifests. if more aspects of the change need to be recorded (say, an edit+rename operation) they are recorded as rename certs.

Also inside the .mt-attrs
file?  For example how can I version things that are not regular
files?  What if I want to move a link, but keep its heritage, ...?

moving a file can often be recognized implicitly by noticing that one file with a given SHA1 disappeared in the source of an ancestry edge, and one file with the same SHA1 appeared in the destination of the ancestry edge. if the file changes concurrently with being moved, you need to explicitly tell monotone that it is the "same" file (issue a rename command). this generates a rename cert.
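The implicit rename detection described above can be sketched as follows. This is only an illustration under assumptions: manifests are modelled as plain path-to-SHA1 dictionaries, the function name is hypothetical, and ambiguous cases (several vanished or new paths sharing one hash) are simply skipped.

```python
def implicit_renames(old, new):
    """Guess renames across one ancestry edge.

    `old` and `new` map pathname -> SHA1 for the source and destination
    manifests. A rename is reported only when exactly one path with a
    given SHA1 disappeared and exactly one path with that SHA1 appeared.
    """
    removed = {}
    for path, sha in old.items():
        if path not in new:
            removed.setdefault(sha, []).append(path)
    added = {}
    for path, sha in new.items():
        if path not in old:
            added.setdefault(sha, []).append(path)

    renames = []
    for sha, old_paths in removed.items():
        new_paths = added.get(sha, [])
        if len(old_paths) == 1 and len(new_paths) == 1:
            renames.append((old_paths[0], new_paths[0]))
    return sorted(renames)
```

A file that was both edited and moved changes its SHA1, so this returns nothing for it; that is the case where an explicit rename command is needed.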

the fact that a file is a link is a platform-specific issue. you can encode that in .mt-attrs if you like.

I would like manifests to be more like normalised PRCS project
files.  These could replace MT directories and .mt-attrs file in
external representation, can help to simplify the user interface.
And they would allow to store all context in a nicely versioned
way.

I do not want to change the format of manifest files. they are simple and suit my preferences. if you would like to write a patch, or provide more detail about why a change to the manifest format would make things simpler or improve the user interface, I am happy to continue discussing it.

4) As far as I can tell monotone allows for certs to refer to all
kind of VOs.  I cannot see what the point of referring a cert to a
specific file version SHA1(file) is.

certain facts are intrinsic to a file. suppose a version is known to not compile, or to contain a security hole. suppose I have made some code review comments on a specific file version. my initial (idealistic) thinking on the matter was that those certs have meaning which is mostly independent of the version of the tree a file shows up in.

you are right that in some cases it makes more sense to attach a "file" cert to a particular file in a particular tree. we don't have a cert type which does that now. perhaps we could add one.

speaking practically rather than idealistically: so far I have found no *practical* use for file certs at all. I am considering dropping them altogether.

Certs should only have meaning
when they refer to context or context+content.  I will give some
examples below.  Please forgive my nonstandard certificate
representation, I think it is quite nice.

For example if I want to state that some file version is the largest,
i.e. has the most bytes of all file versions in the tree (repository?).

wrong:
    [cert]
    SHA1(file):boast:"This is the largest file":signed-by-me
    [end]

current:
    [cert]
    SHA1(manifest):boast:"SHA1(file) is the largest file":signed-by-me
    [end]

my proposal (either like above or):
    [cert]
    SHA1(manifest).SHA1(file):boast:"This is the largest file":signed-by-me
    [end]


If two files have the same SHA1, how can I attach a cert only to
one of them?  For example to better describe heritage.  As SHA1(file)
only describes the content, there are several ways (heritage) to
get to a specific SHA1(file) and several ways to fork from there,
which should be distinguished in merging.  Examples are the empty
file, the temporarily removed file, or a file imported from a
different branch.

Just take the case where two different files reach the same content
in completely different contexts:
Content=Content1=Content2="YES"
Context1="Are You small?"
Context2="Are You shy?".
If heritage is tied to content only this will result in a mess.  If
heritage is related to context or context+content, everything should
work out nicely.

[cert]
SHA1(Context1).SHA1(Content):parent:SHA1(Context1.old).SHA1(Content1.old):sig
[end]

hm. well, this is conceivable, if you want your cert to refer to not just a particular file in a particular tree, but a particular file *edge* on a particular tree *edge* in the ancestry graph.

but this is all very theoretical. so far the only practical use I have made of edge-like certs is to describe ancestry, between manifests, and the only practical use I have made of non-edge-like certs is to describe the usual changelog stuff: log message, author, date-and-time, branch name, tag.

Please forgive me but I strongly feel ``ancestor'' is the wrong name for the 
heritage cert.  It should be ``parent'' really.


hmm. consider this graph:

A -> B -> C

I take "parent" to mean "immediate parent" ([B,A] or [C,B]), which are the sort of ancestor certs issued when you commit a version. but monotone accepts the situation where someone puts in a cert [C,A] as well, expressing the abbreviated ancestry relationship between C and A. this has to work because there is no "global" view of ancestry; I can learn of intermediate versions on an existing edge, or new forks and merges, any time I communicate with a 3rd party.
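To illustrate why "ancestor" is the weaker and safer reading, here is a small sketch that treats ancestry certs as an open set of edges. It assumes each cert is a (parent, child) pair, which is the reverse of the [child, parent] order written above, and the function name is hypothetical.

```python
from collections import defaultdict


def is_ancestor(certs, ancestor, descendant):
    """Return True if `descendant` is reachable from `ancestor`.

    `certs` is any iterable of (parent, child) pairs. Abbreviated edges
    such as ("A", "C") are allowed alongside ("A", "B") and ("B", "C");
    they add no new information but do no harm either.
    """
    children = defaultdict(set)
    for parent, child in certs:
        children[parent].add(child)

    stack, seen = [ancestor], set()
    while stack:
        node = stack.pop()
        for child in children[node]:
            if child == descendant:
                return True
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return False
```

With only the abbreviated cert ("A", "C") known, the answer for A and C is already correct; learning of B later from a third party refines the graph without invalidating anything.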

6) BTW I do not see the need for certificates.  Why are they needed?  Are they 
just a simple to extend way of adding state(ments) to a version without 
changing the version number (SHA1(context)).

yes. they are an "open" metadata vocabulary. more importantly, they are a vocabulary which you can express partial trust relationships about. I can say that I will listen to certs you write in one context, but I need further affirmation by someone else whom I trust more in another, more sensitive context.

7) I do not see the fundamental meaning of branches in monotone
(i.e. the branch cert).  Are they just a convenience (to support
some special kind of merging) or a part of the design?  I will call
the monotone branch ``btag'' to avoid collisions with other concepts
of branching in version control.

a branch is a merge obligation. if two heads have the same branch cert, monotone will try to merge them. if they do not, it will not.

branches have nothing whatsoever to do with ancestry. I can take one copy of the linux source tree, commit it. take one copy of the freebsd source tree, commit it. mark both as belonging to the SuperUnix branch. monotone will then try (not very successfully) to merge them.
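The "merge obligation" reading of a branch can be sketched like this. The data model is an assumption for illustration: branch certs are a mapping from version id to a set of branch names, ancestry is a list of (parent, child) pairs, and a head is taken to be a branch member with no child in the same branch.

```python
def branch_heads(branch_certs, ancestry, branch):
    """Return the sorted heads of `branch`.

    branch_certs: dict version id -> set of branch names (hypothetical model)
    ancestry:     iterable of (parent, child) version id pairs
    """
    members = {v for v, names in branch_certs.items() if branch in names}
    superseded = {p for p, c in ancestry if p in members and c in members}
    return sorted(members - superseded)


def needs_merge(branch_certs, ancestry, branch):
    """Two or more heads with the same branch cert call for a merge."""
    return len(branch_heads(branch_certs, ancestry, branch)) > 1
```

In the SuperUnix example the two trees share no ancestry at all, yet both are heads of the same branch, so a merge would be attempted.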

As far as I can see, a fork in monotone is what other versioning
systems call a branch.

most other versioning systems have no concept of a fork which is not a branch. but monotone's notion of a fork is weaker than a branch in other VC systems. notably: forks do not have names.

Btags states the intention to merge
the tagged version primarily with other versions having the same
btag.


yes.

Unlike branches, btags are not ``_forks_ that should _not_ be
merged'' but on the contrary ``independent _versions_ whose current
heads should primarily merge with each other''.  Versions belonging
to the same btag need not fork from some common ancestor.  Btags
are just free form certs.  They do not certify anything.  Not even
that merging tagged heads is sensible.


correct.

I see more problems with merging in large projects (i.e. those with
lots of merge conflicts).  How can developers even hope to synchronise
their versions?

8) I understand that --- under the reasonable assumption of unique
SHA1 --- two developers notice when they have the same version.
But unfortunately merging conflicts can be quite difficult to resolve
twice in exactly the same way.

Two developers cannot merge packages independently.  As monotone
repositories remember method and result of file merges (see 9 below,
too).  Exchanging all the change sets is not the problem.  But all
merging would have to be done by one person and the result redistributed
to all others for integration.  This prevents the original author
of a change from doing the merging.

that is not completely true. first, realize that many changes *can* merge automatically, if the merge algorithm is sufficiently clever. then notice what this means: when I merge 2 trees, suppose (pessimistically) 80% of the changes merge cleanly, with 20% conflicting. suppose you also merge the same tree, with 20% conflicts. we will each resolve some conflicts the same way, some differently. suppose half the ways we choose to resolve the conflicts are the same; we have now produced 2 new trees -- just as many as we had going into the merge -- but with 90% less difference.
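The arithmetic behind the 90% figure can be checked with a short sketch. It assumes, for simplicity, that all changes count equally and that the two input trees differ in every change under consideration; the function name is made up for this example.

```python
from fractions import Fraction


def remaining_difference(conflict_fraction, agreed_fraction):
    """Fraction of changes on which the two merged trees still differ.

    Cleanly merged changes come out identical on both sides, so only
    conflicts that the two people resolved differently remain.
    """
    return conflict_fraction * (1 - agreed_fraction)


# 20% of changes conflict; half of those are resolved the same way.
left_over = remaining_difference(Fraction(20, 100), Fraction(1, 2))
reduction = 1 - left_over  # 1/10 still differs, so 9/10 of the difference is gone
```

Repeating the cycle shrinks the remaining difference again, which is the sense in which independent merges still converge.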

it is not necessary -- nor possible with monotone's design -- that the branches seen by all users on all hosts ever assume a *unique* head. merely that each communicate-merge-update cycle brings the multiple heads which *do* exist closer together. it is like CVS in the sense that at any time there are multiple slightly-different lines of development happening in parallel; the difference is that in CVS merging between lines happens only before a commit, and in monotone it can happen before *or after* a commit. there is no "race to commit" like with CVS.

``The other''TM distributed versioning tool, solves this by cloning
and keeping track of unique repository ids, and doing merges in
staging areas.  First push wins.  Is there an easy solution in
monotone that I just do not notice?

you can construct both the CVS model and the bitkeeper model on top of monotone.

if you want to construct the CVS model, you have a centralized "signing robot" which is a tie-breaker. it signs everything it sees with a timestamp, and everyone agrees to add a merge hook which favours earlier robot-signed versions over later ones, in any merge. this makes your non-mergeable changes bounce off "earlier" signed ones, so you will always update before merging, to try to make sure there are no "earlier" versions you might bounce off.

if you want to construct the bitkeeper model (which is quite nice), you just make sub-branches and propagate between them. so for example we have an i18n branch, net.venge.monotone.i18n. if we had a translation team, I would ask them to commit to that branch primarily. every now and then I will do a propagate from net.venge.monotone.i18n to net.venge.monotone, and I will resolve the conflicts; i18n was essentially a staging area. every now and then (when they're interested in trying out new features) the i18n developers will do a propagate from net.venge.monotone to net.venge.monotone.i18n.

if they want to make my life even easier, the i18n guys can build a tie-breaker robot or build a sub-staging area, so that I always have something unique to propagate *from* when I'm absorbing a change from them. or I can say that I trust one of their team-members to construct merges and propagate *to* the net.venge.monotone branch, and delegate responsibility for it completely. there's lots of variations which make sense.

9) When two file versions (SHA1(content)) are merged, monotone uses the
result of that merge for all future merges of these two file versions,
even though they might happen in completely different context.  Why?

it seemed like a cheap heuristic to throw in, since it's very unlikely that a particular merge problem would happen in a different context. but that's a bit of a silent error waiting to happen, I guess. maybe it's no good, and should be removed. do you think so?

10) The (usefulness of the) ability to sort / select / prioritize packages
by certs when updating is unclear to me.  What exactly happens
there?  What is updating in terms of merging?  The working copy is
a valid version, too.  So what is so special about updating?  Is
this due to the notion of monotone branches?

this section is being clarified in the current round of development, as our thoughts on the matter have become more clear. I hope the next version will have a clearer organization to it, which is based on a simple accept/reject distinction, rather than arbitrary sorting.


the new rules follow:

- before an ancestry cert is acted upon -- during merge, update, or otherwise -- it is passed to a trust-evaluation hook. this hook is given *all* the keys which have produced valid signatures on the particular manifest ancestor->child assertion of the cert. the hook returns whether or not to trust the ancestry relationship implied by the cert. the "monotone approve <id1> <id2>" command adds ancestry certs to the database. the "monotone disapprove <id1> <id2>" command adds an ancestry-like "disapproval" cert, any of which are also passed to the trust evaluation hook. the purpose here is to permit per-edge code review.

- similarly, before an update or checkout happens -- though not a merge -- monotone checks for "testresult" certs on the predecessor of the update and the selected update target. these two sets of testresults are passed to a different trust evaluation hook, which decides if the proposed update/checkout is sufficiently safe. the purpose here is to permit "checking out / updating a working copy", as defined by some autotester. obviously if you want the "most recent possible" version (tested or otherwise) you can just leave the hook undefined.
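The two accept/reject rules above might look roughly like the sketch below. Everything here is hypothetical: the hook names, their arguments, the example keys and the particular policies are invented to show the shape of the decision, and they do not reflect monotone's real hook interface.

```python
# Hypothetical set of reviewer keys this user trusts for ancestry edges.
TRUSTED_REVIEWERS = {"alice@example.com", "bob@example.com"}


def trust_ancestry(approving_keys, disapproving_keys):
    """Accept an ancestor->child edge under one possible policy:
    at least one trusted key approved it and no trusted key disapproved.
    """
    approved = bool(TRUSTED_REVIEWERS & set(approving_keys))
    vetoed = bool(TRUSTED_REVIEWERS & set(disapproving_keys))
    return approved and not vetoed


def accept_update(old_results, new_results):
    """Accept an update or checkout under one possible policy:
    every test that passed on the predecessor must pass on the target.

    Both arguments map test name -> bool (True means the test passed).
    """
    return all(new_results.get(name, False)
               for name, passed in old_results.items() if passed)
```

Leaving the second hook undefined would correspond to always returning True, which yields the most recent version regardless of test results.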

I'm running out of steam.

as am I. time to go to work. I hope I've clarified some things. try to keep in mind that in absence of "simple" answers, I base design decisions on practical ergonomic and social needs -- observed in the line of real work -- rather than ideals about the perfect VC system. if you can show important operational failures, or ways in which it makes a real person feel uncomfortable, I am always interested in seeing them.


please ask further if you have further questions.

-graydon
