From: graydon hoare
Subject: [Monotone-devel] Re: (long) some thoughts on monotone; still unable to post on monotone-devel.
Date: Tue, 03 Feb 2004 11:18:16 -0500
User-agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.6b) Gecko/20031205 Thunderbird/0.4
Klaus Robert Suetterlin wrote:
Thanks for looking into the problem with my submits on monotone-devel.
I still seem to be unable to post there, so I will just send
everything to You. If You think my stuff valuable for discussion,
You can crosspost to monotone-devel, if You like. I do receive
mail from monotone-devel ok.
ok, I am reposting to monotone-devel for posterity's sake.
This is the first round of objections I have collected so far. Please
understand that I formulate what I do not like or understand. All
the rest is fine with me :), I guess. I searched for a CVS replacement
for quite some time and have high hopes for monotone. My questions are
more related to the design than the implementation.
ok. you seem to have many questions along the same lines; I will try to
answer clearly, but that makes this email very long (!)
My opinions on monotone are based on the public documentation and
monotone-devel threads. I did not read a single line of code.
1) Is the info file correct at what it states?
I already guessed that it is not complete. But are all the facts
stated in the info file correct (as defined by the design)?
I hope that it is correct, but I am human and make mistakes. can you
point out a fact in specific, which you think is incorrect or misleading?
Monotone seems to distinguish between two kinds of stuff in the
repository. Stuff that has a SHA1 and stuff that doesn't.
2) What do You call things that are referred to by SHA1 hash keys?
I call them either "files" or "manifests". the distinction is made only
to make index lookups in the database go faster, and distinguish between
them at a UI level (for instance when the SHA1-completion code runs).
soon I will transition to referring to keys and certs partly by SHA1 as
well, for purposes of synchronization. but I will maintain the ability
to refer to them by their "friendlier" names too (user email address for
keys, id/name/value triple for certs)
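as a rough illustration of what "referred to by SHA1" means, here is a minimal Python sketch. it is not monotone's code: the helper names and the one-line-per-file manifest layout are assumptions made for the example. the point is only that a file id is the hash of the file's bytes, and a manifest id is the hash of a text that lists file ids and pathnames.

```python
import hashlib


def sha1_id(data: bytes) -> str:
    """Return the 40-character hex SHA1 of some bytes."""
    return hashlib.sha1(data).hexdigest()


def manifest_text(tree: dict[str, bytes]) -> bytes:
    """Build a manifest: one 'sha1  path' line per file, sorted by path.

    The exact line layout is hypothetical; only the idea matters.
    """
    lines = [f"{sha1_id(content)}  {path}\n"
             for path, content in sorted(tree.items())]
    return "".join(lines).encode("utf-8")


# A tiny tree of files, keyed by pathname.
tree = {
    "README": b"hello\n",
    "src/main.c": b"int main(void) { return 0; }\n",
}

file_ids = {path: sha1_id(content) for path, content in tree.items()}
manifest_id = sha1_id(manifest_text(tree))
```

changing any file's bytes changes that file's id, which changes the manifest text, which changes the manifest id.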
I will call them VO (versioned object) from now on.
So far I see three basic concepts in monotone: content, context,
certificate. The content can be referred to by SHA1. The context
and certificate overlap a little. But basically context is collected
in manifests and .mt-attrs files, which are versioned just like
content. And the certs allow one to express something beyond the context
of the source tree. Like who is the author? Why did she do some
things whichever way, and when? Could the source tree be built
using the standard build procedure for the project? Certificates
are also used to express heritage.
3) Why is there so little information (context) in the manifest?
there is enough information required to define the directory entries and
file contents which make up the tree of files, and no more. I have a
design preference for simple data formats.
In my opinion this will lead to a lot of problems and workarounds.
For example I cannot see how the contents of the MT directory and
the context of the sources should be recreated by the content of
the current manifest.
the MT directory is a control directory explicitly *not* held in version
control. it is not necessary or desirable for the manifest to describe
the contents of the MT directory. versioned metadata goes either in
certs (if it's about a particular version) or .mt-attrs (if it's about a
particular pathname).
There are already .mt-attrs files to describe
more context and duplicate file pathnames. How are You going to
record changes to the tree structure?
a change is recorded by an ancestry cert, which relates one tree state
to another, and by saving deltas to the storage system between all
changed files and manifests. if more aspects of the change need to be
recorded (say, an edit+rename operation) they are recorded as rename certs.
Also inside the .mt-attrs
file? For example how can I version things that are not regular
files? What if I want to move a link, but keep its heritage, ...?
moving a file can often be recognized implicitly by noticing that one
file with a given SHA1 disappeared in the source of an ancestry edge,
and one file with the same SHA1 appeared in the destination of the
ancestry edge. if the file changes concurrently with being moved, you
need to explicitly tell monotone that it is the "same" file (issue a
rename command). this generates a rename cert.
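the implicit case can be sketched roughly as follows, assuming each manifest is modelled as a dict from pathname to file SHA1. the function name and the "only accept unambiguous matches" rule are assumptions of this sketch, not monotone's actual behaviour.

```python
from collections import defaultdict


def implicit_renames(old: dict[str, str], new: dict[str, str]) -> dict[str, str]:
    """Guess renames across one ancestry edge.

    A path that vanished from `old` and a path that appeared in `new`
    are paired when they carry the same SHA1 and the match is unambiguous.
    """
    removed = defaultdict(list)  # sha1 -> paths present only in old
    added = defaultdict(list)    # sha1 -> paths present only in new
    for path, sha1 in old.items():
        if path not in new:
            removed[sha1].append(path)
    for path, sha1 in new.items():
        if path not in old:
            added[sha1].append(path)

    return {
        removed[sha1][0]: added[sha1][0]
        for sha1 in removed
        if len(removed[sha1]) == 1 and len(added.get(sha1, [])) == 1
    }


# Pure move: same content, new path.
moved = implicit_renames({"a.c": "1111", "b.c": "2222"},
                         {"lib/a.c": "1111", "b.c": "2222"})

# Move plus edit: the SHA1 changed, so nothing can be inferred and an
# explicit rename is needed.
moved_and_edited = implicit_renames({"a.c": "1111"}, {"lib/a.c": "9999"})
```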
the fact that a file is a link is a platform-specific issue. you can
encode that in .mt-attrs if you like.
I would like manifests to be more like normalised PRCS project
files. These could replace MT directories and .mt-attrs files in
external representation, and could help to simplify the user interface.
And they would allow all context to be stored in a nicely versioned
way.
I do not want to change the format of manifest files. they are simple
and suit my preferences. if you would like to write a patch, or provide
more detail about why a change to the manifest format would make things
simpler or improve the user interface, I am happy to continue discussing it.
4) As far as I can tell monotone allows certs to refer to all
kinds of VOs. I cannot see the point of attaching a cert to a
specific file version SHA1(file).
certain facts are intrinsic to a file. suppose a version is known to not
compile, or to contain a security hole. suppose I have made some code
review comments on a specific file version. my initial (idealistic)
thinking on the matter was that those certs have meaning which is mostly
independent of the version of the tree a file shows up in.
you are right that in some cases it makes more sense to attach a "file"
cert to a particular file in a particular tree. we don't have a cert
type which does that now. perhaps we could add one.
speaking practically rather than idealistically: so far I have found no
*practical* use for file certs at all. I am considering dropping them
altogether.
Certs should only have meaning
when they refer to context or context+content. I will give some
examples below. Please forgive my nonstandard certificate
representation, I think it is quite nice.
For example if I want to state that some file version is the largest,
i.e. has the most bytes of all file versions in the tree (repository?).
wrong:
[cert]
SHA1(file):boast:"This is the largest file":signed-by-me
[end]
current:
[cert]
SHA1(manifest):boast:"SHA1(file) is the largest file":signed-by-me
[end]
my proposal (either like above or):
[cert]
SHA1(manifest).SHA1(file):boast:"This is the largest file":signed-by-me
[end]
If two files have the same SHA1, how can I attach a cert only to
one of them? For example to better describe heritage. As SHA1(file)
only describes the content, there are several ways (heritage) to
get to a specific SHA1(file) and several ways to fork from there,
which should be distinguished in merging. Examples are the empty
file, the temporarily removed file, or a file imported from a
different branch.
Just take the case where two different files reach the same content
in completely different contexts:
Content=Content1=Content2="YES"
Context1="Are You small?"
Context2="Are You shy?".
If heritage is tied to content only this will result in a mess. If
heritage is related to context or context+content, everything should
work out nicely.
[cert]
SHA1(Context1).SHA1(Content):parent:SHA1(Context1.old).SHA1(Content1.old):sig
[end]
hm. well, this is conceivable, if you want your cert to refer to not
just a particular file in a particular tree, but a particular file
*edge* on a particular tree *edge* in the ancestry graph.
but this is all very theoretical. so far the only practical use I have
made of edge-like certs is to describe ancestry, between manifests, and
the only practical use I have made of non-edge-like certs is to describe
the usual changelog stuff: log message, author, date-and-time, branch
name, tag.
Please forgive me but I strongly feel ``ancestor'' is the wrong name for the
heritage cert. It should be ``parent'' really.
hmm. consider this graph:
A -> B -> C
I take "parent" to mean "immediate parent" ([B,A] or [C,B]), which are
the sort of ancestor certs issued when you commit a version. but
monotone accepts the situation where someone puts a cert [C,A] in as
well, expressing the abbreviated ancestry relationship between C and A.
this has to work because there is no "global" view of ancestry; I can
learn of intermediate versions on an existing edge, or new forks and
merges, any time I communicate with a 3rd party.
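a small sketch of why abbreviated edges are harmless, assuming ancestry certs are modelled as (child, ancestor) pairs in the same [C,A] order used above. the function is illustrative only. the set of ancestors is whatever is reachable through the certs currently known, and learning intermediate versions later only adds to it.

```python
def ancestors(version: str, certs: set[tuple[str, str]]) -> set[str]:
    """Return everything reachable from `version` via (child, ancestor) certs."""
    direct: dict[str, set[str]] = {}
    for child, ancestor in certs:
        direct.setdefault(child, set()).add(ancestor)

    seen: set[str] = set()
    stack = [version]
    while stack:
        current = stack.pop()
        for ancestor in direct.get(current, ()):
            if ancestor not in seen:
                seen.add(ancestor)
                stack.append(ancestor)
    return seen


# Only the abbreviated edge [C,A] is known at first.
known = {("C", "A")}
before = ancestors("C", known)

# Later a third party sends the intermediate edges [B,A] and [C,B].
known |= {("B", "A"), ("C", "B")}
after = ancestors("C", known)
```

the abbreviated [C,A] cert stays consistent with the fuller graph; it simply stops being the only path from C to A.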
6) BTW I do not see the need for certificates. Why are they needed? Are they
just a simple-to-extend way of adding state(ments) to a version without
changing the version number (SHA1(context))?
yes. they are an "open" metadata vocabulary. more importantly, they are
a vocabulary which you can express partial trust relationships about. I
can say that I will listen to certs you write in one context, but I need
further affirmation by someone else whom I trust more in another, more
sensitive context.
7) I do not see the fundamental meaning of branches in monotone
(i.e. the branch cert). Are they just a convenience (to support
some special kind of merging) or a part of the design? I will call
the monotone branch ``btag'' to avoid collisions with other concepts
of branching in version control.
a branch is a merge obligation. if two heads have the same branch cert,
monotone will try to merge them. if they do not, it will not.
branches have nothing whatsoever to do with ancestry. I can take one
copy of the linux source tree, commit it. take one copy of the freebsd
source tree, commit it. mark both as belonging to the SuperUnix branch.
monotone will then try (not very successfully) to merge them.
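to make "merge obligation" concrete, here is a hedged sketch. it assumes a head is a version carrying the branch cert that has no descendant carrying the same branch cert; the names and data model are invented for the example and are not taken from monotone's source.

```python
def branch_heads(members: set[str], ancestry: set[tuple[str, str]]) -> set[str]:
    """Return the members of a branch that have no descendant in the branch.

    `ancestry` holds (child, ancestor) pairs; `members` are the versions
    carrying the branch cert.
    """
    children: dict[str, set[str]] = {}
    for child, ancestor in ancestry:
        children.setdefault(ancestor, set()).add(child)

    def has_descendant_in_branch(version: str) -> bool:
        seen: set[str] = set()
        stack = [version]
        while stack:
            for child in children.get(stack.pop(), ()):
                if child in members:
                    return True
                if child not in seen:
                    seen.add(child)
                    stack.append(child)
        return False

    return {v for v in members if not has_descendant_in_branch(v)}


# Two unrelated trees, both marked as SuperUnix: two heads, so a merge is due.
unrelated = branch_heads({"linux", "freebsd"}, set())

# After a merge version M is committed to the same branch, one head remains.
merged = branch_heads({"linux", "freebsd", "M"},
                      {("M", "linux"), ("M", "freebsd")})
```

note that ancestry plays no part in deciding membership; it only decides which members still count as heads.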
As far as I can see, a fork in monotone is what other versioning
systems call a branch.
most other versioning systems have no concept of a fork which is not a
branch. but monotone's notion of a fork is weaker than a branch in other
VC systems. notably: forks do not have names.
Btags state the intention to merge
the tagged version primarily with other versions having the same
btag.
yes.
Unlike branches, btags are not ``_forks_ that should _not_ be
merged'' but on the contrary ``independent _versions_ whose current
heads should primarily merge with each other''. Versions belonging
to the same btag need not fork from some common ancestor. Btags
are just free form certs. They do not certify anything. Not even
that merging tagged heads is sensible.
correct.
I see more problems with merging in large projects (i.e. those with
lots of merge conflicts). How can developers even hope to synchronise
their versions?
8) I understand that --- under the reasonable assumption of unique
SHA1 --- two developers notice when they have the same version.
But unfortunately merging conflicts can be quite difficult to resolve
twice in exactly the same way.
Two developers cannot merge packages independently, as monotone
repositories remember the method and result of file merges (see 9 below,
too). Exchanging all the change sets is not the problem. But all
merging would have to be done by one person and the result redistributed
to all others for integration. This prevents the original author
of a change from doing the merging.
that is not completely true. first, realize that many changes *can*
merge automatically, if the merge algorithm is sufficiently clever. then
notice what this means: when I merge 2 trees, suppose (pessimistically)
80% of the changes merge cleanly, with 20% conflicting. suppose you also
merge the same tree, with 20% conflicts. we will each resolve some
conflicts the same way, some differently. suppose half the ways we
choose to resolve the conflicts are the same; we have now produced 2 new
trees -- just as many as we had going into the merge -- but with 90%
less difference.
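one way to read that arithmetic, taking the two input trees as the 100% baseline of difference (that baseline is an assumption of this sketch):

```python
from fractions import Fraction

conflicting = Fraction(20, 100)      # changes that do not merge cleanly
resolved_alike = Fraction(1, 2)      # share of conflicts we both resolve the same way

# Only conflicts resolved differently still separate the two result trees.
remaining_difference = conflicting * (1 - resolved_alike)
reduction = 1 - remaining_difference

print(f"remaining difference: {float(remaining_difference):.0%}")
print(f"reduction: {float(reduction):.0%}")
```

so the two merged trees still disagree on a tenth of the changes, and each further communicate-merge-update cycle can shrink that again.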
it is not necessary -- nor possible with monotone's design -- that the
branches seen by all users on all hosts ever assume a *unique* head.
merely that each communicate-merge-update cycle brings the multiple
heads which *do* exist closer together. it is like CVS in the sense that
at any time there are multiple slightly-different lines of development
happening in parallel; the difference is that in CVS merging between
lines happens only before a commit, and in monotone it can happen before
*or after* a commit. there is no "race to commit" like with CVS.
``The other''(TM) distributed versioning tool solves this by cloning
and keeping track of unique repository ids, and doing merges in
staging areas. First push wins. Is there an easy solution in
monotone that I just do not notice?
you can construct both the CVS model and the bitkeeper model on top of
monotone.
if you want to construct the CVS model, you have a centralized "signing
robot" which is a tie-breaker. it signs everything it sees with a
timestamp, and everyone agrees to add a merge hook which favours earlier
robot-signed versions over later ones, in any merge. this makes your
non-mergeable changes bounce off "earlier" signed ones, so you will always
update before merging, to try to make sure there are no "earlier"
versions you might bounce off.
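the tie-breaking rule might look roughly like this. monotone's real hooks are written in Lua, so this Python version is purely illustrative, and the `Version` type, its fields and the function name are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Version:
    id: str
    # Timestamp from the signing robot, or None if the robot has not
    # signed this version yet (hypothetical field).
    robot_timestamp: Optional[int]


def pick_winner(left: Version, right: Version) -> Optional[Version]:
    """For a conflict that cannot be merged, favour the earlier robot-signed side.

    Returns None when neither side carries a robot signature, leaving the
    conflict for a human to resolve.
    """
    signed = [v for v in (left, right) if v.robot_timestamp is not None]
    if not signed:
        return None
    return min(signed, key=lambda v: v.robot_timestamp)


early = Version("aaaa", robot_timestamp=1000)
late = Version("bbbb", robot_timestamp=2000)
unsigned = Version("cccc", robot_timestamp=None)

winner = pick_winner(late, early)        # the earlier signature wins
fallback = pick_winner(unsigned, late)   # a signed version beats an unsigned one
```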
if you want to construct the bitkeeper model (which is quite nice), you
just make sub-branches and propagate between them. so for example we
have an i18n branch, net.venge.monotone.i18n. if we had a translation
team, I would ask them to commit to that branch primarily. every now and
then I will do a propagate from net.venge.monotone.i18n to
net.venge.monotone, and I will resolve the conflicts; i18n was
essentially a staging area. every now and then (when they're interested
in trying out new features) the i18n developers will do a propagate from
net.venge.monotone to net.venge.monotone.i18n.
if they want to make my life even easier, the i18n guys can build a
tie-breaker robot or build a sub-staging area, so that I always have
something unique to propagate *from* when I'm absorbing a change from
them. or I can say that I trust one of their team-members to construct
merges and propagate *to* the net.venge.monotone branch, and delegate
responsibility for it completely. there's lots of variations which make
sense.
9) When two file versions (SHA1(content)) are merged, monotone uses the
result of that merge for all future merges of these two file versions,
even though they might happen in completely different context. Why?
it seemed like a cheap heuristic to throw in, since it's very unlikely
that a particular merge problem would happen in a different context. but
that's a bit of a silent error waiting to happen, I guess. maybe it's no
good, and should be removed. do you think so?
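to show where the silent error could come from, here is a minimal sketch of such a heuristic, assuming merge results are remembered per unordered pair of file SHA1s. the class and its names are made up for the example and do not describe monotone's storage.

```python
from typing import Callable


class MergeCache:
    """Remember merge results keyed only by the two file SHA1s involved."""

    def __init__(self) -> None:
        self._results: dict[frozenset, str] = {}

    def merge(self, left: str, right: str,
              resolve: Callable[[str, str], str]) -> str:
        # The key ignores the surrounding tree, so context is not considered.
        key = frozenset((left, right))
        if key not in self._results:
            self._results[key] = resolve(left, right)
        return self._results[key]


cache = MergeCache()

# First merge of these two file versions, resolved with context 1 in mind.
first = cache.merge("sha1-a", "sha1-b", lambda l, r: "merged-for-context-1")

# The same two file versions meet again in a different tree. The resolver
# for context 2 is never called; the earlier answer is reused silently.
second = cache.merge("sha1-b", "sha1-a", lambda l, r: "merged-for-context-2")
```

here `second` comes back as "merged-for-context-1", which is cheap and usually right, but wrong whenever the context really did matter.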
10) The (usefulness of the) ability to sort / select / prioritize packages
by certs when updating is unclear to me. What exactly happens
there? What is updating in terms of merging? The working copy is
a valid version, too. So what is so special about updating? Is
this due to the notion of monotone branches?
this section is being clarified in the current round of development, as
our thoughts on the matter have become more clear. I hope the next
version will have a clearer organization to it, which is based on a
simple accept/reject distinction, rather than arbitrary sorting.
the new rules follow:
- before an ancestry cert is acted upon -- during merge, update, or
otherwise -- it is passed to a trust-evaluation hook. this hook is given
*all* the keys which have produced valid signatures on the particular
manifest ancestor->child assertion of the cert. the hook returns whether
or not to trust the ancestry relationship implied by the cert. the
"monotone approve <id1> <id2>" command adds ancestry certs to the
database. the "monotone disapprove <id1> <id2>" command adds an
ancestry-like "disapproval" cert, any of which are also passed to the
trust evaluation hook. the purpose here is to permit per-edge code review.
- similarly, before an update or checkout happens -- though not a
merge -- monotone checks for "testresult" certs on the predecessor of
the update and the selected update target. these two sets of testresults
are passed to a different trust evaluation hook, which decides if the
proposed update/checkout is sufficiently safe. the purpose here is to
permit "checking out / updating a working copy", as defined by some
autotester. obviously if you want the "most recent possible" version
(tested or otherwise) you can just leave the hook undefined.
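the two accept/reject hooks could be sketched as below. real monotone hooks are Lua functions, and the rules have not settled, so treat every name, argument and policy here as an assumption: one trusted approver with no trusted disapprover for an edge, and "no passing test may regress" for an update.

```python
def trust_ancestry_edge(approvers: set[str], disapprovers: set[str],
                        trusted: set[str]) -> bool:
    """Decide whether to believe one ancestor->child assertion.

    `approvers` are keys with valid signatures on the ancestry cert,
    `disapprovers` are keys that signed a disapproval cert for the same edge.
    """
    if disapprovers & trusted:
        return False              # a trusted reviewer rejected this edge
    return bool(approvers & trusted)


def accept_update(source_results: dict[str, bool],
                  target_results: dict[str, bool]) -> bool:
    """Accept an update only if no test passing on the source fails or is
    missing on the target."""
    for test, passed in source_results.items():
        if passed and not target_results.get(test, False):
            return False
    return True


trusted_keys = {"graydon@example.org"}  # hypothetical key name

edge_ok = trust_ancestry_edge({"graydon@example.org", "stranger@example.org"},
                              set(), trusted_keys)
edge_vetoed = trust_ancestry_edge({"stranger@example.org"},
                                  {"graydon@example.org"}, trusted_keys)
update_ok = accept_update({"unit": True}, {"unit": True, "net": False})
update_regresses = accept_update({"unit": True}, {"unit": False})
```

leaving the second hook undefined corresponds to an `accept_update` that always returns True.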
I'm running out of steam.
as am I. time to go to work. I hope I've clarified some things. try to
keep in mind that in absence of "simple" answers, I base design
decisions on practical ergonomic and social needs -- observed in the
line of real work -- rather than ideals about the perfect VC system. if
you can show important operational failures, or ways in which it makes a
real person feel uncomfortable, I am always interested in seeing them.
please ask further if you have further questions.
-graydon