Re: [Monotone-devel] Ideas and questions.


From: Nathaniel Smith
Subject: Re: [Monotone-devel] Ideas and questions.
Date: Tue, 15 Feb 2005 01:04:57 -0800
User-agent: Mutt/1.5.6+20040907i

On Mon, Feb 14, 2005 at 12:51:10AM -0500, Jeremy Fincher wrote:
> On Feb 13, 2005, at 3:21 AM, Nathaniel Smith wrote:
> 
> >On Sat, Feb 12, 2005 at 07:44:09PM -0500, Jeremy Fincher wrote:
[1. "monotone" vs. "mtn"]

Hmm, so, we have an argument for "mtn", that being that it's short --
very fair, probably the only reason "monotone" doesn't annoy me is
that I'm always typing "../opt/m<tab>" or something, so the actual
name is irrelevant.

We also have an argument for "monotone", that being that just because
the rest of Unix insists on being cryptic and unfriendly doesn't mean
we should go along.  Makes sense, especially for people using monotone
outside the Unix environment (though I can't imagine anyone who's
gotten past "cd", "dir" or "ls", etc. finding _this_ project to be the
tipping point, do you think?).  Certainly "mtn" is less discoverable
than "monotone" by someone who thinks "hey, I need a VCS... isn't
there something called, like, 'monotone' or something?... I wonder if
that's installed on this machine...".

So, options:
  a) keep the name "monotone"
  b) switch to "mtn"
  c) do both, i.e. have packages that install the binary as one and
     create a symlink for the other, or somesuch.
(any I'm missing?)

I don't much like (c); we still have to pick one to use in the
documentation, and decide which name to use for the "just grab this
binary" download, and we get confusion and ambiguity about what to
call things.  I also can't think of any precedent for taking _two_
items in the executable namespace.

Of (a) and (b), then, I'm kinda inclined towards (b), on the grounds
that everyone who uses a command line wants to type short commands,
and optimizing for the long-term user case is more important than
optimizing for the just-discovered-the-program case.  But any other
thoughts?

> >>2. I noticed in the manual, each user in the test project named his or
> >>her database "abe.db" or "beth.db" and put it in his or her home
> >>directory -- does this mean that I use one global database for all my
> >>Monotone-managed projects?  If so, what is the advantage of this,
> >>compared to storing a database (or a link to it) in each working
> >>directory's MT/ directory?
> >
> >We don't have enough experience with people working on multiple
> >projects using Monotone yet, to really know what's best here.  My
> >sense is "one db per project" will be what people generally end up
> >doing, but *shrug* we don't really know.
> 
> That's my sense as well.  Perhaps that should be suggested in the 
> manual.

Actually, it is, ever since ~5 minutes after I replied to your last
mail ;-).

> >You can't use a symlink to point to your database; sqlite will get
> >annoyed.  (Two instances of monotone using different names to refer to
> >the same database, won't be able to find each other's rollback logs.)
> 
> I assume that also applies to hard links.

Yes; sqlite relies on the same filename being used every time you open
the same file.  Or... the same basename inside the same directory,
rather; it's okay if you symlink the _directory_ containing your
database.  You just can't symlink the database itself.

(sqlite uses the database's filename to generate a filename for its
rollback log, and if two instances looking at the same database open
different rollback logs... you have a problem.)
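
To make the filename point concrete, here is a minimal Python sketch. It
is not sqlite's code, only a model of the rule as described above: the
journal name is built from the directory plus the basename used to open
the database. The "-journal" suffix is sqlite's naming convention; the
helper names journal_path and demo, and the file names, are made up for
illustration.

```python
import os
import tempfile


def journal_path(db_path):
    """Model of the rule above: same directory + same basename => same journal.

    Assumption: symlinks in the directory part are resolved, but the final
    filename is used exactly as it was given.
    """
    directory = os.path.realpath(os.path.dirname(os.path.abspath(db_path)))
    return os.path.join(directory, os.path.basename(db_path)) + "-journal"


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        real_dir = os.path.join(tmp, "real")
        os.mkdir(real_dir)
        db = os.path.join(real_dir, "proj.db")
        open(db, "w").close()

        # A second name for the same file (hard link): different journal name.
        alias = os.path.join(real_dir, "alias.db")
        os.link(db, alias)

        # A symlink to the containing directory: same journal name.
        link_dir = os.path.join(tmp, "link")
        os.symlink(real_dir, link_dir)
        via_link_dir = os.path.join(link_dir, "proj.db")

        return {
            "alias_is_same_file": os.path.samefile(db, alias),
            "alias_shares_journal": journal_path(db) == journal_path(alias),
            "dir_symlink_shares_journal":
                journal_path(db) == journal_path(via_link_dir),
        }
```

Under this model, the hard-linked alias is the same file but gets a
different journal name, while going through a symlinked directory still
lands on the same journal.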

> I guess what I'm envisioning is an "easy" interface where a user of my 
> software simply says, "mtn pull http://source.supybot.com/supybot" and 
> it downloads the database (either by netsync, or by HTTP, then running 
> integrity checks on the contents) and the database is downloaded into 
> MT/db.  Later, if that user wanted to get a new working directory, he 
> could just say "mtn pull supybot supybot-my-branch" (where "supybot" is 
> the directory he originally pulled into) and he's got a new working 
> directory using the same database as the one he pulled from.
> 
> I think Darcs' single greatest (and perhaps only truly useful :)) 
> innovation is the ease with which people can get "up to speed" using 
> it, so if my example commands seem similar, it's not a mistake.  I 
> don't think there's any shame in looking at Darcs and saying, "Man, you 
> guys have a nice command set."

I dunno about this being Darcs's "only truly useful" innovation, but
yeah, seeing how streamlined their basic workflow is has definitely
got me thinking.

Right now, to join a monotone project, you do:

 $ monotone --db=proj.db db init
 $ monotone --db=proj.db genkey address@hidden
   (or else do some dance to transfer your privkey from an old db into
   this one)
 $ wget http://public/key/url -O - | monotone --db=proj.db read
 $ monotone --db=proj.db pull somehost projbranch
 $ monotone --db=proj.db checkout --branch=projbranch

This is pretty absurd.  It should be something like:
 $ monotone newid address@hidden
   (Only once ever, soon after you download monotone)
 $ monotone --db=proj.db join_project http://somehost/proj
 $ monotone --db=proj.db checkout

(many details could be debated here...)

BK actually has a reasonable interface in this regard too, in that "bk
clone", "bk push", "bk pull" are very simple.

> >>3. It says in the manual, "the cert name branch is reserved for use by
> >>monotone."  Does this mean that any given revision may only belong to
> >>a single branch at a time?  If so, why is that?  If not, what am I
> >>missing?
> >
> >Hmm, you misinterpret that text -- what it means is that the name
> >"branch" is privileged, it has special semantics, so we want to warn
> >people not to use it for any ad hoc certs they might want to create.
> >
> >I just looked at it, but it's not obvious to me how to make it
> >clearer; any suggestion?
> 
> I suppose I was thinking there was a restriction on the certs on a 
> given revision that they have unique names, but I guess I was wrong.  
> If I can track down where that belief came from, I'll be sure to point 
> it out.

Ah, that makes sense.  If you do find it, please let us know.  There's
definitely no such restriction; it's a distributed system, so how
could we enforce it?

> >>4. This is just a pet peeve of mine, but what are the chances that
> >>Monotone's source code (.cc, .hh files) can be moved into a src/
> >>subdirectory of the main distribution tarball?  As it currently
> >>stands, an "ls" in my monotone-0.16 directory doesn't even fit into
> >>my 80x51 terminal, and that would be fixed if the .cc and .hh files
> >>were in their own sequestered directory.

Well, I don't want to get in a big debate about this.  I'm basically
-0 on the idea (to use the Python convention); while I'm sensitive to
the need to be welcoming to newcomers, I count ~120 files in the top
level of a fresh checkout, of which ~100 would end up in your src/
dir, and I don't really see that this is a dramatic improvement for
someone getting into the code.  And I hate typing src/ all the time.
But we can definitely revisit the issue if any other new people
experience trouble getting into the project because of this.

> >How to do synchronization with a dumb HTTP server:
> >Export your database to a bunch of nicely arranged flat files.  We'll
> >actually synchronize two directories of these flat files.  The
> >interesting part is how to make them "nicely arranged".
> >
> >Let's define a "merkle directory" format.
> 
> What the heck is a "merkle"?  I've seen that word a lot, and some in 
> the source code, but I'm still not quite clear what it is.

It's a name :-).  Did graydon explain this to you on IRC?  The basic
idea is that we take a Patricia trie on the set of hashes that we want
to synchronize with the other guy's set of hashes, and at each node in
the tree we calculate yet another hash of everything below that node.
Then we can traverse this tree sending the node hashes back and forth
on the wire, and efficiently only traverse parts of the tree that
contain changes.
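
A rough Python sketch of that idea, for illustration only. It is not
monotone's netsync code, and it simplifies in two ways that I should
flag: it uses a plain trie keyed on hex digits rather than a real
Patricia trie, and it recomputes each node hash on demand instead of
storing a tree. In a real exchange each side would compute its own node
hash and only the hashes would cross the wire. All function names here
are made up.

```python
import hashlib

HEX_DIGITS = "0123456789abcdef"


def sha1_hex(data):
    return hashlib.sha1(data).hexdigest()


def subtree_hash(ids, prefix):
    """Hash of every id below the trie node named by `prefix`."""
    members = sorted(i for i in ids if i.startswith(prefix))
    return sha1_hex("".join(members).encode("ascii")), members


def differing_ids(mine, theirs, prefix="", visited=None):
    """Return the ids held by exactly one side.

    `mine` and `theirs` are sets of lowercase hex ids.  Subtrees whose
    hashes match are skipped without looking inside them.
    """
    if visited is not None:
        visited.append(prefix)  # record which nodes were compared
    my_hash, my_members = subtree_hash(mine, prefix)
    their_hash, their_members = subtree_hash(theirs, prefix)
    if my_hash == their_hash:
        return set()  # identical below this node, nothing to send
    if len(my_members) <= 1 and len(their_members) <= 1:
        return set(my_members) ^ set(their_members)  # leaf: compare directly
    found = set()
    for digit in HEX_DIGITS:
        found |= differing_ids(mine, theirs, prefix + digit, visited)
    return found
```

For example, if `theirs` is `mine` plus one extra id, `differing_ids`
returns just that id, and passing a list as `visited` shows that only
the nodes on the path to it were expanded.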

> >...
> >Sorta baroque, but really not that complicated, and the hooks needed
> >to do it aren't very complex...
> 
> Is there a reason we can't just publish the database itself on the web, 
> and just download that?

Oh, sure, it'd work.  It's just that, e.g., my monotone database is 19
meg, and who wants to download that much every time they update?

[snip]
> It would be *rather* handy.  In fact, it's the only reasonable way I 
> can think of to make a trial implementation of my "picking and choosing 
> patches at better-than-file granularity" -- without this, it seems I'd 
> have to duplicate an entire working directory, apply the "mini-patches" 
> the user selected, and then call monotone with that working directory.

As mentioned in another email, unless you want to get into the
business of generating monotone changesets and signatures yourself,
your latter method is definitely the way to go for a first pass.

> >>6. The manual said that branch names had to be unique globally -- does
> >>this apply to all Monotone databases everywhere, or just the ones that
> >>are working on my project?
> >
> >All monotone databases everywhere, which is why we recommend embedding
> >a domain name into your branch names.
> 
> Oh, well I think that certainly needs to be fixed :)

Oh?  Is it a problem? :-)

> I noted when reading the manual for the SCM that Linus Torvalds uses 
> that when a project is initially imported, a "project key" is randomly 
> generated.  Any subsequent tree that is created using the "clone" 
> command copies this project key.  Perhaps Monotone could use a similar 
> concept with branches; there could be a "branch" object which 
> maintained such a randomly-generated key, and branch certs, rather than 
> include a name, would include a reference to that branch object.  That 
> way, if I synced with a database that had a branch with the same name 
> that was semantically different (i.e., was created by someone else for 
> some other purpose) my database would refuse to accept it because its 
> key differed from the one I already have in my database.

We could do that... in fact, we've basically implemented everything
you say, in terms of "epochs", for use in nasty cases where bugs
require people to rebuild their change history... but, see below.

> That way, the recommendation that branches have unique names could 
> simply be for convenience, not for the integrity of databases.
> 
> Or I could be misunderstanding the reason for this restriction 
> entirely, that's another possibility :)
> 
> >Of course, if you never sync with anyone working on a different
> >project, then it won't matter if your branch names conflict.  But
> >that's an assumption that you probably don't want to make.  It's quite
> >handy to be able to share a netsync server among people, or perhaps to
> >pull multiple projects into a single database to merge between them
> >(e.g., if sqlite used monotone, monotone's own source tree could merge
> >in a branch of it to create the sqlite/ directory, and we'd get
> >history sensitive tracking of upstream changes that way).
> >
> >There's also how things work conceptually.  In your mental model of
> >monotone, it's useful not to think of databases as distinct objects.
> >Rather, there is One True Database In The Sky, which we modify every
> >time we hit "commit" or "merge" or whatever.  Any given database has
> >some partial knowledge of the One True Database; netsync is how
> >databases share what they know with other databases, so that their
> >knowledge increases.
> 
> I think that's an excellent idea, but I also think that's another 
> reason why namespace collisions need to be handled more gracefully -- I 
> think that's exactly why the scm-which-will-not-be-named generates a 
> project key for each initialized project, and duplicates it when that 
> project source tree is cloned.

AFAIK, every BK repo is part of just one project, and I speculate that
the "project key" you mention is just to make sure that people don't
accidentally try to sync two repos that are completely unrelated?
(This would be the moral equivalent of, say, pushing from the emacs
repo into the gcc repo in darcs -- complete insanity.)

We, on the other hand, _want_ to allow people to sync with other
people, share multiple projects in the same db, etc., and we want it
to _work_.  Which means that branches really do need to have globally
unique names.
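
A toy Python model of why, under heavy assumptions: a database is just
a set of (revision, branch name) pairs, and syncing is set union.
Real monotone certs carry signatures and much more, and the branch and
revision names below are hypothetical.

```python
from collections import defaultdict


def netsync(db_a, db_b):
    """Toy sync: afterwards both sides know the union of all certs."""
    return db_a | db_b


def branch_contents(db):
    """Group revisions by the branch name in their branch cert."""
    by_branch = defaultdict(set)
    for revision, branch in db:
        by_branch[branch].add(revision)
    return dict(by_branch)


# Two unrelated projects that both picked the branch name "main".
editor_db = {("editor-rev1", "main"), ("editor-rev2", "main")}
compiler_db = {("compiler-rev1", "main")}
collided = branch_contents(netsync(editor_db, compiler_db))

# The same projects with a domain name embedded in the branch name.
editor_db2 = {("editor-rev1", "org.example.editor"),
              ("editor-rev2", "org.example.editor")}
compiler_db2 = {("compiler-rev1", "org.example.compiler")}
separate = branch_contents(netsync(editor_db2, compiler_db2))
```

In `collided`, the single branch "main" ends up holding revisions from
both projects; in `separate`, each branch holds only its own.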

Is that so bad?  It's basically just merging the ideas of "branch" and
"project" (and "module" and "server" and whatever else weird concept
some VC out there has).  In CVS, "branch" and "module" and
"repository" are independent concepts, and together name a globally
unique line of development; we just smush them down into one idea.

-- Nathaniel

-- 
"But in Middle-earth, the distinct accusative case disappeared from
the speech of the Noldor (such things happen when you are busy
fighting Orcs, Balrogs, and Dragons)."
