Re: [Monotone-devel] newbie question

From:	K. Richard Pixley
Subject:	Re: [Monotone-devel] newbie question - SHA1 vs serials
Date:	Wed, 20 Apr 2005 09:17:33 -0700
User-agent:	Mozilla Thunderbird 1.0.2 (Macintosh/20050317)

Richard Levitte - VMS Whacker wrote:

In message <address@hidden> on Tue, 19 Apr 2005 12:39:41 -0700, "K. Richard Pixley" 
<address@hidden> said:

This sounds like the beginning of a complete redesign of the monotone
authentication mechanism.  I assume there's space for user
authentication and signatures in your system.

Yes, it does, unfortunately. The public/private keys are lovely and remarkably convenient for authenticating a particular connection and even for signing revisions, but I don't think they make for a very useful identification mechanism at all. We already have a number of very common identification mechanisms (email addresses, unix user names, unix uids, LDAP ids, etc.), though none of these are necessarily represented in monotone.

In effect, the only way I know who made a change is by looking backward to figure out who sent me a particular public key. If someone else is responsible for managing the public keys and their distribution within our development circle, then I lose the ability to even determine who made a particular change. This is a grave loss.

Extending monotone to include some concept of actual users will require separating the authentication mechanism from whatever token is used for user identification. I think this will ultimately be inevitable if monotone is to scale into widespread corporate or commercial use, and probably inevitable if monotone is to support any significant number of developers concurrently.
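To make the separation concrete, here is a minimal sketch of what I mean. It is not monotone code; the names `Identity`, `IdentityRegistry`, and the `key:...` credential strings are hypothetical, chosen only for illustration. The point is that a revision signed with any of a person's credentials resolves to the same identity token, so keys can be rotated or managed by someone else without losing track of who made a change.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass(frozen=True)
class Identity:
    """Who a person is, independent of how they authenticate (hypothetical)."""
    email: str
    unix_user: str = ""


@dataclass
class IdentityRegistry:
    """Maps credentials (e.g. public key ids) to identities (hypothetical)."""
    _by_credential: Dict[str, Identity] = field(default_factory=dict)

    def bind(self, credential_id: str, who: Identity) -> None:
        # One identity may own many credentials (old keys, new keys, ...).
        self._by_credential[credential_id] = who

    def revoke(self, credential_id: str) -> None:
        # Dropping a credential does not remove the identity itself.
        self._by_credential.pop(credential_id, None)

    def author_of(self, signing_credential: str) -> Optional[Identity]:
        # Resolve "which key signed this" into "who made this change".
        return self._by_credential.get(signing_credential)


registry = IdentityRegistry()
alice = Identity(email="alice@example.com", unix_user="alice")
registry.bind("key:aaaa", alice)   # original key
registry.bind("key:bbbb", alice)   # rotated key, same person
```

With this shape, the authentication side can be swapped for another mechanism without touching how authorship is recorded.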

rich> I do have concerns about the multiheaded model.  I can see how
rich> it can work for small projects.  But at some scale, perhaps 128
rich> heads?

My choice would be to see if there's a need for named branches in such
a situation.

No, I'm honestly talking about 128 developers all working on the same branch, pretty much concurrently. This is the interesting scaling issue. If monotone can only solve this problem via partitioning, whether partitioning users by branching, partitioning checkins by time, or whatever, then monotone necessarily forces a working methodology onto its users very early, which, IMO, is a fatal flaw in a tool of this type. Existing tools can manage this case either directly or via feature branching.

 I honestly believe the developers will be able to take
it upon themselves to merge a little now and then, especially if they
notice that they aren't getting some changes because they're hanging
on to their own head, or because monotone tells them there are several
heads to choose from.

Enlightened self interest is a powerful motivator, but it presumes a strong value for unity. In the free software world, this is supported by the desire to see our changes in an official release, which, in turn, diminishes our individual support overhead as a general rule.

However, in commercial development, if management strongly urges code production speed over all other considerations, then merging and/or unity tends to fall by the wayside. This is just an unfortunate fact of most commercial development, I'm afraid.
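For readers following along, a rough sketch of why unmerged work multiplies heads. This is a toy model, not monotone's implementation: the revision names and the `heads` function are hypothetical, and a head is taken here to be any revision that no other revision lists as a parent.

```python
def heads(parents_of):
    """Return the revisions that are not a parent of any other revision."""
    all_parents = set()
    for parents in parents_of.values():
        all_parents.update(parents)
    return sorted(set(parents_of) - all_parents)


# Three developers each commit on top of r0 without merging.
graph = {
    "r0": [],
    "r1": ["r0"],
    "r2": ["r0"],
    "r3": ["r0"],
}
unmerged = heads(graph)        # one head per unmerged line of work

# One developer merges r1 and r2; r3 is still hanging on its own head.
graph["m1"] = ["r1", "r2"]
after_merge = heads(graph)
```

Each merge removes at most one head per parent it joins, so if commits arrive faster than merges, the head count only grows.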

rich> Or a scale where the heads query never returns the same answer
rich> twice in a row?  I'm not sure it will actually be manageable.

Same answer.  I yeah, I agree, with a huge number of heads, management
may be tough.  How many developers building on the same source are we
talking about?

That would be the question that determines the prize.

Consider a source layout where nearly every piece of our vast and complex product line includes OurCompany.h because OurCompany.h is the authoritative central repository of key items like protocol constants. All developers in the company need access to it. All products in the company use it. And it's under active development. So we see hundreds of people making changes to it, resulting in tens of changes per day. Churn is sufficiently high that achieving a merged state isn't really possible.

At this point, many companies adopt a feature branching scheme, elect a gate keeper, or add any of a number of tactics, like reviews, which seem to simply target churn reduction.

In my experience, this number is somewhere around 64 active developers, or 128 or more moderately active or largely inactive developers. At 64 active, the repository never settles down short of intentional code freezes. At 128 semi-active, disagreements arise, and the necessity of going up higher in the management chain in order to resolve those disagreements means that divergences are longer lived and developers become both more entrenched in their disagreements and more dependent on their specific differences.

Typically around 64 users, we also run into the issue that the central repository really needs to be available 24/7, and this generally means assigning specialized IT type folks to the task. Distributed systems like monotone can probably live with letting the central repository go down for an hour each day since the local repository remains available, at least for another factor of 2. But somewhere around 128 users or so, we'll need to invent methods for backing up and archiving the central repository without any down time.

CVS doesn't handle branching very well, and merging is even worse. So feature branching is difficult and error prone, which leaves groups forced to use CVS needing to partition, often with as few as 16 or 32 active concurrent developers on the same code. Subversion branches much more easily, though it doesn't appear to me as though branches are any more easily managed, and I'm not yet sure about merges. My guess is that subversion will push the CVS number up by a factor of at least 2 and maybe 4.
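Spelled out as arithmetic, and assuming the "2" and "4" above are multiplicative factors applied to the CVS range (that reading is an assumption, as are the names below), the guess works out as follows. These are rough estimates from experience, not measurements.

```python
# Rough thresholds from the paragraph above: active concurrent developers
# on the same code at which CVS groups are forced to partition.
CVS_RANGE = (16, 32)

# Assumed multiplicative improvement for subversion: at least 2, maybe 4.
SVN_FACTORS = (2, 4)


def scaled_range(base, factors):
    """Scale the low end by the low factor and the high end by the high factor."""
    low, high = base
    f_low, f_high = factors
    return (low * f_low, high * f_high)


svn_range = scaled_range(CVS_RANGE, SVN_FACTORS)
print("subversion estimate: %d to %d developers" % svn_range)
```

That puts the subversion guess at somewhere between 32 and 128 developers.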

Contrast with clearcase, where judicious use of branching has been documented to support over 512 users in a number of cases before those companies even began to use partitioning.

My current estimate is that monotone's merging support and its ability to support geographic disparity will allow it to support at least as many concurrent developers as subversion. I think the key distribution problem will be a scale limiting factor before head management. I think the key distribution problem will begin to be uncomfortable at 32, become difficult at 64, and become impossible at 92 or with even modest turnover.

I think resolving the key distribution problem means adding the ability to do other forms of authentication in which, essentially, the key distribution problem has already been solved.

rich> I suspect that at some scale, there will need to be some other
rich> form of user partitioning or phased or hierarchical revision
rich> propagation in order to track all of the changes and forks.
rich> It's not yet clear to me how this will work out or even at what
rich> point it will break down.

Those are good points, and worth investigating if we get a chance.

That's a big risk, though. More suitable for academic environments than commercial ones, though academic environments don't usually see this level of cooperative effort or this level of churn.

From where I stand, the biggest scalability problem may actually be
the size of the database itself, for huge installations...

I disagree. Database problems tend to be relatively straightforward these days, well understood, and even without examining code I'll bet that they are very well separated from the rest of monotone. It's likely a fairly simple job to replace the monotone database support with other databases, alternate data structures, etc. Database technology already scales far beyond any of the numbers we're discussing so far. In effect, the database management is really more a question of optimization than one of either correctness or scale.


--rich
