Re: [Monotone-devel] newbie question - SHA1 vs serials


From: K. Richard Pixley
Subject: Re: [Monotone-devel] newbie question - SHA1 vs serials
Date: Wed, 20 Apr 2005 09:17:33 -0700
User-agent: Mozilla Thunderbird 1.0.2 (Macintosh/20050317)

Richard Levitte - VMS Whacker wrote:

In message <address@hidden> on Tue, 19 Apr 2005 12:39:41 -0700, "K. Richard Pixley" 
<address@hidden> said:

This sounds like the beginning of a complete redesign of the monotone
authentication mechanism.  I assume there's space for user
authentication and signatures in your system.

Yes, it does, unfortunately. The public/private keys are lovely and remarkably convenient for authenticating a particular connection and even for signing revisions, but I don't think they make for a very useful identification mechanism at all. We already have a number of very common identification mechanisms (email addresses, Unix user names, Unix uids, LDAP ids, etc.), though none of these are necessarily represented in monotone.

In effect, the only way I know who made a change is by looking backward to figure out who sent me a particular public key. If someone else is responsible for managing the public keys and their distribution within our development circle, then I lose the ability to even determine who made a particular change. This is a grave loss.

To extend monotone to include some concept of actual users will require separating the concepts of authentication mechanism from whatever token is used for user identification purposes. I think this will ultimately be inevitable if monotone is to scale into widespread corporate or commercial use and probably inevitable if monotone is to support any significant number of developers concurrently.
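
As an illustration of that separation, here is a minimal Python sketch. The names `Identity`, `IdentityRegistry`, `bind`, and `who_signed` are hypothetical and are not part of monotone; the sketch only assumes that a signature can be traced back to some key id.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass(frozen=True)
class Identity:
    """A user identity that exists independently of any key (hypothetical)."""
    email: str
    unix_name: str


@dataclass
class IdentityRegistry:
    """Maps key ids (an assumed opaque string) to the identity that owns them."""
    _by_key: Dict[str, Identity] = field(default_factory=dict)

    def bind(self, key_id: str, identity: Identity) -> None:
        # Several keys may point at the same identity, so a user can
        # rotate keys without changing who they are.
        self._by_key[key_id] = identity

    def who_signed(self, key_id: str) -> Optional[Identity]:
        # Returns None when the key is unknown to this registry.
        return self._by_key.get(key_id)
```

With a mapping like this, the question "who made this change" is answered by the registry rather than by remembering who sent which public key.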

rich> I do have concerns about the multiheaded model.  I can see how
rich> it can work for small projects.  But at some scale, perhaps 128
rich> heads?

My choice would be to see if there's a need for named branches in such
a situation.

No, I'm honestly talking about 128 developers all working on the same branch, pretty much concurrently. This is the interesting scaling issue. If monotone can only solve this problem via partitioning, whether partitioning users by branching, partitioning checkins by time, or whatever, then monotone necessarily forces a working methodology onto its users very early, which, IMO, is a fatal flaw in a tool of this type. Existing tools can manage this case either directly or via feature branching.

I honestly believe the developers will be able to take
it upon themselves to merge a little now and then, especially if they
notice that they aren't getting some changes because they're hanging
on to their own head, or because monotone tells them there are several
heads to choose from.

Enlightened self-interest is a powerful motivator, but it presumes a strong value for unity. In the free software world, this is supported by the desire to see our changes in an official release, which, in turn, diminishes our individual support overhead as a general rule.

However, in commercial development, if management strongly urges code production speed over all other considerations, then merging and/or unity tends to fall by the wayside. This is just an unfortunate fact of most commercial development, I'm afraid.

rich> Or a scale where the heads query never returns the same answer
rich> twice in a row?  I'm not sure it will actually be manageable.

Same answer.  Yeah, I agree, with a huge number of heads, management
may be tough.  How many developers building on the same source are we
talking about?

That would be the question that determines the prize.

Consider a source layout where nearly every piece of our vast and complex product line includes OurCompany.h because OurCompany.h is the authoritative central repository of key items like protocol constants. All developers in the company need access to it. All products in the company use it. And it's under active development. So we see hundreds of people making changes to it, resulting in tens of changes per day. Churn is sufficiently high that achieving a merged state isn't really possible.

At this point, many companies adopt a feature branching scheme, elect a gatekeeper, or add any of a number of tactics, like reviews, which seem simply to target churn reduction.

In my experience, the threshold is somewhere around 64 active developers, or 128 or more moderately active or largely inactive developers. At 64 active, the repository never settles down short of intentional code freezes. At 128 semi-active, disagreements arise, and the necessity of going higher up the management chain to resolve those disagreements means that divergences are longer lived and developers become both more entrenched in their disagreements and more dependent on their specific differences.

Typically around 64 users, we also run into the issue that the central repository really needs to be available 24/7 and this generally means assigning specialized IT type folks to the task. Distributed systems like monotone can probably live with letting the central repository go down for an hour each day since the local repository remains available, at least for another factor of 2. But somewhere around 128 users or so, we'll need to invent methods for backing up and archiving the central repository without any down time.

CVS doesn't handle branching very well, and merging is even worse. So feature branching is difficult and error prone, which forces groups using CVS to partition, often with as few as 16 or 32 active concurrent developers on the same code. Subversion branches much more easily, though it doesn't appear to me as though branches are any more easily managed, and I'm not yet sure about merges. My guess is that Subversion will push the CVS number up by a factor of at least 2 and maybe 4.

Contrast this with ClearCase, where judicious use of branching has been documented to support over 512 users in a number of cases before those companies even began to use partitioning.

My current estimate is that monotone's merging support and its ability to support geographic disparity will allow it to support at least as many concurrent developers as Subversion. I think the key distribution problem will be a scale-limiting factor before head management. I think the key distribution problem will begin to be uncomfortable at 32, become difficult at 64, and become impossible at 92 or with even modest turnover.

I think resolving the key distribution problem means adding the ability to do other forms of authentication in which, essentially, the key distribution problem has already been solved.
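
A rough way to see why key distribution grows faster than head count is the arithmetic below. It assumes, purely for illustration, a full-mesh model in which every developer must hold every other developer's public key; the function names are hypothetical and the model is not taken from monotone itself.

```python
def key_placements(developers: int) -> int:
    """Total (holder, key) pairs when everyone holds everyone else's key."""
    return developers * (developers - 1)


def placements_for_newcomer(existing: int) -> int:
    """Placements caused by one new developer joining a group of `existing`.

    The newcomer's key goes to each existing developer, and each existing
    developer's key goes to the newcomer.
    """
    return 2 * existing


if __name__ == "__main__":
    # Group sizes taken from the estimates in the text above.
    for n in (32, 64, 92):
        print(n, key_placements(n), placements_for_newcomer(n))
```

Under this assumed model the totals grow quadratically (992, 4032, and 8372 placements for 32, 64, and 92 developers), and every arrival or departure touches the whole group, which is consistent with turnover making the problem worse.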

rich> I suspect that at some scale, there will need to be some other
rich> form of user partitioning or phased or hierarchical revision
rich> propagation in order to track all of the changes and forks.
rich> It's not yet clear to me how this will work out or even at what
rich> point it will break down.

Those are good points, and worth investigating if we get a chance.

That's a big risk, though. More suitable for academic environments than commercial ones, though academic environments don't usually see this level of cooperative effort or this level of churn.

From where I stand, the biggest scalability problem may actually be
the size of the database itself, for huge installations...

I disagree. Database problems tend to be relatively straightforward these days and well understood, and even without examining code I'll bet that they are very well separated from the rest of monotone. It's likely a fairly simple job to replace the monotone database support with other databases, alternate data structures, etc. Database technology already scales far beyond any of the numbers we're discussing so far. In effect, the database management is really more a question of optimization than one of either correctness or scale.

--rich



