[Gnu-arch-users] Why we might use subversion instead of arch.
From: Pierce T . Wetter III
Date: Fri, 20 Feb 2004 10:45:39 -0700
Note: The purpose of this email is not to rile you guys up. It's for me
to document my findings about arch, so that either you can correct my
errors, or so you can improve arch, since I really liked the
distributed nature of it.
Background:
Like many people, we use CVS for our version control, because as
someone said once, "CVS sucks, but it sucks less than anything else".
However, Subversion is reaching 1.0 status, so I decided it was worth
checking out some alternatives. Pretty much, that came down to two
choices, svn and tla.
Our setup:
We have a lot of distributed employees, and also employees who
telecommute. Worst case is me in Flagstaff, AZ on dialup talking to
Raleigh, NC. Our current CVS repository is about 300MB with all the
history.
Our work process:
There are two main cases: I (Pierce) tend to make lots of small
incremental changes, because I do the UI. Mike tends to make lots of
large changes, since he works on the backend servers and needs to
change both our object model, and the servers in one pass. I'll call
myself "incremental_guy".
So for me, checkout, edit, update, checkin works great. So I actually
am perfectly happy with CVS.
For Mike, he wants to branch, move everyone else's HEAD changes into
his code, then check back in. What he does now is just have several
checkouts running in parallel all the time, which is actually similar
to arch. We'll call Mike "batch_guy".
Why arch would be cool over subversion:
Since there's no concept of a "central" repository, at best a
"blessed" repository, we could do stuff like the following:
Every time we code freeze for deployment, we copy "blessed" to
"deployment".
When developers have changes, they merge them into the "deployment"
repository if they're bug fixes for the deployment, along with
"blessed" so that there is a local copy. This is really the same thing
as a deployment branch, but conceptually it seems easier, and it would
avoid problems we have where fixes don't quite make it into the
deployment build.
If two engineers need to work together, like if "batch_guy" needs
to work with "incremental_guy", no one else has to be involved, they
can just merge their changes together.
Since we have a system of servers, clients, etc. most developers
end up having several machines they have to keep in sync. With arch,
your local test server could check out from your personal repository.
How we would have to set up:
Well, first, every developer would end up needing to have a network
accessible "master" archive. Since arch doesn't have any concept of a
server process, that means setting up a WebDAV server with multiple
subfolders:
/archrepositories/incremental_guy
/archrepositories/batch_guy
/archrepositories/blessed
/archrepositories/development
For the most part, developers would use the
/archrepositories/development repository as "truth". You'd only need
your "personal" archive if you needed to work with someone else
independently of the archive.
Now for the bad stuff:
Ok, so I tried experimenting with arch. The first thing I did was
check out something from a public arch repository. I got quite a shock.
Evidently, every arch repository stores the "base code", then
follows that with a series of forward patches. This is quite different
from most other version control systems, which store the head version
as "truth" and then keep reverse patches going backwards. The net
effect of this is that checking out that version required downloading
not just the latest code, but downloading all the patches in between.
That was quite a shock. For projects with lots of small changes, it
probably is inconsequential, but for me, on a dialup, it would really
suck. Now I read some stuff on the wiki about how you can make all that
faster by making a new archive (which moves the base), but I shouldn't
have to change my work process to make the version control system
efficient.
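To put rough numbers on the difference between the two layouts, here is a toy model in Python. It is only a sketch under loud assumptions: the function names and sizes are made up for illustration, forward and reverse patches are taken to be the same size, compression and any caching an archive might offer are ignored, and none of this is how tla or any other tool actually computes a checkout.

```python
def forward_delta_cost(base_size, patch_sizes, target):
    """Bytes fetched to build revision `target` when the archive keeps
    the base revision in full plus a chain of forward patches."""
    return base_size + sum(patch_sizes[:target])


def reverse_delta_cost(head_size, patch_sizes, target):
    """Bytes fetched to build revision `target` when the archive keeps
    HEAD in full plus a chain of reverse patches."""
    return head_size + sum(patch_sizes[target:])


# Hypothetical project: a 300000-byte tree and 500 commits of 2000 bytes each.
patches = [2000] * 500
head = len(patches)

print("forward patches, checkout of HEAD:", forward_delta_cost(300000, patches, head))
print("reverse patches, checkout of HEAD:", reverse_delta_cost(300000, patches, head))
```

With these made-up numbers a checkout of the latest revision transfers 1,300,000 bytes under the forward-patch layout and 300,000 bytes under the reverse-patch layout. The situation flips for someone who wants the very first revision, which is rarely the common case.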
The next thing I noticed was that while CVS and Subversion let you
structure your projects and sub projects via the filesystem, arch
really tries to grab the whole filesystem as one unit. You can override
this a bit, but it involves setting up some config files. Config files
that are kind of poorly documented (based on the fact that I couldn't
make heads or tails of the explanation). This makes a lot of sense for
open source projects focused on a single executable, but makes much
less sense for us. I suspect most people deal with this by just having
lots of arch repositories:
/archrepositories/blessed/tool
/archrepositories/blessed/library
/archrepositories/blessed/application
But that would be a nightmare for us.
The next thing I found was that it was SLOW. tla is kind of brute
force, and all that diff-ing, tar-ing, and compressing can take quite a
while.
So at this point, while the distributed repository stuff was cool, I
had to conclude that arch works best for working on open-source
development where you don't submit code so much as you submit patch
files, and you need to merge patches from multiple places. From that
point of view, arch is great. From ours, ugh.
How I would improve arch:
Fundamentally, I think that arch should store HEAD, with reverse
patches, rather than START with forward patches.
The rsync protocol would make more sense than WebDAV or FTP.
Improve the documentation, especially needed is a section with some
arch concepts, so that you don't have to pick up everything by osmosis.
While tla is ok as a low-level tool, I've observed that everyone
keeps trying to replace it with a driving script. That's a good
instinct. For one thing, I think that:
user--archive--task
is harder to read than:
tla make-archive --id address@hidden --name archive
tla archive-setup --project hello-world --branch mainline --version 0.1
It would be a trivial change to tla to support passing archive names
as individual parameters, but I think it would flatten the learning
curve of arch. Especially since I think that if you break up the names,
you can realize that it would be pretty easy to come up with standard
defaults for most of these, such that you only have to type:
tla archive-setup --project hello-world
because branch defaults to "mainline", and version defaults to 1.0.
Or perhaps the project name could even be taken from the current
working directory, so all you would need is:
tla archive-setup
Similarly:
tla get --project hello-world hello-world-Alice
Would try to get hello-world--mainline--HEAD, where HEAD is
calculated such that 1.50 is known to be later than 1.49.
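The defaulting and HEAD lookup described above could be sketched in Python roughly as follows. Everything here is hypothetical: the function names, the default values, and the idea of taking the project from the working directory illustrate the proposal, not anything tla does today. Comparing versions one number at a time is also an assumption about what "later" should mean.

```python
import os


def qualified_name(project=None, branch="mainline", version="1.0", cwd=None):
    """Build project--branch--version, filling in the proposed defaults."""
    if project is None:
        # Proposed default: name the project after the working directory.
        project = os.path.basename(os.path.abspath(cwd or os.getcwd()))
    return "--".join([project, branch, version])


def version_key(version):
    """Compare versions number by number, so 1.50 sorts after 1.49
    (and after 1.9, which a plain string comparison would get wrong)."""
    return tuple(int(part) for part in version.split("."))


def head_version(versions):
    """Pick the newest version among those present in an archive."""
    return max(versions, key=version_key)


print(qualified_name("hello-world"))
print(qualified_name("hello-world", version=head_version(["1.9", "1.49", "1.50"])))
```

The point of the sketch is that each component has an obvious fallback, so the fully spelled-out name only needs to appear when one of the defaults is wrong.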
Anyway, I'm basically trying to make the following two points:
blah--blah--blah may be convenient to type, but it's hard to
understand, especially because depending on the context, sometimes the
first position is the user id, sometimes it's the project, etc. It would
make a lot of sense to make the components explicit (and update the
tutorial), because it would flatten the learning curve. tla could still
accept the blah--blah--blah format as a shortcut.
tla has some naming conventions in practice, but none of them are
defaults in the code. By installing those naming conventions as
defaults, you can also flatten the learning curve. You can also support
additional features for those defaults. For instance, one of the
annoying things for me about learning tla was that it's made of lots of
low-level operations, so I have to translate my high-level "what I'm
doing" into a whole set of tla commands. Something like:
tla branchstart --task "fix_for_bug" --master master_repository
--- this starts a branch off of a remote repository, with branch
name fix_for_bug, version 1.0.
tla branchupdate
--- grabs HEAD changes from master
tla commit --local
--- commits changes to branch locally
tla commit
--- uploads changes to remote master
tla branchdone
--- merges changes back to mainline in the remote master
Would be much easier to understand. In fact, in general, I'd like to
see all the low-level commands in tla supplanted by high-level commands
based on the use cases.
Something I'd also like to see that I implied above:
--local commits to the local repository.
--remote commits to both the local and remote repository. While tla
doesn't currently have any concept of a "master" repository, I think it
makes sense that the high-level commands would support this concept
that you have local archives you can commit to all the time, with a
remote archive you commit to less often.
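The local/remote split could be modeled with a toy like the one below. It is a sketch only: the Archive and Workspace classes are invented for illustration, a "change" is just a string, and merging other people's work from the master is left out entirely.

```python
class Archive:
    """A toy archive: just an ordered list of committed changes."""

    def __init__(self, name):
        self.name = name
        self.revisions = []


class Workspace:
    """Hypothetical working tree with a local archive and a remote master."""

    def __init__(self, local, master):
        self.local = local
        self.master = master
        self.pushed = 0  # how many local revisions the master already has

    def commit(self, change, remote=False):
        # Every commit lands in the local archive first.
        self.local.revisions.append(change)
        if remote:
            # A remote commit also uploads everything not yet sent.
            self.master.revisions.extend(self.local.revisions[self.pushed:])
            self.pushed = len(self.local.revisions)


ws = Workspace(Archive("incremental_guy"), Archive("blessed"))
ws.commit("fix typo")                    # local only
ws.commit("fix layout")                  # local only
ws.commit("fix_for_bug", remote=True)    # local, then all pending changes go up
print(len(ws.local.revisions), len(ws.master.revisions))
```

On a slow link this keeps the frequent small commits cheap, with the network cost paid only when a remote commit is asked for.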
Comments appreciated. I'm getting this list in digest mode so if your
comment is "urgent" email me directly.
pierce