[Savannah-hackers] address@hidden: revision control systems]
From: Richard Stallman
Subject: [Savannah-hackers] address@hidden: revision control systems]
Date: Fri, 18 Jan 2002 17:18:26 -0700 (MST)
Could some of you please look at `arch' and tell me
your technical opinion? Does it have promise?
How does it compare with CVS and Subversion?
The second message says where you can get it.
Please respond, to me and all the recipients, if you
are willing to investigate this. I want to know that someone
is doing it, that this has not been overlooked.
------- Start of forwarded message -------
Date: Fri, 28 Dec 2001 15:46:17 -0800 (PST)
From: Tom Lord <address@hidden>
To: address@hidden
Subject: revision control systems
I've written a free software source code management and revision
control system called `arch'. I think `arch' compares well with
CVS and Subversion and some of the commercial competition.
Some quick highlights of the feature list are:
+ distributed databases -- each hacker or group can host their
own branches. There's a global (world wide) name-space for
lines of development and revisions. Branches can be formed
from any repository to any other and merge operations can
span repository boundaries without needing to actually
duplicate the full contents of a repository at each site.
+ fancy merging -- `arch' has support for various styles
of history-sensitive branch merging. The way branches
and patch-sets interact with distributed repositories
makes it practical to distribute the responsibilities
for patch-review and merging.
+ renames handled -- of course file and directory renames
are handled accurately. So are symbolic links and file
permissions.
+ unobtrusive operation -- `arch' is designed to stay out of
your way while making changes and rearranging files. It
is designed to have a clean and self-documenting
command-line interface having the finest characteristics of
good Unix tools.
`arch' is, at its core, a collection of shell scripts and a tiny bit
of new C code. It brings many classic shell-utils, FTP, diff, and
patch together and turns them into a distributed version control
system. In spite of the simplicity, `arch' is not a toy: it's quite
sophisticated and, in my opinion, elegant. It captures the style of
diff/patch use that we relied on before remote-CVS took over the
world, fills in some gaps, and packages the whole deal behind a nice
(command line) user interface. Competing RC systems are far more
complex than they need to be.
Enclosed below is a longer list of `arch' features.
Could you let me know if `arch' is interesting to you? I'm trying to
find a commercial sponsor to help move it forward. One obstacle I've
encountered is that arch is new, so there isn't yet "enthusiastic
community support" for it -- a sort of chicken-and-egg problem.
`arch' is newer than other systems -- so it is less tested. From a
hacking point of view, what I'd really want to be able to do is a few
months of intensive and focused testing and tuning, culminating in
applying it to some larger projects.
A user's guide for arch, describing most of the features and how to
use them, is available at:
http://www.regexps.com/super-secret/arch.html
regards,
-t
Key Features: Branching and Merging
* Fancy Tagging, Branching, and Merging
`arch' is designed with unprecedented support for developing on
branches and performing complex merges with automated assistance.
Forming a branch (or tag) is inexpensive in both space and time.
Tags are revisioned -- meaning that complete history is kept of how
a tag has been applied.
For merging, `arch' provides a number of operations:
`update': a `CVS'-style merge operator (diff the working copy
against a common ancestor (from any branch) and apply those diffs to
the latest revision).
`replay': a `Subversion'-style history sensitive merge operator
(apply to the working copy all deltas that are found in the latest
revision (from any branch) but not previously applied to the working
copy).
`reconcile': an operation unique to `arch' which plans a
multi-branch `replay'-based merge, finding an ordering of patches
from those branches which minimizes sources of potential conflicts.
`i-merge': another operation ("idempotent merge") unique to `arch':
i-merge forms a revision whose delta from its ancestors consists
entirely of merges with other branches (any combination of `update'
and `replay'). `replay' and `update' can treat such deltas
specially, skipping them for trees that have already undergone
similar merges. `i-merge' makes history-sensitive merging more
effective and helps a team of programmers avoid having to repeatedly
solve the same set of merge conflicts. (The `i-merge' feature is
the only one mentioned in this message that is not done yet. Based on my
experience implementing a similar feature, `i-merge' needs 2-3 days to
get working and pass initial testing. I've postponed implementing
it until I have a chance to work on `arch' full-time again -- using
the planned feature as a kind of cognitive book-mark to recover my
state after being away from the code for a few weeks.)
`replay --exact' and `replay --list': operations which allow you to
apply revision deltas in any user-selected order, while still taking
advantage of history-sensitivity.
`mkpatch' and `dopatch': `arch''s "next generation" replacements for
`diff -r -c' and `patch'. These can be used to perform arbitrary
delta computation and application on working copies.
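To make the difference between `update' and `replay' concrete, here is a loose Python model in which a tree is represented only by the ordered list of patch names it contains. It is a sketch under that assumption: no diffs are computed, and `update', `replay' and `replay_list' are hypothetical functions named after the commands above, not `arch''s actual interface.

```python
def update(working, ancestor, latest):
    """CVS-style merge, modelled on patch logs: take the working copy's delta
    against a common ancestor and apply it on top of the latest revision."""
    ancestor_set = set(ancestor)
    latest_set = set(latest)
    delta = [p for p in working if p not in ancestor_set]
    return latest + [p for p in delta if p not in latest_set]


def replay(working, latest):
    """History-sensitive merge, modelled on patch logs: apply to the working
    copy every patch in the latest revision that it has not already applied."""
    applied = set(working)
    return working + [p for p in latest if p not in applied]


def replay_list(working, selected):
    """Apply patches in a user-chosen order (in the spirit of `replay --list`),
    still skipping any patch the working copy already has."""
    result = list(working)
    for p in selected:
        if p not in result:
            result.append(p)
    return result


# A working copy that started from patch-1 and has one local change, "local-a".
ancestor = ["base-0", "patch-1"]
latest = ["base-0", "patch-1", "patch-2", "patch-3"]
working = ["base-0", "patch-1", "local-a"]

print(update(working, ancestor, latest))
# ['base-0', 'patch-1', 'patch-2', 'patch-3', 'local-a']
print(replay(working, latest))
# ['base-0', 'patch-1', 'local-a', 'patch-2', 'patch-3']
```

Both merges end with the same set of patches but in a different order: `update' stacks the local change on top of the latest revision, while `replay' adds the missing upstream patches on top of the working copy and skips any it already has.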
* Directory and File Renames Handled Cleanly
Changes are tracked across file and directory renames. For example,
if you have a local working directory and "update" against the
repository (merge changes in the repository with local changes) -- and
either or both the repository or your local tree has been
"rearranged" -- the merge process takes those renames into account.
As a practical matter, this creates an important new degree of
freedom for developers: the freedom to "clean up" code by improving
its organization without having to pay a high cost in revision
control system maintenance.
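One common way to make merges rename-aware, assumed here because the message does not say how `arch' does it, is to identify each file by a persistent ID rather than by its path. In this hypothetical Python sketch a tree maps such an ID to a (path, contents) pair, and the path and the contents are each merged three-way, so a move on one side and an edit on the other combine cleanly. Whole-file strings stand in for real line-level merging, and added or deleted files are ignored.

```python
def pick(base, mine, theirs):
    """Three-way choice for one attribute (a path or a file's contents)."""
    if mine == theirs or theirs == base:
        return mine          # both sides agree, or only our side changed it
    if mine == base:
        return theirs        # only their side changed it
    raise ValueError(f"conflict: {mine!r} vs {theirs!r}")


def merge(ancestor, ours, theirs):
    """Merge trees that map a persistent file ID to (path, contents).

    Simplification: every ID exists in all three trees (no adds/deletes).
    """
    merged = {}
    for file_id, (base_path, base_text) in ancestor.items():
        our_path, our_text = ours[file_id]
        their_path, their_text = theirs[file_id]
        merged[file_id] = (pick(base_path, our_path, their_path),
                           pick(base_text, our_text, their_text))
    return merged


ancestor = {"id-1": ("src/util.c", "v1")}
ours = {"id-1": ("lib/util.c", "v1")}    # we moved the file
theirs = {"id-1": ("src/util.c", "v2")}  # they edited it in place
print(merge(ancestor, ours, theirs))     # {'id-1': ('lib/util.c', 'v2')}
```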
Key Features: Repositories
* Distributed Revision Databases
`arch' has a global (as in "world wide") name-space for revisions.
`arch' seamlessly integrates all accessible revision repositories,
both local and remote, into one large database. Branches can span
repository boundaries, etc. That has big implications for open
source processes, both intra-organizationally, and on a global
scale.
Each developer or organization can have a private database for
day-to-day work, or for organization- or feature-specific branches.
Loosely cooperating organizations can have separately administered
repositories that, nevertheless, mutually support branching and
merging.
An unwelcome source of de-facto authority (hosting a public
project's `CVS' repository) is undermined by `arch'. More
positively, `arch' lowers the barriers to coordinated
inter-organizational development: if your repository is publicly
readable, anybody can create branches -- there is no need to hand
out write access to everyone who wants to play.
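A hypothetical Python sketch of a global revision name-space: every revision is named by the repository ("archive") that holds it plus a branch and a sequence number, and a branch records only the fully qualified name of the revision it started from. Following those names lets a tool walk history across repository boundaries without copying the other repository. The name format, the archive names and the `ARCHIVES' table are all invented for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RevName:
    """A globally unique revision name (hypothetical format)."""
    archive: str   # which repository, e.g. one per hacker or group
    branch: str    # line of development inside that repository
    revision: int  # 0 is the branch's base revision

    def __str__(self):
        return f"{self.archive}/{self.branch}--{self.revision}"


# Hypothetical view of all accessible archives. A branch stores only its own
# revisions plus the fully qualified name of the revision it branched from,
# which may live in a different archive (None for a root branch).
ARCHIVES = {
    "gnu@example.org": {"main": None},
    "alice@example.org": {"hack": RevName("gnu@example.org", "main", 2)},
}


def ancestry(rev: Optional[RevName]):
    """List rev and all its ancestors, newest first, crossing archives."""
    chain = []
    while rev is not None:
        for n in range(rev.revision, -1, -1):
            chain.append(RevName(rev.archive, rev.branch, n))
        rev = ARCHIVES[rev.archive][rev.branch]  # follow the branch point
    return chain


print([str(r) for r in ancestry(RevName("alice@example.org", "hack", 2))])
```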
* Low Cost Server Administration
`arch' remote repository access is via the FTP protocol. An `arch'
server can be a generic (unix-based) FTP server.
Server administration requirements are minimal: databases can be
created trivially and (unlike `CVS') never become wedged (except as
a result of file system failures (or, sigh, bugs -- if there are
any)). Repositories can be easily migrated. Repositories can be
mirrored for read-only purposes.
* Atomic, Concurrent, Independent, and Durable Transactions
Commits are atomic. Concurrent commits to separate lines of
development are permitted. Commits are independent of "gets"
(check-outs). Commits are durable to the limits of the underlying
file system. If a commit hangs (say, a client dies) with locks
held -- those locks can be broken remotely.
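The message does not describe how commits achieve these properties. One conventional way to get an atomic commit and a breakable lock on a plain file system, assumed here, is to take a per-branch lock by creating a directory (which either succeeds or fails atomically), stage the new revision in a scratch directory, and publish it with a single rename; FTP offers directory-creation and rename commands as well. This Python sketch works on a local directory, and the `++lock' and `revision-N' names are hypothetical. It does not attempt durability (no syncing to disk).

```python
import os
import shutil
import tempfile


def _lock_path(archive_dir, branch):
    # Hypothetical per-branch lock: commits on other branches are unaffected.
    return os.path.join(archive_dir, branch, "++lock")


def commit(archive_dir, branch, revision, files):
    """Publish files (name -> text) as a new revision of branch, or fail."""
    branch_dir = os.path.join(archive_dir, branch)
    lock = _lock_path(archive_dir, branch)
    os.mkdir(lock)  # atomic test-and-set; FileExistsError if already held
    try:
        final = os.path.join(branch_dir, f"revision-{revision}")
        if os.path.exists(final):
            raise FileExistsError(final)
        staging = tempfile.mkdtemp(prefix="++staging-", dir=branch_dir)
        try:
            for name, text in files.items():
                with open(os.path.join(staging, name), "w") as f:
                    f.write(text)
            # One rename makes the whole revision visible at once, so readers
            # see either all of it or none of it.
            os.rename(staging, final)
        except BaseException:
            shutil.rmtree(staging, ignore_errors=True)  # discard partial work
            raise
    finally:
        os.rmdir(lock)  # release the lock


def break_lock(archive_dir, branch):
    """Remove a lock left behind by a client that died mid-commit."""
    os.rmdir(_lock_path(archive_dir, branch))
```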
Key Features: Logging
* Useful Semi-Automated Logging
`arch' log entries contain lots of automatically generated
information that is useful for browsing repository history and for
performing intelligent (history sensitive) merges.
* Automatic ChangeLog Maintenance
`arch' can automatically generate GNU-style ChangeLog files from
revision control log entries. If your tree contains automatically
generated log files, `arch' will update them during `commit', and
after every merge operation that changes a revision's patch history.
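A minimal Python sketch of generating a GNU-style ChangeLog, assuming each log entry is an RFC822-style message whose headers are named `Revision', `Creator', `Date' and `Summary' and whose body holds the detailed change notes. Those header names are assumptions for illustration; the standard library's `email' parser does the header parsing.

```python
from email.parser import Parser
from email.utils import parseaddr


def changelog_entry(log_text):
    """Turn one RFC822-style log entry into a GNU-style ChangeLog entry."""
    msg = Parser().parsestr(log_text)
    name, addr = parseaddr(msg["Creator"])
    # GNU style: date, two spaces, name, two spaces, <address>, blank line.
    lines = [f"{msg['Date']}  {name}  <{addr}>", ""]
    lines.append(f"\t{msg['Summary']} ({msg['Revision']})")
    for body_line in msg.get_payload().strip().splitlines():
        lines.append(f"\t{body_line}")
    return "\n".join(lines) + "\n"


def changelog(log_texts):
    """Newest entry first (ISO dates sort as text), separated by blank lines."""
    entries = sorted(log_texts, key=lambda t: Parser().parsestr(t)["Date"], reverse=True)
    return "\n".join(changelog_entry(t) for t in entries)


LOG = (
    "Revision: main--3\n"
    "Creator: Jane Hacker <jane@example.org>\n"
    "Date: 2001-12-28\n"
    "Summary: fix off-by-one in buffer growth\n"
    "\n"
    "* buffer.c (grow): Allocate n + 1 bytes.\n"
)
print(changelog_entry(LOG))
```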
Key Features: User Interface
* Patch Set Browsing
Any patch set, for a committed revision, between a working copy and
its ancestors, or between arbitrary trees, can be summarized in an
HTML-formatted report, with lists of renamed files and directories,
and hyper-links to individual file deltas, added files, and removed
files. This is a boon to developers writing log entries and to
patch reviewers. One of my favorite commands has become:
netscape --remote "openURL(`arch what-changed --url`)"
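The kind of report described above can be sketched in Python as a function that takes lists of renamed, added, removed and modified files and emits HTML, linking each path to its delta. The function name and the `patches/<path>.patch' link layout are hypothetical; they are not what `arch what-changed' actually produces.

```python
from html import escape
from urllib.parse import quote


def patch_report_html(renamed, added, removed, modified):
    """Summarize a patch set as an HTML page.

    renamed is a list of (old_path, new_path) pairs; the other arguments are
    lists of paths. Each linked path points at a hypothetical per-file delta
    stored at "patches/<path>.patch" next to the report.
    """
    def link(path):
        href = escape("patches/" + quote(path) + ".patch", quote=True)
        return f'<a href="{href}">{escape(path)}</a>'

    def section(title, items):
        if not items:
            return ""  # omit empty categories
        body = "".join(f"  <li>{item}</li>\n" for item in items)
        return f"<h2>{title}</h2>\n<ul>\n{body}</ul>\n"

    return (
        "<html><body>\n"
        + section("Renamed", [f"{escape(old)} &rarr; {escape(new)}" for old, new in renamed])
        + section("Added", [link(p) for p in added])
        + section("Removed", [link(p) for p in removed])
        + section("Modified", [link(p) for p in modified])
        + "</body></html>\n"
    )


print(patch_report_html([("old.c", "new.c")], ["added.c"], ["gone.c"], ["main.c"]))
```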
* Command-line Driven, Self-Documenting
`arch' is a collection of small and simple software tools. The
collection has very regular and thorough conventions for option
names and defaulting behavior. Every command has an extensive
`--help' message describing its options and functionality. The
command `arch --help-commands' gives an orderly summary of all the
commands available with brief descriptions of each.
* Far More GUI Work Possible
`arch' is designed from the ground up to be layered under separately
developed GUIs. For example, `arch''s log entries contain enough
information to drive a program that draws the branch-and-merge graph
of a project's revision history; that information is conveniently
represented as plain-text data in RFC822-style message headers.
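As a sketch of how such headers could feed a graph-drawing tool, this Python function reads hypothetical `Revision', `Parent' and `Merged' headers from log entries and emits Graphviz DOT text, drawing ancestry as solid edges and merges as dashed ones. The header names are assumptions, not `arch''s actual log format.

```python
from email.parser import Parser


def merge_graph_dot(log_texts):
    """Emit Graphviz DOT text for a branch/merge graph.

    Each log entry is RFC822-style text with a hypothetical Revision header,
    an optional Parent header (ancestry) and an optional Merged header
    listing the revisions merged in, separated by whitespace.
    """
    lines = ["digraph history {"]
    for text in log_texts:
        headers = Parser().parsestr(text, headersonly=True)
        rev = headers["Revision"]
        parent = headers["Parent"]
        if parent:
            lines.append(f'  "{parent}" -> "{rev}";')
        for merged in (headers["Merged"] or "").split():
            lines.append(f'  "{merged}" -> "{rev}" [style=dashed];')
    lines.append("}")
    return "\n".join(lines)


LOGS = [
    "Revision: main--0\n",
    "Revision: main--1\nParent: main--0\n",
    "Revision: hack--0\nParent: main--0\n",
    "Revision: hack--1\nParent: hack--0\nMerged: main--1\n",
]
print(merge_graph_dot(LOGS))
```

The printed text can be handed to Graphviz (for example `dot -Tpng`) to draw the picture.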
Key Features: Performance Metrics
* Pretty Fast, Efficient Use of Bandwidth, Effective Use of Disk Space
`arch' seems to be pretty fast, and for good reasons. Tree-deltas
(patches) are exchanged with servers as compressed tar files.
`arch' makes clever use of client-side caching. On my
(unremarkable) system, `commit' processes around 10 files per
second. (Rigorous comparative benchmarking and final tuning remain
to be done, however.)
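Packing a tree-delta as a compressed tar file needs nothing beyond a standard tar implementation. This Python sketch uses the standard `tarfile' module to bundle a changeset directory into gzip-compressed bytes and unpack them elsewhere; the changeset layout is hypothetical, and only trusted archives should be unpacked.

```python
import io
import os
import tarfile
import tempfile


def pack_changeset(changeset_dir):
    """Bundle a changeset directory as gzip-compressed tar bytes."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        tar.add(changeset_dir, arcname=os.path.basename(changeset_dir))
    return buf.getvalue()


def unpack_changeset(data, dest_dir):
    """Unpack bytes produced by pack_changeset into dest_dir."""
    with tarfile.open(fileobj=io.BytesIO(data), mode="r:gz") as tar:
        if hasattr(tarfile, "data_filter"):
            tar.extractall(dest_dir, filter="data")  # rejects unsafe paths
        else:
            tar.extractall(dest_dir)  # older Pythons without extraction filters


# Demo with a hypothetical changeset containing one patch file.
with tempfile.TemporaryDirectory() as src, tempfile.TemporaryDirectory() as dst:
    changeset = os.path.join(src, "changeset")
    os.makedirs(os.path.join(changeset, "patches"))
    with open(os.path.join(changeset, "patches", "hello.c.patch"), "w") as f:
        f.write("--- hello.c\n+++ hello.c\n")
    payload = pack_changeset(changeset)
    unpack_changeset(payload, dst)
    print(len(payload), "bytes;", os.listdir(os.path.join(dst, "changeset", "patches")))
```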
* Maintainable Size
The heart of the implementation (around 30K lines) is (ahem) almost
entirely shell scripts and awk code. (This is not a joke -- `arch'
is a serious system.) In spite of the size and implementation
languages, `arch' is more featureful than `CVS' and seems to be
faster at common-case operations.
* Useful Subsets Small Enough to Add to Other Source Packages
It is practical to distribute a tiny subset of `arch' with any of
your source packages. Contributors without repositories can use
that subset to prepare `arch'-compatible patches or to apply `arch'
patch sets.
regards,
-t
------- End of forwarded message -------
Date: Wed, 16 Jan 2002 01:34:23 -0800 (PST)
From: Tom Lord <address@hidden>
To: address@hidden
In-reply-to: <address@hidden> (message from Tom Lord
on Sat, 29 Dec 2001 23:59:34 -0800 (PST))
Subject: Re: revision control systems
Date: Sat, 29 Dec 2001 23:59:34 -0800 (PST)
From: Tom Lord <address@hidden>
Can you tell me where they can look at the source code?
Not at the moment, but quite soon I hope.
I was about to make a release after fixing some porting nits, then got
caught up adding some new features.
I'll re-ping you when it's ready.
-t
I've made the first public release of `arch', a new revision control
system. You can find it at:
http://www.regexps.com
The user's guide is on-line, as is a simple repository browser for the
change history. There's a read-only copy of arch's self-hosted
repository there too. (Please let me know if the web pages give you
troubles on your particular browser -- this is the first time I've
tried using tables so heavily.)
Some of the key advantages of arch compared to CVS are:
1. Atomic, whole-tree commits, reliable repository database.
2. File and directory renames handled cleanly.
3. Fancy features for branching and merging:
For example, arch has a high level merge operator that is
especially good for projects where multiple maintainers of
a project each work on separate branches, merging to and
from a shared "trunk" to stay in sync (the `star-merge'
command, so called because the graph of trunk and branches
has a star topology).
4. Distributed repositories
arch treats all accessible repositories as one big
repository, permitting branch and merge operations to span
repository boundaries. "World-Wide Revision Control" :-)
This eliminates the need for non-core contributors to
resort to diff/patch and simplifies the change-review
task for maintainers.
5. Automatic ChangeLog maintenance.
6. Configuration management for multi-package distributions.
7. Weighs in at about 30K lines of code.
(Some of the lines are rather wide, though :-)
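In a star topology every branch merges only with the trunk, so the common ancestor for the next merge can be read off the recorded history instead of being supplied by hand. This Python sketch uses a simplified model, assumed here rather than taken from `arch': each line of development is an ordered list of patch names, and the ancestor is the newest trunk revision whose whole history the branch already contains. The function name is hypothetical.

```python
def star_merge_ancestor(trunk, branch):
    """Index of the newest trunk revision whose entire history the branch
    already contains (its last sync point with the trunk), or -1 if none."""
    have = set(branch)
    ancestor = -1
    for i, patch in enumerate(trunk):
        if patch not in have:
            break
        ancestor = i
    return ancestor


trunk = ["t0", "t1", "t2", "t3"]
branch = ["t0", "t1", "b1"]                # branched at t1, one local patch
print(star_merge_ancestor(trunk, branch))  # 1

branch += ["t2", "t3"]                     # after syncing from the trunk
print(star_merge_ancestor(trunk, branch))  # 3
```

A merge would then diff the branch against that trunk revision and apply the result to the trunk's latest revision, in the manner of the `update' operation; in this model the sync point moves forward after each merge, so old changes are not merged again.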
arch is in pretty good shape in the sense that the core functionality
is done and I've been using it heavily, myself. The main weaknesses,
and, hence, opportunities to contribute are:
1. I use it only on a BSD-based system. Though porting
to other platforms should be easy, it won't be a no-op,
and it hasn't been done yet.
2. Since revision control ought to be rock-solid reliable,
a comprehensive test suite for arch is an important goal:
but it's a large job.
3. The web interface and facilities for browsing revision
history are a bit weaker than I'd like -- I'm working on
that, though.
4. No facility, yet, for automatically converting a CVS
repository into an arch repository.
5. No fancy GUI, yet, for drawing a graph that illustrates
the branching and merging history of a project.
6. No fancy GUI, yet, for running arch commands via a
control panel.
7. For very large and/or active projects, some performance
tuning is likely to be desirable. I've been using arch on
a tree with around 1500 files and find performance to be
acceptable. (By way of contrast, GCC has around 6500 files
(at least in the old distribution I have on hand)). I
perform a small handful of commits per day (whereas, I
presume, GCC gets at least dozens per day across all
branches). It is straightforward to speed up the arch
commands that might cause problems -- they were written for
simplicity and functionality first, omitting some obvious
speed-ups.
-t