RE: Performance of CVS version SVN
Fri, 27 Oct 2006 08:13:33 +1000
>> However I personally think that benchmarks of SCM systems are
>> not very helpful.
> The reason for studying it is that I have a large CVS project,
> which requires SSH access over the net to a remote server
> where it takes 10 minutes to do all the "cvs update"
> operations. People are nagging about that, i.e. I needed to
> see if e.g. Subversion is faster or not.
This is an excellent example of what I think is wrong with benchmarks.
CVS has been around for 20 years, and was/is designed to work with the
network bandwidth that was state of the art 20 years ago - which is
considerably less than what is available now even on dialup.
So if you have performance problems like you are describing they are
most likely due to:
* poor CM process design
* poor implementation of the (CVS) software
I am lucky in that my full-time job is product manager for a
commercial CVS based version control system (CVS Suite, and CVS
Professional). In that role I get to talk regularly with the developers
of the CVS software and thousands of commercial users. I also get calls
every week from people wanting "CVS Repository Replication" to solve
some performance bottleneck. We have a solution for that included in
CVS Pro, but it is not often implemented because in most cases we have
found that the performance bottleneck was easily solved without it.
I suggest a good question for you to ask would be "my updates take 10
minutes, what can I do to improve that, and what benchmarks or traces
can I supply to assist in diagnosis?"
To benchmark this sort of "problem" it's best to start with a
checkout/update on the same machine as the server using :local:, then a
checkout/update on the same machine as the server using ssh, then a
checkout/update on a separate client on the same subnet, then a
checkout/update on the "real client". Include that information with the
information on the hardware (cpu and disks), the network interconnect,
the server/client CVS/CVSNT versions and operating systems as well as
info on the total repo size and total sandbox size, average file sizes,
maximum file size and mix of files (-kb, -kkv) and you'll probably get
quite a few suggestions as to what you can do to improve the speed.
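
As a rough sketch of how those numbers could be collected, the script below times "cvs -q update" in an existing sandbox and summarises the sandbox file sizes. It is only an illustration: the helper names (time_command, sandbox_stats, benchmark_updates) and the example paths are made up for this message, and it assumes each sandbox was already checked out with the access method you want to measure (:local:, ssh on the server, a client on the same subnet, the real client). Run it on each machine in turn and compare the results.

```python
import os
import subprocess
import time


def time_command(argv, cwd=None, env=None):
    """Run argv once and return (elapsed_seconds, exit_code)."""
    start = time.perf_counter()
    proc = subprocess.run(argv, cwd=cwd, env=env,
                          stdout=subprocess.DEVNULL,
                          stderr=subprocess.DEVNULL)
    return time.perf_counter() - start, proc.returncode


def sandbox_stats(root):
    """Return file count, total, average and maximum file size in bytes."""
    sizes = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Skip CVS administrative directories so only working files count.
        dirnames[:] = [d for d in dirnames if d != "CVS"]
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path):
                sizes.append(os.path.getsize(path))
    total = sum(sizes)
    return {
        "files": len(sizes),
        "total_bytes": total,
        "average_bytes": total / len(sizes) if sizes else 0.0,
        "max_bytes": max(sizes, default=0),
    }


def benchmark_updates(sandboxes, cvs="cvs"):
    """Time 'cvs -q update' per sandbox; sandboxes maps label -> directory."""
    results = {}
    for label, directory in sandboxes.items():
        results[label] = time_command([cvs, "-q", "update"], cwd=directory)
    return results


# Example use on the server (hypothetical paths, one sandbox per access method):
#   timings = benchmark_updates({"local": "/tmp/sb_local", "ssh": "/tmp/sb_ssh"})
#   for label, (seconds, code) in timings.items():
#       print(label, round(seconds, 1), "s, exit code", code)
#   print(sandbox_stats("/tmp/sb_local"))
```

A single run is noisy, so it is probably worth repeating each measurement a few times and reporting the spread along with the hardware and version details listed above.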
Some of the most common causes of overuse of network
bandwidth/performance problems are:
* time synchronisation between client and server
* anti virus software
* using network shares (NFS or Samba) for repository or sandbox
* disk drivers
* network drivers
* RAID configuration of repository disk or sandbox disk
* memory and pagefile settings
* other 3rd party software (eg: OpenSSH server)
* other processes (eg: poorly configured build software that is polling
the CVS Server)
Some of the most common procedural causes of performance problems are:
* overuse of update (eg: updating every 5 minutes)
* overuse of checkout (eg: can update be used instead)
* using update where notify scripts should be used
* using a reserved (locking) development model where an unreserved
(concurrent) model can meet the business requirements
* overuse of branches (eg: branch for every change then merge to trunk)
* underuse of branches (ie: not isolating changes that should be
isolated, causing too many changes to appear in each update)
* updating parts of the checked out sandbox that are irrelevant to the
* poor source code design / file hierarchy design
CVSNT 2.5.04 (Free/GPL, runs on Unix, Linux, Windows, Mac etc) has a few
specific improvements to network bandwidth use with binary files, and
the traces also include the timestamp on each "line" so that if there
are specific "delays" they can be easily spotted.
In some rare cases the provision of repository caching is warranted and
CVSNT 2.5.04 includes this functionality. Eg: I heard a case where a
repository is stored in USA and at 8am on Monday 500 developers in Japan
log on and do a fresh checkout of an 800MB module over what amounted to a
dial up connection. Very simple behaviour modification (not deleting
the sandboxes every Friday and using a concurrent development model)
will solve the "problem", but the company needed the assurance that
caching was available as an alternative if the developers' behaviour
could not be changed.