info-cvs

Re: more cvs performance questions (I think they are at least


From: Larry Jones
Subject: Re: more cvs performance questions (I think they are at least
Date: Wed, 29 Oct 2003 00:02:08 -0500 (EST)

Richard Pfeiffer writes:
>
> MIME-Version: 1.0
> Content-Type: text/html; charset=us-ascii

Please do not send MIME and/or HTML encoded messages to the list.
Plain text only, PLEASE!

> We are running cvs-1.11.  I did migrate us to 1.11.9, but it turned out
> it does not mesh with Eclipse, which is what our developers use.  The
> latest upgrade Eclipse can use is 1.11.6.  From what I read, that has
> its own problems, so 1.11.5 would be the latest we could use.

What was the problem with 1.11.9?  I can't think of any incompatibilities
and you're missing a lot of bug fixes using 1.11.

> We now can have as many as 77 concurrent cvs processes going.

Wow.  That is one busy repository.  Are they all running, or are some of
them sleeping?

> Should cvs even be able to handle this kind of load?  To some of us,
> it's amazing and a credit to cvs that this thing hasn't crashed already.

There isn't any inherent reason that CVS can't handle the load.

> a)     should we be splitting up our repository and giving each project
> their own?

That wouldn't help unless you gave each repository its own server
machine.

> b)     is there a way to limit the number of pserver calls made at any
> one time?

Since CVS is invoked by inetd, that depends on your particular inetd
implementation.  I'm pretty sure that xinetd does allow you to limit the
number of concurrent servers for a particular service, so if your
implementation doesn't, you may want to consider switching (see
www.xinetd.org).
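
As a rough illustration of the xinetd approach, a service entry along
these lines caps the number of simultaneous servers with the
`instances` attribute. This is a sketch, not a tested configuration:
the file name, the cvs binary path, the repository root /cvsroot, and
the limit of 20 are all assumptions, and the cvspserver service name
must be defined in /etc/services on your system.

```
# /etc/xinetd.d/cvspserver  (hypothetical file name and paths)
service cvspserver
{
    socket_type = stream
    protocol    = tcp
    wait        = no
    user        = root
    server      = /usr/bin/cvs
    server_args = -f --allow-root=/cvsroot pserver
    # Assumed limit: at most 20 concurrent pserver processes.
    instances   = 20
}
```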
 
> c) Should we be going to a 4x4 machine rather than our current 2x2?

It sounds like you should consider it, but you should probably ask
someone familiar with Solaris system performance tuning.  It's possible
that your problem is more with memory or I/O than it is with CPU.  Also,
the system may not be tuned appropriately for your work load.

> Context switching seems to be excessive, especially when we have more
> than 2 or 3 cvs ops running together. In the mornings, it's hitting as
> much as 12K per second, which is definitely a killer on a 2-processor
> system.
> 
> a)     Is this normal?

Probably.  CVS is typically I/O intensive, which generally means lots of
context switches.

> b)     Is cvs set up with a ping parameter or some kind of "am I alive"
> setting that hits every 1, 2 or 5 seconds?  If so, can it be reset?

No, CVS doesn't do any kind of pinging.

> Is there any kind of performance bug where just a few processes take up
> a lot of CPU - especially branch commands?  We were getting CPU time
> readings of 41 on one sub-branch process.

> In the doc, I read about setting the LockDir=directory in CVSROOT, where
> I assume I create my own dir in the repository (LockDir=TempLockFiles).

No, you create your own dir somewhere other than in the repository (and
you need to give LockDir an absolute path, not a relative path).  At the
very least, that allows you to offload the lock I/O to a different disk
than the regular I/O.
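
To make the two constraints concrete (absolute path, outside the
repository), here is a small Python sketch that checks a proposed
LockDir value before you put a line such as LockDir=/tmp/CVSLockDir
into CVSROOT/config. The function name and the repository root
/cvsroot are hypothetical, and this is only an illustration of the
rule stated above, not a check that CVS itself is known to perform.

```python
import posixpath


def check_lockdir(lockdir, repository):
    """Return a list of problems with a proposed LockDir (empty if none)."""
    if not posixpath.isabs(lockdir):
        # A relative value such as "TempLockFiles" is rejected outright.
        return ["LockDir must be an absolute path"]
    lock = posixpath.normpath(lockdir)
    repo = posixpath.normpath(repository)
    # Inside the repository means equal to it or below it.
    if lock == repo or lock.startswith(repo.rstrip("/") + "/"):
        return ["LockDir should be outside the repository"]
    return []


# /cvsroot is an assumed repository root, used only for illustration.
for candidate in ("TempLockFiles", "/cvsroot/TempLockFiles", "/tmp/CVSLockDir"):
    print(candidate, check_lockdir(candidate, "/cvsroot"))
```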

> a)     Just what is an in-memory file system?

Just what it says -- a filesystem where the data only exists in memory
(rather than being written to a disk); they are commonly used for /tmp.
If you're already using such a filesystem for /tmp, you can just put
LockDir on /tmp (e.g., /tmp/CVSLockDir).  I believe the Solaris variety
is called tmpfs.

> b)     Is speed garnered because all the lock files are in one directory
> and cvs does not need to traverse the project repository?

No, the speed is gained by not waiting for physical I/O to a disk drive.
Because the data doesn't survive a reboot, the system may be able to
take other shortcuts, too.

> c)     Is the speed increase significant?

It can be.

> d)     Will there be any problems with having lock files from multiple
> different projects  in the repository flooding this same directory?

No.

> In this LockDir case, we are going to have lock files from multiple
> different projects all in one dir. It appears by the statement:  "You
> need to create directory, but CVS will create subdirectories of
> directory as it needs them" that the full path is still used, correct?

Correct.  CVS will mirror the repository directory structure under the
LockDir directory.
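
A minimal Python sketch of that mirroring, assuming a repository
rooted at /cvsroot and LockDir set to /tmp/CVSLockDir (both paths and
the function name are hypothetical). It only shows why lock files from
different projects do not collide; the exact layout CVS produces may
differ in detail.

```python
import posixpath


def lock_directory(repository, lockdir, directory):
    """Map a repository directory to the directory that would hold its locks."""
    rel = posixpath.relpath(directory, repository)
    if rel == ".":
        return posixpath.normpath(lockdir)
    if rel == ".." or rel.startswith("../"):
        raise ValueError("directory is not inside the repository")
    # The path below the repository root is reproduced below LockDir.
    return posixpath.join(lockdir, rel)


# Two projects get two distinct lock directories under the same LockDir.
print(lock_directory("/cvsroot", "/tmp/CVSLockDir", "/cvsroot/projA/src"))
print(lock_directory("/cvsroot", "/tmp/CVSLockDir", "/cvsroot/projB/src"))
```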

> The beginning states that cvs will try every 30 seconds  to see if it
> still needs to wait for lock.
> 
> e)     Any chance this is a parameter that can be decreased - or would
> its checking more often just create more overhead and slow things down?

As of CVS 1.11.6, it's actually a bit more sophisticated -- if your
system allows sub-second sleeping, CVS will first try sleeping for 2, 4,
8, ... , 512 microseconds before giving up and sleeping for 30 seconds. 
In a busy repository like yours, there can be a lot of contention for
the master locks, but they're only held for a very short time, so the
short sleep avoids a long wait in that case.  You get the "waiting for
x's lock" message with every 30 second sleep, so you can get a feel for
how often you're running into lock contention problems.  Of course,
reducing lock contention means that you have more processes trying to
run at the same time rather than some of them sleeping.  Whether that
makes the problem better (because they get done sooner and thus reduce
memory contention) or worse (because the CPU is overcommitted) is hard to
say.  It would be better to have a random delay with exponential backoff
rather than the fixed 30 second delay, but no one has gotten around to
implementing it.  (With the fixed delay, processes have an annoying
tendency to get into sync and constantly run into each other.)
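
To illustrate the difference, here is a Python sketch of the two
schedules: the one described above (short doubling sleeps, then a
fixed 30 seconds) and a randomized exponential backoff of the kind
suggested. This is not CVS code. The function names, the 1 ms base,
and the 30 second cap in the second generator are assumptions chosen
for illustration.

```python
import random
from itertools import islice


def described_delays(max_short_us=512, long_delay_s=30.0):
    """Yield delays in seconds: 2, 4, ..., 512 microseconds, then 30 s forever."""
    us = 2
    while us <= max_short_us:
        yield us / 1_000_000
        us *= 2
    while True:
        yield long_delay_s


def randomized_backoff(base_s=0.001, cap_s=30.0, rng=random):
    """Yield random delays whose upper bound doubles each try, up to cap_s."""
    attempt = 0
    while True:
        ceiling = min(cap_s, base_s * (2 ** attempt))
        # A random point below the ceiling keeps waiters from staying in sync.
        yield rng.uniform(0, ceiling)
        if ceiling < cap_s:
            attempt += 1


print(list(islice(described_delays(), 11)))
print(list(islice(randomized_backoff(), 5)))
```

With the fixed schedule every waiter that reaches the long sleep wakes
on the same 30 second rhythm; with the randomized one, two processes
that collide once are unlikely to pick the same delay again.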

[In re. commit happening in the middle of a multi-directory update:]
> f)     I assume this does not relate only to when LockDir is set.  This
> is the case period, correct?

Correct.

> Is it possible/feasible to have multiple pserver sessions, each then
> having its own port and each going to the same repository, but going
> one level past that and each going to its own project?  (It wouldn't be
> two repositories, though it might look like it, because only one init
> was ever done.)  Would having each project on its own port help in the
> interest of performance?

No, but it wouldn't make any difference anyway if you're accessing the
same repository.

> Or, switching that around, would there be any benefit to having two
> repositories and connecting both of them to one pserver?

Multiple repositories lets you spread the I/O load across multiple
filesystems, but your problem seems to be more with CPU than I/O.  Of
course, having your repository on a RAID also spreads the I/O load
across multiple disks.

> If a cvs command is killed uncleanly by a crash or by a kill -9, this
> could leave errant locks. I know how to search and remove errant locks
> to get going again.
> 
> a)     But, does this also corrupt the project repository you were
> working on?

That depends on your definition of "corrupt".  CVS is very careful to
never give you a partial RCS file, so the individual files are fine.  On
the other hand, if you were in the middle of a commit, it's quite likely
that some of the files have been committed and some have not.  That's a
situation that CVS normally doesn't allow, but it isn't really corrupt,
either.  All you need to do to fix it is to redo the commit. 

-Larry Jones

I hate being good. -- Calvin



