Re: [Savannah-users] broken 'git clone ssh://git.sv.gnu.org/srv/git/PKG.git'
Wed, 24 Feb 2016 10:53:33 -0700
Bob Proulx wrote:
> The git-daemon is being invoked through a timeout wrapper script
> invoked through the xinetd. The xinetd is limited to 20 parallel
> daemon processes for exactly the reason to avoid too much load. It
> used to be 40 parallel git-daemon processes but when Emacs converted
> to git and all 40 slots were running it overloaded the system and it
> would melt down. Xinetd git-daemon maximum instances was reduced to
> 20 in order to reduce the load to something that the system could
> reasonably handle. It has been that way for the last year. But an
> immediate git clone failure does not match the behavior that should
> occur when max instances has been reached. In that case the
> connection should be held waiting for a free slot (it might
> eventually time out if the wait goes too long), not dropped
> immediately.
After more digging and debugging the problem seems to be the xinetd
behaving badly. It appears that when xinetd has reached max instances
that it immediately closes the incoming network connection! I haven't
looked in the xinetd source yet to confirm this but that appears to be
the behavior I am observing from it. Obviously that is bad. It
should queue the connection and wait for a free slot. If I am right
about this, it will affect all xinetd-managed services, including
git, svn, cvs, and bzr.
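For reference, the cap in question is xinetd's "instances" directive
in the per-service stanza. A minimal sketch of such a stanza follows;
the directive names are real xinetd configuration, but the paths,
user, and values here are illustrative, not Savannah's actual setup:

```
# /etc/xinetd.d/git -- illustrative sketch, not the real config
service git
{
        disable         = no
        socket_type     = stream
        wait            = no
        user            = nobody
        server          = /usr/lib/git-core/git-daemon
        server_args     = --inetd --export-all --base-path=/srv/git
        instances       = 20
        log_on_failure  += HOST
}
```

When "instances" daemons are already running, the handling of further
incoming connections is exactly the behavior under investigation above.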
Fixing this problem is complicated because, for a while now, the
system has been blocked from OS upgrades other than security
upgrades. That hard blocker was just cleared and a general upgrade
is now pending, but it won't be production ready for a bit yet.
Darn, because newer versions of git-daemon handle this trivially
without any need for xinetd. Of course that won't help the other
services such as cvs.
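A sketch of that standalone setup: modern git-daemon can listen on
the network itself and enforce its own concurrency cap via
--max-connections. The flags below are real git-daemon options, but
the paths and the limit of 20 are assumptions for illustration:

```shell
#!/bin/sh
# Run git-daemon standalone, without xinetd.
# --max-connections is git-daemon's own concurrency limit
# (0 would mean unlimited).  Paths and the value 20 are
# illustrative, not Savannah's actual configuration.
exec git daemon \
    --reuseaddr \
    --export-all \
    --base-path=/srv/git \
    --max-connections=20
```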
I am doing more analysis to verify this is the problem and working
to resolve it. Until then I can only suggest retrying; the clone
will eventually succeed once the client load on the server drops.
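If you want to automate the retrying, a small wrapper like the one
below works. The function is generic POSIX shell; the function name
and the 5-attempt/30-second values are my own illustration, not
anything Savannah recommends:

```shell
#!/bin/sh
# retry: run a command until it succeeds, sleeping DELAY seconds
# between attempts and giving up after MAX tries.
retry() {
    max=$1; shift
    delay=$1; shift
    i=1
    until "$@"; do
        if [ "$i" -ge "$max" ]; then
            echo "retry: giving up after $i attempts" >&2
            return 1
        fi
        i=$((i + 1))
        sleep "$delay"
    done
}

# Intended use:
#   retry 5 30 git clone ssh://git.sv.gnu.org/srv/git/PKG.git
```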