[Savannah-hackers-public] Re: monitoring
From: Sylvain Beucler
Subject: [Savannah-hackers-public] Re: monitoring
Date: Sun, 9 Apr 2006 19:11:17 +0200
User-agent: Mutt/1.5.11+cvs20060126
> Sorry I didn't respond to this while getting myself on the list.
>
> > Knowing about what is running or down is not a big issue - users will
> > probably notice it before the monitoring tool and tell us about
> > it.
>
> Yes, but at least in the past, things often seemed to be down for
> quite a long time with no indication of what was going on, and I
> suspect people typically didn't report problems correctly. My
> experience is that monitoring can help in preventing problems and
> often in indicating what's actually caused a problem, but I know your
> mileage probably varies.
>
> Sorry I'm talking without knowing how things actually run.
You're probably right. As I mentioned, nobody reported the recent
ViewCVS downtimes.
[Joshua (from the FSF admins team) recently mentioned they do have a
monitoring system, e.g. detecting that Apache was not replying yesterday
during a small DoS. I suppose it would be difficult for us to have
access there, so redundancy is not an issue :)]
> > It would be interesting, though, to set up some security checks, such
> > as: is the /home directory really read-only when accessed through the
> > arch sftp service?
>
> Cfengine can probably help with things like directory permissions and
> cleaning up lock files etc. I'm not sure about something like
> viewcvs, but cfengine can take action depending on running or
> non-running processes.
OK, can we work on this kind of monitoring? How do you see things?
> > Is it possible for a project member to commit to
> > CVSROOT/? etc.
>
> Yes, I had that sort of thing in mind, but I haven't thought how to do
> it sensibly.
I tried something hand-made. It cannot detect all failures without
human supervision because CVS sometimes doesn't return appropriate
exit codes:
http://arch.sv.gnu.org/head/administration/infra/main/0/cvs/cvs-test-suite.sh
Is there a cleaner way to do it?
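
One possible direction, as a rough sketch only: since the exit code alone
cannot be trusted, a check could also scan the command's output for failure
messages. This is not what cvs-test-suite.sh does; the function name
run_check and the marker strings are made up for illustration and would
have to be adapted to what CVS really prints.

```python
import subprocess
import sys

# Hypothetical failure markers (assumption): replace with the messages
# the real tool prints when it refuses or aborts an operation.
FAILURE_MARKERS = ("aborted", "permission denied")


def run_check(cmd, expect_failure=False, markers=FAILURE_MARKERS):
    """Run cmd and return True if the outcome matches the expectation.

    A command counts as failed if it exits non-zero OR if its output
    contains one of the failure markers, which covers tools that print
    an error but still exit with status 0.
    """
    proc = subprocess.run(cmd, capture_output=True, text=True)
    output = (proc.stdout + proc.stderr).lower()
    failed = proc.returncode != 0 or any(m in output for m in markers)
    return failed == expect_failure


if __name__ == "__main__":
    # Stand-in for a command that prints an error yet exits with 0.
    liar = [sys.executable, "-c",
            "import sys; print('Permission denied', file=sys.stderr)"]
    print("detected as failure:", run_check(liar, expect_failure=True))
```

A test that must be refused (say, a plain project member committing to
CVSROOT/) would be run with expect_failure=True, so a silent success shows
up as a failed check.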
> > I'm also concerned about usage statistics. For example, the other day
> > the load went to 20 and I have almost no clue what it was due
> > to.
>
> Cfengine has some support for that sort of thing -- alerts and
> monitoring based on process and resource statistics, but I'm not sure
> it's terribly useful in practice.
>
> > Mathieu Roy from Gna! told me about heavy SSH robot attacks that
> > could be more lightly rejected using dynamic IP-based restrictions and
> > inetd.
>
> Don't you do rate-limiting with iptables to combat that? I did try to
> look at the firewalling, but that needs more privilege.
No, we don't, though Steven said he would look next week for a script of
his that is supposed to do so :) I saw a couple of interesting options
in the iptables manpages (though Sarge doesn't have connlimit) - feel
free to share your knowledge.
Check /var/lib/iptables/active btw.
I think the most sensitive issue is that we don't actually know whether
we have such abuses.
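
To get a first idea of whether such abuse happens, one could count failed
SSH logins per source address from the sshd log. The following is a minimal
sketch under assumptions: the log lines follow the usual OpenSSH "Failed
password for ... from ADDRESS port N" wording (older versions say "illegal
user", newer ones "invalid user"), and the names count_failed_logins and
suspicious_sources as well as the threshold are invented for illustration.

```python
import re
from collections import Counter

# Matches the usual OpenSSH failure message; the exact wording is an
# assumption and may differ between sshd versions and configurations.
FAILED_LOGIN = re.compile(
    r"Failed password for (?:(?:invalid|illegal) user )?\S+ "
    r"from (\S+) port \d+"
)


def count_failed_logins(lines):
    """Return a Counter mapping source address -> failed login attempts."""
    counts = Counter()
    for line in lines:
        match = FAILED_LOGIN.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def suspicious_sources(lines, threshold=10):
    """List (address, attempts) pairs at or above threshold, busiest first."""
    return [(addr, n)
            for addr, n in count_failed_logins(lines).most_common()
            if n >= threshold]
```

It could be fed with the lines of the sshd log (on Debian probably
/var/log/auth.log, to be checked), e.g. suspicious_sources(open(path)),
and run from cron to mail us a summary.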
> > Users may also be interested in SCM-related stats.
>
> What does SCM stand for here and elsewhere? `Software configuration
> management' or something else?
"VCS" if you prefer the old acronym ;)
I suppose that
http://www.gnu.org/software/gnu-arch/tutorial/Introducing-arch.html
(section 1.4) gives a good definition.
--
Sylvain