Re: additional feature for monit-3.0 (for clusters)

From: Martin Pala
Subject: Re: additional feature for monit-3.0 (for clusters)
Date: Sun, 27 Oct 2002 10:06:36 +0100

Yeah, not a bad idea :)

There are two ways to achieve a similar feature:

1.) Check a process only when it was started under monit's control, as
described by Oliver - a very simple and effective method; every cluster
node needs only one 'local' monit instance.
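
For illustration, a monitrc entry as it might look with Oliver's patch
applied - a sketch only: the "started" autostart value comes from his
patch (it is not in stock monit), the service name and paths are made
up, and the exact statement syntax may differ between monit versions:

   # hypothetical entry, assuming Oliver's patch is applied:
   # with "autostart started", monit leaves the process alone
   # until it is explicitly started via "monit start myservice"
   check myservice with pidfile /var/run/myservice.pid
     start program "/etc/init.d/myservice start"
     stop program "/etc/init.d/myservice stop"
     group node1
     autostart started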

2.) Have the monit instance fail over with the service as part of the
resource group - in that case monit must be installed on the shared
disks, and when a cluster reconfiguration is initiated, the monit
process is started along with the resource => there would be one monit
instance per resource (or, more accurately, per shared disk group on
which the SCSI reservation is applied). This method doesn't require
big monit modifications - the only change needed is an option to
specify the location of monit's pid file somewhere in the filesystem
(it should be on the shared disk group). Resource failover is
transparent to monit - it needn't care about the shared environment;
it just starts/stops itself and monitors/starts services => cluster
health and the shared storage must be monitored/maintained by another
service (for example the mentioned heartbeat).
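
As a minimal sketch of variant 2., assuming the pid file patch exists
(shown here as a hypothetical -p flag), that monit's "stop all" and
"quit" commands are available in the version used, and that all paths
are made up, the resource group's start/stop hook could look like:

   #!/bin/sh
   # sketch of a resource-group start/stop hook (variant 2.)
   # assumes monit lives on the shared disk group mounted at
   # /shared/res1, and that the proposed patch adds a flag
   # (written here as -p) to relocate monit's pid file there
   MONITDIR=/shared/res1

   case "$1" in
     start)
       # the cluster framework has already reserved the disks
       # and mounted them; bring up the per-resource monit
       $MONITDIR/bin/monit -c $MONITDIR/etc/monitrc \
                           -p $MONITDIR/run/monit.pid
       ;;
     stop)
       # stop the resource's services, then the daemon itself
       $MONITDIR/bin/monit -c $MONITDIR/etc/monitrc \
                           -p $MONITDIR/run/monit.pid stop all
       $MONITDIR/bin/monit -c $MONITDIR/etc/monitrc \
                           -p $MONITDIR/run/monit.pid quit
       ;;
   esac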

The first (Oliver's) method is similar to object registration (as in
SUN cluster's pmfadm, for example) - with this extension it will allow
building simple clusters. There is yet another question - storage
maintenance. I can think of two ways to handle it:

a.) The described rc scripts (monit-node1, monit-node2, etc.) will be
responsible for storage maintenance (storage reserve/release and
optionally forcing it). They shouldn't run 'monit -g service start'
before the node masters the storage => otherwise it may lead to a hard
error before the monit subsystem is even touched (similar to variant
2. above).
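
A rough sketch of such an rc script for option a., with
"reserve_storage"/"release_storage" as placeholders for whatever tool
actually performs the SCSI reserve/release, and /dev/sdb as a made-up
shared disk:

   #!/bin/sh
   # sketch of /etc/init.d/monit-node1 (option a.)
   case "$1" in
     start)
       # refuse to start the service group unless this node
       # really masters the shared storage
       if reserve_storage /dev/sdb; then
         monit -g node1 start
       else
         echo "monit-node1: storage not mastered" >&2
         exit 1
       fi
       ;;
     stop)
       monit -g node1 stop
       release_storage /dev/sdb
       ;;
   esac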

b.) The start/stop scripts invoked by monit will be more sophisticated
and will check/maintain the storage status (possibly doing the SCSI
reservation in case the node doesn't master it) before trying to start
the service. Since monit currently doesn't watch the return value of
these scripts, a failure will show up as a service timeout on monit's
level.
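
For option b., the per-service start script does the check itself; a
sketch, again with placeholder storage commands and made-up device and
service names:

   #!/bin/sh
   # sketch of a start script invoked by monit (option b.)
   # note: monit discards our exit code, so a failure here
   # only shows up as a service timeout on monit's level
   if ! check_reservation /dev/sdb; then
     # this node doesn't master the disk yet - try to claim it
     reserve_storage /dev/sdb || exit 1
   fi
   mount /dev/sdb1 /shared/res1 2>/dev/null  # ok if already mounted
   exec /etc/init.d/myservice start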

It is possible to allow one or both methods (variant 1. needs Oliver's
patch, variant 2. needs the optional pid file location patch).

+1 for Oliver's way

Maybe it would be useful for others to have a 'howto' for building
simple clusters with monit :)


----- Original Message -----
From: "Jan-Henrik Haukeland" <address@hidden>
To: <address@hidden>
Sent: Wednesday, October 23, 2002 7:47 AM
Subject: Re: additional feature for monit-3.0 (for clusters)

I spoke with Oliver off list and asked him to send a mail to the list
for discussion, so does anyone have an opinion on this?

Oliver Jehle <address@hidden> writes:

> When using heartbeat and groups in monit, I've missed the following
> feature: monit should only monitor manually started resources, and
> after stopping one, monit should stop monitoring it.
> So I've implemented a third input value for autostart, "started". Now
> monit monitors a resource only if you start it with "monit start"...
> Why? See below... it's my config for heartbeat with monit.
> On every node:
> /etc/inittab starts monit
> /etc/rc3.d/ script executes "monit start heartbeat"
> /etc/init.d/monit-node1 runs "monit -g node1 start"
> /etc/init.d/monit-node2 runs "monit -g node2 start"
> So heartbeat can easily control the cluster state: if one node fails,
> heartbeat starts monit-xxxx of the failed node, and monit is
> instructed to start the failed node's services and monitor them...
> --
> Oliver Jehle
> Monex AG
> Föhrenweg 18
> FL-9496 Balzers
> Tel: +423 388 1988
> Fax: +423 388 1980
> ----
> I've not lost my mind. It's backed up on tape somewhere.
> ----

Jan-Henrik Haukeland
