Re: additional feature for monit-3.0 (for clusters)


From: Oliver Jehle
Subject: Re: additional feature for monit-3.0 (for clusters)
Date: Tue, 29 Oct 2002 09:45:37 +0100

No problem .. i can write a HOWTO... but after my holidays :-))) i'm off for 2
weeks and not very responsive in answering mails :-))

but i will write a short howto... it's very easy...





Jan-Henrik Haukeland wrote:

> > Agree - some suggestions?
>
> A tough one, a new name :-) If we do not find a good name (I'm blank), we
> could combine, that is, use automonitor and autostart and set the value
> properly in the parser, i.e. override the value of autostart if it's true
> (and document this behavior).
>
> BTW, using monit together with heartbeat is interesting, do you think you
> could write a FAQ or man file entry for this, Oliver? (When we figure out
> what the statement should be)
>
> Jan-Henrik
>
> > Oliver Jehle wrote:
> >
> > >No problem with inserting a new config statement...
> > >
> > >but it should be an xor with autostart... you cannot have autostart=true
> > >and monitor only manually started services.
> > >
> > >so i think giving autostart a better name would be easier !!!
> > >
> > >
> > >
> > >
> > >Martin Pala wrote:
> > >
> > >
> > >
> > >>Yet one thing - I think that it shouldn't be part of the 'autostart'
> > >>statement (it is intended for another task). Maybe a new statement such as
> > >>'automonitor [yes|no]' (it sounds strange - maybe someone will have a
> > >>better hint :) would be clearer, if such functionality is added
> > >>to monit.
> > >>
> > >>Martin
> > >>
> > >>Martin Pala wrote:
> > >>
> > >>
> > >>
> > >>>Yeah, not bad idea :)
> > >>>
> > >>>there are two ways to reach similar feature:
> > >>>
> > >>>1.) check a process only when it was started under monit's control, as
> > >>>Oliver described - a very simple and effective method; every cluster node
> > >>>needs only one 'local' monit instance.
> > >>>
> > >>>2.) have the monit instance fail over with the service as part of the
> > >>>resource group - in such a case it must be installed on shared disks, and
> > >>>when cluster reconfiguration is initiated it will start the monit process
> > >>>along with the resource => there should be one monit instance per resource
> > >>>(or more accurately per shared disk group on which a SCSI reservation is
> > >>>applied). This method doesn't require a big monit modification - the only
> > >>>one needed is an option for specifying the location of monit's pid file
> > >>>somewhere in the filesystem (it should be on the shared disk group).
> > >>>Resource failover is transparent for monit - it needn't care about the
> > >>>shared environment, it will just start/stop itself and monitor/start
> > >>>services => cluster health and shared storage must be
> > >>>monitored/maintained by another service (for example by the mentioned
> > >>>heartbeat).
> > >>>
> > >>>The first (Oliver's) method is similar to object registration (as in Sun
> > >>>Cluster's pmfadm, for example) - with this extension it will allow building
> > >>>simple clusters. There's yet another question - storage maintenance; two
> > >>>ways I can think of:
> > >>>
> > >>>a.) the described rc scripts (monit-node1 and monit-node2, etc.) will be
> > >>>responsible for storage maintenance (storage reserve/release and
> > >>>optionally forcing it). They shouldn't run 'monit -g service start' before
> > >>>the node masters the storage => this may lead to a hard error before the
> > >>>monit subsystem is touched (similar to variant 2. above)
> > >>>
> > >>>b.) start/stop scripts invoked by monit will be more sophisticated and
> > >>>will check/maintain storage status (possibly doing a SCSI reservation in
> > >>>case the node doesn't master it) before trying to start the service. Since
> > >>>monit currently doesn't watch the return value of these scripts, a failure
> > >>>will lead to a service timeout at monit's level.
> > >>>
> > >>>
> > >>>It is possible to allow one or both methods (variant 1. needs Oliver's
> > >>>patch, variant 2. needs optional pid file location patch).
> > >>>
> > >>>+1 for Oliver's way
> > >>>
> > >>>Maybe it will be useful for others to have a 'howto' for building simple
> > >>>clusters with monit :)
> > >>>
> > >>>Greetings,
> > >>>Martin
> > >>>
> > >>>
> > >>>----- Original Message -----
> > >>>From: "Jan-Henrik Haukeland" <address@hidden>
> > >>>To: <address@hidden>
> > >>>Sent: Wednesday, October 23, 2002 7:47 AM
> > >>>Subject: Re: additional feature for monit-3.0 (for clusters)
> > >>>
> > >>>
> > >>>
> > >>>I spoke with Oliver off list and asked him to send a mail to the list
> > >>>for discussion, so does anyone have an opinion on this?
> > >>>
> > >>>Oliver Jehle <address@hidden> writes:
> > >>>
> > >>>
> > >>>
> > >>>
> > >>>
> > >>>>when using heartbeat and groups in monit, i've missed the following
> > >>>>feature.
> > >>>>
> > >>>>monit should only monitor manually started resources, and after stopping
> > >>>>one, monit should stop monitoring it.
> > >>>>
> > >>>>so i've implemented a third input value for autostart: "started". now
> > >>>>monit monitors a resource only if you start it with "monit start"...
> > >>>>
> > >>>>why that... see below.... it's my config for heartbeat with monit
> > >>>>
> > >>>>on every node
> > >>>>
> > >>>>/etc/inittab starts monit
> > >>>>/etc/rc3.d/ script execute "monit start heartbeat"
> > >>>>/etc/init.d/ monit-node1 "monit -g node1 start"
> > >>>>/etc/init.d/ monit-node2 "monit -g node2 start"
> > >>>>
> > >>>>so heartbeat can easily control the cluster state, and if one node fails,
> > >>>>heartbeat starts monit-xxxx of the failing node and monit is instructed
> > >>>>to start the services of the failing node and monitor them...
> > >>>>
> > >>>>
> > >>>>--
> > >>>>Oliver Jehle
> > >>>>Monex AG
> > >>>>Föhrenweg 18
> > >>>>FL-9496 Balzers
> > >>>>
> > >>>>Tel: +423 388 1988
> > >>>>Fax: +423 388 1980
> > >>>>
> > >>>>----
> > >>>>I've not lost my mind. It's backed up on tape somewhere.
> > >>>>----
> > >>>>
> > >>>>
> > >>>>
> > >>>>
> > >>>>
> > >>>--
> > >>>Jan-Henrik Haukeland
> > >>>
> > >>>
> > >>>_______________________________________________
> > >>>monit-dev mailing list
> > >>>address@hidden
> > >>>http://mail.nongnu.org/mailman/listinfo/monit-dev
> > >>>
> > >>>
> > >>>

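The behavior discussed in this thread can be sketched in a few lines of Python. This is only an illustrative model, not monit code: `Service`, `resolve_autostart` and the `automonitor` flag are hypothetical names, and the sketch assumes that automonitor=no means "monitor only services started by hand" and that the parser resolves the autostart/automonitor conflict by overriding autostart, which is one reading of the suggestion quoted above.

```python
from dataclasses import dataclass


def resolve_autostart(autostart: bool, automonitor: bool) -> bool:
    """Return the effective autostart value once both statements are parsed.

    Assumption: automonitor=False ("monitor only manually started services")
    cannot be combined with autostart=True, so autostart is overridden.
    """
    if not automonitor:
        return False
    return autostart


@dataclass
class Service:
    name: str
    automonitor: bool = True  # hypothetical statement proposed in the thread
    monitored: bool = False

    def boot(self) -> None:
        # When monit itself starts, only automonitored services are watched.
        self.monitored = self.automonitor

    def start(self) -> None:
        # Models "monit start <name>": the service is monitored from now on.
        self.monitored = True

    def stop(self) -> None:
        # Models "monit stop <name>": a manually monitored service is dropped.
        # Automonitored services are left unchanged; the thread does not
        # specify their behavior on stop.
        if not self.automonitor:
            self.monitored = False


if __name__ == "__main__":
    svc = Service("service-of-node2", automonitor=False)
    svc.boot()
    print(svc.monitored)  # False: not watched until started by hand
    svc.start()
    print(svc.monitored)  # True: watched after the explicit start
    svc.stop()
    print(svc.monitored)  # False: dropped again after the stop
```

In the heartbeat setup described above, the surviving node would run the equivalent of `start()` for every service in the failed node's group, and `stop()` when the resources are handed back.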



