monit-general

Re: [monit] Postgresql weirdness: ESTrootFATAL


From: Jan-Henrik Haukeland
Subject: Re: [monit] Postgresql weirdness: ESTrootFATAL
Date: Thu, 13 Nov 2008 23:21:31 +0100

If you have time, would you mind writing a wiki entry for this? It could be useful for others who are not on the mailing list. Create a new page and put it here http://mmonit.com/wiki/Monit/HowTo


On Nov 13, 2008, at 11:11 PM, David Paper wrote:

Ah, I wish it had been that simple.

In _ALL_ instances, postgres would start up completely, running as UID postgres, as it should be. The 4 examples (included below) of manually starting it all ultimately called pg_ctl as UID postgres.

The tipoff ended up being the errors showing up in groups of 4 exactly every 22 seconds, and NOT showing up during postgres startup messages. Only after the DB was completely online did the error messages show up.

In the monit job, there are 4 tests, 2 against the socket in /tmp and 2 against localhost port 5432.

These 4 tests are what generate the errors. It doesn't matter which UID monit starts the process as: monit itself always runs as root (at least in our installations it does), so monit does all of its testing of processes as UID 0. That is where the errors come from, since our postgres installs have no root user and no root database.
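The "groups of 4 every 22 seconds" tipoff can be checked mechanically against the log excerpt from the original report (quoted further down). The Python sketch below groups those FATAL lines by timestamp. It assumes that the run-together "ESTroot" is the time zone immediately followed by the connecting user name; that is how the prefix reads, but the thread does not confirm the server's log prefix setting. Four lines per timestamp, bursts 22 seconds apart, all from user root, is the pattern four root-owned tests per monit cycle would produce.

```python
import re
from collections import Counter
from datetime import datetime

# Excerpt copied from the postgres startup log in the original report.
LOG = """\
2008-11-13 16:07:04 ESTrootFATAL:  database "root" does not exist
2008-11-13 16:07:04 ESTrootFATAL:  database "root" does not exist
2008-11-13 16:07:04 ESTrootFATAL:  database "root" does not exist
2008-11-13 16:07:04 ESTrootFATAL:  database "root" does not exist
2008-11-13 16:07:26 ESTrootFATAL:  database "root" does not exist
2008-11-13 16:07:26 ESTrootFATAL:  database "root" does not exist
2008-11-13 16:07:26 ESTrootFATAL:  database "root" does not exist
2008-11-13 16:07:26 ESTrootFATAL:  database "root" does not exist
"""

# Assumption: each line starts "<date> <time> <zone><user>FATAL:", so
# "ESTroot" is the zone EST followed directly by the user name.
LINE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) EST(\w*)FATAL:")


def bursts(text):
    """Count FATAL lines per timestamp; also return users and gaps (seconds)."""
    counts = Counter()
    users = set()
    for line in text.splitlines():
        match = LINE.match(line)
        if match:
            counts[match.group(1)] += 1
            users.add(match.group(2))
    stamps = sorted(counts)
    times = [datetime.strptime(s, "%Y-%m-%d %H:%M:%S") for s in stamps]
    gaps = [int((b - a).total_seconds()) for a, b in zip(times, times[1:])]
    return [(s, counts[s]) for s in stamps], users, gaps


groups, users, gaps = bursts(LOG)
print(groups)  # [('2008-11-13 16:07:04', 4), ('2008-11-13 16:07:26', 4)]
print(users)   # {'root'}
print(gaps)    # [22]
```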

In doing some poking around the innerwebs, one of my coworkers found this:

http://www.asahi-net.or.jp/~aa4t-nngk/monit2_en.html#pgsqltest  [0]

which basically says, "create a root user in postgres" and "create a database root, owned by root" if you want the pgsql tests to work correctly, and not spit out errors in the logs.

Hope this problem & explanation helps others out as they spend time w/ Monit and Postgres.

-dave

[0] The text of the webpage is below:

PostgreSQL connection test

I should probably explain this one, since I wrote this test. If you use MONIT 4.7 or earlier, you need to apply the pgsql- patch to use this protocol test.

A connection test in MONIT opens a connection to the socket on which the service is listening and sends some packets; MONIT then decides whether the service is alive based on the response the service returns (or the lack of one). The `socket' can be a TCP/UDP port or a UNIX socket. Before dealing with the PGSQL test, let us take a look at MONIT's connection test in general.

DNS service connection test example:

if failed host localhost port 53 type udp protocol dns with timeout 10
 then restart

The host argument can be omitted, in which case the host is assumed to be localhost. type defaults to TCP, so you can omit it for TCP services. protocol can be one of (as of MONIT 4.8) APACHE-STATUS, DNS, DWP, FTP, HTTP, IMAP, LDAP2, LDAP3, MYSQL, PGSQL, NNTP, NTP3, POP, POSTFIX-POLICY, RDATE, RSYNC, SMTP, SSH, TNS; if you omit it, a generic connection test is used. timeout is how long MONIT will wait before it gives up; its default value is 5 (seconds).
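As a rough illustration of the connection step alone (no protocol exchange, roughly what a generic test does), here is a Python sketch that opens either a TCP connection or a UNIX-socket connection using the defaults described above: host localhost and a 5-second timeout. The function name connection_test is hypothetical and this is not MONIT's implementation; UDP is left out, and AF_UNIX requires a Unix-like system.

```python
import socket


def _open(port, host, unixsocket, timeout):
    """Open a TCP or UNIX stream socket, raising OSError on failure."""
    if unixsocket is not None:
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        try:
            sock.settimeout(timeout)
            sock.connect(unixsocket)
        except OSError:
            sock.close()
            raise
        return sock
    # create_connection resolves the host name and applies the timeout.
    return socket.create_connection((host, port), timeout=timeout)


def connection_test(port=None, host="localhost", unixsocket=None, timeout=5.0):
    """Return True if a connection opens within `timeout` seconds.

    Defaults follow the text: host is localhost, timeout is 5 seconds.
    socket.timeout is a subclass of OSError, so a timeout counts as a failure.
    """
    try:
        with _open(port, host, unixsocket, timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Hypothetical usage mirroring the two forms of test shown in this thread.
    print(connection_test(5432))
    print(connection_test(unixsocket="/tmp/.s.PGSQL.5432", timeout=15))
```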

Syntactically, `pgsql' is nothing special. Now let us examine the case where we want to test PostgreSQL's activity through its UNIX socket.

PostgreSQL connection test example (via UNIX socket):

if failed unixsocket /tmp/.s.PGSQL.5432 proto pgsql
 with timeout 15
 then restart

Prerequisites for PGSQL test

As PostgreSQL requires authentication even just to connect, certain preparations are needed before this test is of practical use. The procedure is not mandatory, because the PGSQL test counts it as a success when PostgreSQL demands authentication or reports that there is no such user: both mean that postmaster is working. However, you had better follow the procedure below to keep Postgres' log as clean as possible; that was the original aim for which I wrote this code. We create the DB user `root' for convenience, because MONIT is usually run by root. The example below assumes PostgreSQL 8.x; with older versions some syntax, such as the subnet format, may differ:

  1. Create DB user `root'.
  2. Create a database `root' owned by root. It doesn't need to contain any data.
  3. Add these lines to pg_hba.conf:

     host   root  root  127.0.0.1/32  trust           <= for test via TCP port
     local  root  root                ident sameuser  <= for test via UNIX socket
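The leniency described above (an authentication request and a "no such user" error both count as a live postmaster) can be sketched against the PostgreSQL v3 frontend/backend protocol. The Python below is an illustration under assumptions, not MONIT's source: it sends a StartupMessage naming user root and database root, which is what the ESTrootFATAL log lines in this thread suggest the probe requests, and it inspects only the first byte of the reply, where 'R' is an authentication request and 'E' is an ErrorResponse. The function names are hypothetical; the sketch closes the socket right after that byte, so it does not reproduce everything MONIT does, and pgsql_test needs a Unix-like system for AF_UNIX.

```python
import socket
import struct

PROTOCOL_V3 = 3 << 16  # protocol version 3.0 as sent in a StartupMessage


def startup_message(user="root", database="root"):
    """Build a v3 StartupMessage: Int32 length, Int32 version, then
    null-terminated name/value pairs and a final null byte."""
    body = struct.pack("!I", PROTOCOL_V3)
    for name, value in (("user", user), ("database", database)):
        body += name.encode() + b"\0" + value.encode() + b"\0"
    body += b"\0"
    # The length field counts itself (4 bytes) plus the body.
    return struct.pack("!I", len(body) + 4) + body


def postmaster_alive(first_byte):
    """'R' (authentication request) and 'E' (ErrorResponse, e.g. database
    "root" does not exist) both show that postmaster answered."""
    return first_byte in (b"R", b"E")


def pgsql_test(socket_path="/tmp/.s.PGSQL.5432", timeout=5.0,
               user="root", database="root"):
    """Hypothetical probe over a UNIX socket; False on any socket error."""
    try:
        with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
            sock.settimeout(timeout)
            sock.connect(socket_path)
            sock.sendall(startup_message(user, database))
            return postmaster_alive(sock.recv(1))
    except OSError:
        return False
```

Creating the root role and database (steps 1 and 2) does not change whether this kind of probe passes; it only changes which reply the server sends, which is why the procedure is about log hygiene rather than correctness.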



On Nov 13, 2008, at 4:41 PM, Dan Colish wrote:

The funky error output is a concatenation of error messages and nothing else. The real error is your start command. It looks like you're actually trying to start a database named root. Check the start scripts for pgsql. Does postgres start outside of monit?

--dan

On Thu, Nov 13, 2008 at 4:18 PM, David Paper <address@hidden> wrote:

Hi monit gurus,

While I anxiously await 5.0 getting out of beta, I have run into the following problem w/ postgres: when started via monit, postgres spits out the following errors every 22 seconds in the postgres startup log:

2008-11-13 16:07:04 ESTrootFATAL:  database "root" does not exist
2008-11-13 16:07:04 ESTrootFATAL:  database "root" does not exist
2008-11-13 16:07:04 ESTrootFATAL:  database "root" does not exist
2008-11-13 16:07:04 ESTrootFATAL:  database "root" does not exist
2008-11-13 16:07:26 ESTrootFATAL:  database "root" does not exist
2008-11-13 16:07:26 ESTrootFATAL:  database "root" does not exist
2008-11-13 16:07:26 ESTrootFATAL:  database "root" does not exist
2008-11-13 16:07:26 ESTrootFATAL:  database "root" does not exist

4 entries in each group, forever.
Environment: Monit v4.10.1 (started out of inittab), Postgres 8.3.4, SuSE Linux 11.0 x86-64.

This is what the monit job looks like:

check process postgresql with pidfile /opt/postgres/data/postmaster.pid
group database
start program = "/opt/postgres/bin/pg_ctl start"
     as uid postgres and gid postgres
stop  program = "/opt/postgres/bin/pg_ctl stop"
     as uid postgres and gid postgres
if failed unixsocket /tmp/.s.PGSQL.5432 protocol pgsql then restart
if failed unixsocket /tmp/.s.PGSQL.5432 protocol pgsql then alert
if failed host localhost port 5432 protocol pgsql then restart
if failed host localhost port 5432 protocol pgsql then alert
if 5 restarts within 5 cycles then timeout

I've also tried doing it like this:

check process postgresql with pidfile /opt/postgres/data/postmaster.pid
group database
start program = "/etc/init.d/postgresql start"
stop  program = "/etc/init.d/postgresql stop"
if failed unixsocket /tmp/.s.PGSQL.5432 protocol pgsql then restart
if failed unixsocket /tmp/.s.PGSQL.5432 protocol pgsql then alert
if failed host localhost port 5432 protocol pgsql then restart
if failed host localhost port 5432 protocol pgsql then alert
if 5 restarts within 5 cycles then timeout

Same result.

If I fire up postgres manually using the following methods, I don't see the error:

su - postgres; /opt/postgres/bin/pg_ctl start
su - postgres; /opt/postgres/bin/pg_ctl start -w -D /opt/postgres/data -l /opt/postgres/data/startup.log
su - postgres -c "LD_LIBRARY_PATH=/opt/postgres/lib /opt/postgres/bin/pg_ctl -w start -D \"/opt/postgres/data\" -l \"/opt/postgres/data/startup.log\""
(as root) /etc/init.d/postgresql start

It seems that only when fired up via a monit job is this an issue.

According to our DB guy, the error means that postgres is trying to find a database called "root", which, of course, doesn't exist.

I know that Monit doesn't set any environment variables when it starts a job's process, but I'm baffled as to where this is coming from.

Has anyone else that's running postgres seen this?

Thanks!

-dave

--
Dave Paper

"Hello, I must be going." --Groucho



--
To unsubscribe:
http://lists.nongnu.org/mailman/listinfo/monit-general
