monit-general

RE: More intelligent swap monitoring?


From: Jamie Burchell
Subject: RE: More intelligent swap monitoring?
Date: Wed, 25 Jun 2025 23:19:58 +0100

Excellent. Thank you for doing this.

Kind regards,
Jamie

-----Original Message-----
From: monit-general-bounces+jamie=ib3.uk@nongnu.org
<monit-general-bounces+jamie=ib3.uk@nongnu.org> On Behalf Of Lutz Mader
Sent: 25 June 2025 20:08
To: This is the general mailing list for monit <monit-general@nongnu.org>
Subject: Re: More intelligent swap monitoring?

Hello Jamie,
I created an issue to request this feature, see
https://bitbucket.org/tildeslash/monit/issues/1132/additional-swap-monitoring

Good to know that the changes fit your requirements,
Lutz


On 24.06.25 at 11:31, Jamie Burchell via This is the general
mailing list for monit wrote:
> Hello Lutz
>
> I just wanted to give feedback that I have been successfully running your
> test build with the enhanced swap file monitoring and it's working really
> well.
>
> It would be great to get this in a new release.
>
> Thanks
> Jamie
>
>
>
> Hello Jamie,
> no. The first rule below matches if si is above 100 in a single cycle;
> the second matches only if si is continuously above 100 in each of 4
> consecutive cycles.
>
> if pagein > 100 pages then alert
> if pagein > 100 pages for 4 cycles then alert
>
> With your data, you will get a match with this test rule.
>
> if pagein > 15 pages 4 times within 6 cycles then alert
>
> See
> https://mmonit.com/monit/documentation/monit.html#FAULT-TOLERANCE
>
> To increase the interval from 30s to 120s, the check should be run only
> every 4 cycles.
>
> check system $HOST
>   every 4 cycles
>   if pagein > 100 pages then alert
>
> Lutz
>
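The fault-tolerance rules above can be checked against the `si` column of the quoted `vmstat 30` output. The following is a minimal sketch in Python, not Monit's implementation. It assumes that "N times within M cycles" means the condition held in at least N of the last M cycles, and that "for N cycles" is the same as N times within N cycles. The function name `first_match` is hypothetical, and the sketch treats vmstat's `si` values as if they were the per-cycle pagein values, which may not hold for the test build.

```python
from collections import deque


def first_match(samples, threshold, times, cycles):
    """Return the 0-based index of the first cycle at which the rule fires, or None.

    Models "if value > threshold <times> times within <cycles> cycles":
    the rule fires once at least <times> of the last <cycles> samples
    exceeded the threshold.
    """
    window = deque(maxlen=cycles)  # keeps only the last <cycles> results
    for index, value in enumerate(samples):
        window.append(value > threshold)
        if sum(window) >= times:
            return index
    return None


# "si" column from the quoted vmstat 30 output, one value per 30 s cycle.
si = [0, 20, 12, 18, 9, 68, 25]

print(first_match(si, 100, 1, 1))  # if pagein > 100 pages
print(first_match(si, 100, 4, 4))  # if pagein > 100 pages for 4 cycles
print(first_match(si, 15, 4, 6))   # if pagein > 15 pages 4 times within 6 cycles
```

Under these assumptions, neither of the first two rules fires on the seven samples, while the third fires at the seventh sample (index 6), which agrees with the statement that the data matches the `4 times within 6 cycles` rule.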
> p.s.
> Nice to know.
>
>> Not sure why the EPEL version doesn't need it, but I have it
>> working now.
>
> LIBNSL is not available in Red Hat/RHEL 9 environments either.
> But in general Monit for Linux is built with LIBNSL, I think.
>
> The pagein/pageout test seems to be useful.
> But the pagefault/pagehit/pagepurge test is not useful and not available
> in all environments.
>
>
> On 07.05.25 at 15:38, Jamie Burchell via This is the general
> mailing list for monit wrote:
>> Hi Lutz
>>
>> If I set monit to alert if pagein > 100 pages for 4 cycles, does this
>> mean:
>>
>> 1. Alert if the threshold of 100 is exceeded at each of 4 consecutive
>> 30-second interval checks
>> 2. Alert if there have been more than 100 in total over a 120-second period?
>>
>> With the config above, this system status triggered it, so I'm assuming
>> option 2.
>>
>> $ vmstat 30
>>
>> procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
>>  r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
>>  1  0 836140 2360564      0 2250752    0    0    20     6    1    3  3  2 94  0  0
>>  0  0 836140 2339544      0 2254452   20    0   141    22 1325 1915  5  2 93  0  0
>>  0  0 836140 2352480      0 2254520   12    0    12    26  763  936  2  2 96  0  0
>>  0  0 835884 2348424      0 2254584   18    0    18    12  963 1234  2  2 96  0  0
>>  0  0 835628 2344712      0 2254652    9    0     9    24 1419 1925  4  4 92  0  0
>>  0  0 834092 2307980      0 2256760   68    0   135    18 1625 2140  5  4 91  0  0
>>  0  0 833580 2273028      0 2258300   25    0    57    34 1219 1640  4  3 93  0  0
>>
>> Thanks
>> Jamie
>>
>> -----Original Message-----
>> From: monit-general-bounces+jamie=ib3.uk@nongnu.org
>> <monit-general-bounces+jamie=ib3.uk@nongnu.org> On Behalf Of Lutz Mader
>> Sent: 02 May 2025 16:45
>> To: This is the general mailing list for monit <monit-general@nongnu.org>
>> Subject: Re: More intelligent swap monitoring?
>>
>> Hello Jamie,
>> this is a simple tgz Package, feel free to unpack the package and copy
>> the bin/monit file to a proper place. But keep in mind, this is a test
>> package only, based on Monit 5.35.0 for Linux x86_64.
>>
>>> That's brilliant - thank you very much.
>>
>> The official fix may become available with 5.36.0.
>>
>> The data should be similar to the data from "vmstat -s" and "vmstat 30"
>> (see your monitrc file, option "set daemon  30").
>>
>> Keep in mind,
>> this is for testing/validation purpose only,
>> Lutz
>>
>>
>>
>> On 02.05.25 at 15:14, Jamie Burchell via This is the general
>> mailing list for monit wrote:
>>> Hi Lutz
>>>
>>> That's brilliant - thank you very much.
>>>
>>> I'm currently using version 5.33.0 from EPEL (Rocky Linux 9). How
>>> should I replace/install the test package?
>>>
>>> Thanks in advance
>>> Jamie
>>>
>>> -----Original Message-----
>>> From: monit-general-bounces+jamie=ib3.uk@nongnu.org
>>> <monit-general-bounces+jamie=ib3.uk@nongnu.org> On Behalf Of Lutz Mader
>>> Sent: 02 May 2025 02:11
>>> To: This is the general mailing list for monit
>>> <monit-general@nongnu.org>
>>> Subject: Re: More intelligent swap monitoring?
>>>
>>> Sorry Jamie, I'm late.
>>>
>>> Based on your suggestion I added a new test to "check system".
>>>
>>> check system $HOST
>>> #  if memory usage > 75% then alert
>>> #  if swap usage > 25% then alert
>>>   if pagein > 10 pages then alert
>>>   if pageout > 20 pages then alert
>>>   if pagefault > 50 pages then alert
>>>
>>> A test package is available from
>>> https://bitbucket.org/lutzmad/monit/downloads/monit-vmstat-suse12-x64.tar.gz
>>>
>>> Let me know if this fixes your problem,
>>> Lutz
>>>
>>> Appendix:
>>> ~/bin/monit status slesbuild
>>> Monit 5.35.0 uptime: 20m
>>>
>>> System 'slesbuild'
>>>   status                       OK
>>>   monitoring status            Monitored
>>>   monitoring mode              active
>>>   on reboot                    start
>>>   load average                 [0.00] [0.00] [0.05]
>>>   cpu                          0.4%usr 3.3%sys 0.0%nice 2.1%iowait 0.0%hardirq 0.0%softirq 0.0%steal 0.0%guest 0.0%guestnice
>>>   memory usage                 492.2 MB [27.4%]
>>>   swap usage                   8.0 MB [0.4%]
>>>   pagein count                 0 [58]
>>>   pageout count                0 [2050]
>>>   uptime                       5h 0m
>>>   boot time                    Thu, 01 May 2025 21:02:12
>>>   filedescriptors              3264 [1.8% of 180992 limit]
>>>   data collected               Fri, 02 May 2025 02:03:04
>>>
>>>
>>> On 17.01.25 at 11:51, Jamie Burchell via This is the general
>>> mailing list for monit wrote:
>>>> Hello
>>>>
>>>> I have reduced the amount of memory one of the services was consuming,
>>>> which has abated the problem for now. However, there's still some swap
>>>> being used, so perhaps in a few days' time the problem will come up again.
>>>>
>>>> Here's the output of /proc/meminfo as requested
>>>>
>>>> MemTotal:        7868472 kB
>>>> MemFree:          671568 kB
>>>> MemAvailable:    3261880 kB
>>>> Buffers:               0 kB
>>>> Cached:           893768 kB
>>>> SwapCached:        47448 kB
>>>> Active:          2661256 kB
>>>> Inactive:        1320396 kB
>>>> Active(anon):    2400964 kB
>>>> Inactive(anon):   800724 kB
>>>> Active(file):     260292 kB
>>>> Inactive(file):   519672 kB
>>>> Unevictable:        3072 kB
>>>> Mlocked:               0 kB
>>>> SwapTotal:       4194300 kB
>>>> SwapFree:        3923228 kB
>>>> Zswap:                 0 kB
>>>> Zswapped:              0 kB
>>>> Dirty:                32 kB
>>>> Writeback:             0 kB
>>>> AnonPages:       3003564 kB
>>>> Mapped:           169140 kB
>>>> Shmem:            113804 kB
>>>> KReclaimable:    2124156 kB
>>>> Slab:            2540272 kB
>>>> SReclaimable:    2124156 kB
>>>> SUnreclaim:       416116 kB
>>>> KernelStack:       13712 kB
>>>> PageTables:        77444 kB
>>>> SecPageTables:         0 kB
>>>> NFS_Unstable:          0 kB
>>>> Bounce:                0 kB
>>>> WritebackTmp:          0 kB
>>>> CommitLimit:     8128536 kB
>>>> Committed_AS:   12257916 kB
>>>> VmallocTotal:   34359738367 kB
>>>> VmallocUsed:       31016 kB
>>>> VmallocChunk:          0 kB
>>>> Percpu:             1840 kB
>>>> HardwareCorrupted:     0 kB
>>>> AnonHugePages:   1378304 kB
>>>> ShmemHugePages:        0 kB
>>>> ShmemPmdMapped:        0 kB
>>>> FileHugePages:         0 kB
>>>> FilePmdMapped:         0 kB
>>>> CmaTotal:              0 kB
>>>> CmaFree:               0 kB
>>>> Unaccepted:            0 kB
>>>> HugePages_Total:       0
>>>> HugePages_Free:        0
>>>> HugePages_Rsvd:        0
>>>> HugePages_Surp:        0
>>>> Hugepagesize:       2048 kB
>>>> Hugetlb:               0 kB
>>>> DirectMap4k:      118624 kB
>>>> DirectMap2M:     8269824 kB
>>>>
>>>> Regards
>>>> Jamie
>>>>
>>>>
>>>> --
>>>>
>>>>
>>>> -----Original Message-----
>>>> From: monit-general-bounces+jamie=ib3.co.uk@nongnu.org
>>>> <monit-general-bounces+jamie=ib3.co.uk@nongnu.org> On Behalf Of Lutz
>>>> Mader
>>>> Sent: 15 January 2025 20:37
>>>> To: This is the general mailing list for monit
>>>> <monit-general@nongnu.org>
>>>> Subject: Re: More intelligent swap monitoring?
>>>>
>>>> Hello,
>>>> I have no useful examples; on my system vmstat only shows si=0 and so=0.
>>>> The values calculated from /proc/meminfo match, on a Linux system.
>>>>
>>>> And the system status information seems to be useful.
>>>>
>>>>> Is it possible to configure Monit to alert of actual swapping
>>>>> out rather than swap file usage, or am I barking up the wrong tree?
>>>>
>>>> You are right, monit does not show the actual swap file I/O (page in/out
>>>> data); the data it shows is based on usage.
>>>>
>>>> monit status LINUX
>>>> Monit 5.34.0 uptime: 49d 0h 43m
>>>>
>>>> System 'LINUX'
>>>>   status                       OK
>>>>   monitoring status            Monitored
>>>>   monitoring mode              active
>>>>   on reboot                    start
>>>>   load average                 [7.30] [8.60] [12.77]
>>>>   cpu                          0.8%usr 0.3%sys 14.7%nice 0.0%iowait 0.0%hardirq 0.0%softirq 0.0%steal 0.0%guest 0.0%guestnice
>>>>   memory usage                 42.6 GB [11.3%]
>>>>   swap usage                   10.4 MB [0.5%]
>>>>   uptime                       61d 19h 43m
>>>>   boot time                    Thu, 14 Nov 2024 14:54:46
>>>>   filedescriptors              16800 [0.2% of 6815744 limit]
>>>>   data collected               Wed, 15 Jan 2025 10:37:52
>>>>
>>>> The vmstat data.
>>>>
>>>> Swap
>>>> si: Amount of memory swapped in from disk (/s).
>>>> so: Amount of memory swapped to disk (/s).
>>>>
>>>> vmstat
>>>> procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
>>>> r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
>>>> 7  0  10624 273713168 811688 87684144    0    0    69    40    0    0 11  1 88  0  0
>>>>
>>>> vmstat -s
>>>>     395130516 K total memory
>>>>     120558732 K used memory
>>>>      90364732 K active memory
>>>>      11131200 K inactive memory
>>>>     274571784 K free memory
>>>>        811824 K buffer memory
>>>>      88984004 K swap cache
>>>>       2095100 K total swap
>>>>         10624 K used swap
>>>>       2084476 K free swap
>>>>    1313872869 non-nice user cpu ticks
>>>>    1427185498 nice user cpu ticks
>>>>     247910673 system cpu ticks
>>>>   22633904247 idle cpu ticks
>>>>       6802373 IO-wait cpu ticks
>>>>             0 IRQ cpu ticks
>>>>       2608560 softirq cpu ticks
>>>>             0 stolen cpu ticks
>>>>   17687866581 pages paged in
>>>>   10300491953 pages paged out
>>>>          1345 pages swapped in
>>>>          6655 pages swapped out
>>>>    3444460676 interrupts
>>>>     416044891 CPU context switches
>>>>    1731592487 boot time
>>>>     273618695 forks
>>>>
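The "pages swapped in" and "pages swapped out" values from `vmstat -s` are cumulative counters since boot, so swap activity in a given interval is the difference between two readings. A minimal Python sketch of that calculation follows. It assumes the counters are read from `/proc/vmstat` on Linux, where they appear as `pswpin` and `pswpout`; whether Monit's test build uses exactly these fields is not confirmed in this thread, and the helper names `parse_vmstat`, `swap_delta` and `read_vmstat` are hypothetical.

```python
def parse_vmstat(text):
    """Parse /proc/vmstat-style "name value" lines into a dict of ints."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            counters[parts[0]] = int(parts[1])
    return counters


def swap_delta(before, after):
    """Return (pages swapped in, pages swapped out) between two snapshots."""
    return (after["pswpin"] - before["pswpin"],
            after["pswpout"] - before["pswpout"])


def read_vmstat(path="/proc/vmstat"):
    """Read and parse the live counters (Linux only)."""
    with open(path) as handle:
        return parse_vmstat(handle.read())


# Two illustrative snapshots: the first uses the totals from the vmstat -s
# output above, the second is made up for the example.
first = parse_vmstat("pswpin 1345\npswpout 6655\n")
second = parse_vmstat("pswpin 1365\npswpout 6655\n")
print(swap_delta(first, second))  # (20, 0)
```

On a live system, the two snapshots would come from `read_vmstat()` called one poll interval apart (30 seconds with `set daemon 30`).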
>>>> The values based on /proc/meminfo.
>>>>
>>>> cat /proc/meminfo
>>>> MemTotal:       395130516 kB
>>>> MemFree:        275604548 kB
>>>> MemAvailable:   353323016 kB
>>>> Buffers:          811800 kB
>>>> Cached:         85849340 kB
>>>> SwapCached:          892 kB
>>>> Active:         89256636 kB
>>>> Inactive:       11124384 kB
>>>> Active(anon):   18733260 kB
>>>> Inactive(anon):  3398376 kB
>>>> Active(file):   70523376 kB
>>>> Inactive(file):  7726008 kB
>>>> Unevictable:      975348 kB
>>>> Mlocked:          975348 kB
>>>> SwapTotal:       2095100 kB
>>>> SwapFree:        2084476 kB
>>>> Dirty:               468 kB
>>>> Writeback:             0 kB
>>>> AnonPages:      14695184 kB
>>>> Mapped:          2965708 kB
>>>> Shmem:           8423756 kB
>>>> Slab:            8029800 kB
>>>> SReclaimable:    2699580 kB
>>>> SUnreclaim:      5330220 kB
>>>> KernelStack:       70320 kB
>>>> PageTables:       421232 kB
>>>> NFS_Unstable:          0 kB
>>>> Bounce:                0 kB
>>>> WritebackTmp:          0 kB
>>>> CommitLimit:    199660356 kB
>>>> Committed_AS:   26151808 kB
>>>> VmallocTotal:   34359738367 kB
>>>> VmallocUsed:           0 kB
>>>> VmallocChunk:          0 kB
>>>> HardwareCorrupted:     0 kB
>>>> AnonHugePages:         0 kB
>>>> ShmemHugePages:        0 kB
>>>> ShmemPmdMapped:        0 kB
>>>> HugePages_Total:       0
>>>> HugePages_Free:        0
>>>> HugePages_Rsvd:        0
>>>> HugePages_Surp:        0
>>>> Hugepagesize:       2048 kB
>>>> DirectMap4k:    37410524 kB
>>>> DirectMap2M:    286273536 kB
>>>> DirectMap1G:    80740352 kB
>>>>
>>>> The values are used to calculate the monit swap data.
>>>>
>>>> in src/process/sysdep_LINUX.c
>>>>
>>>> used_system_memory_sysdep(SystemInfo_T *si)
>>>>
>>>>         // Swap
>>>>         if (! (ptr = strstr(buf, "SwapTotal:")) || sscanf(ptr + 10, "%llu", &swap_total) != 1) {
>>>>                 Log_error("system statistic error -- cannot get swap total amount\n");
>>>>                 goto error;
>>>>         }
>>>>         if (! (ptr = strstr(buf, "SwapFree:")) || sscanf(ptr + 9, "%llu", &swap_free) != 1) {
>>>>                 Log_error("system statistic error -- cannot get swap free amount\n");
>>>>                 goto error;
>>>>         }
>>>>         si->swap.size = swap_total * 1024;
>>>>         si->swap.usage.bytes = (swap_total - swap_free) * 1024;
>>>>
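The same calculation as in the C excerpt can be reproduced outside Monit to cross-check its "swap usage" line. The following Python sketch is an illustration, not part of Monit; the function name `swap_usage` is hypothetical. It takes `SwapTotal` and `SwapFree` (in kB) from `/proc/meminfo` text and derives the used bytes and percentage.

```python
import re


def swap_usage(meminfo_text):
    """Return (total_bytes, used_bytes, used_percent) from /proc/meminfo text."""
    fields = {}
    for name in ("SwapTotal", "SwapFree"):
        match = re.search(rf"^{name}:\s+(\d+)\s+kB", meminfo_text, re.MULTILINE)
        if match is None:
            raise ValueError(f"cannot find {name} in meminfo data")
        fields[name] = int(match.group(1))
    total = fields["SwapTotal"] * 1024  # kB to bytes, as in the C code
    used = (fields["SwapTotal"] - fields["SwapFree"]) * 1024
    percent = 100.0 * used / total if total else 0.0
    return total, used, percent


# Values from the /proc/meminfo output quoted above.
sample = "SwapTotal:       2095100 kB\nSwapFree:        2084476 kB\n"
total, used, percent = swap_usage(sample)
print(f"{used / 1024 / 1024:.1f} MB [{percent:.1f}%]")  # 10.4 MB [0.5%]
```

The printed value agrees with the `swap usage` line in the `monit status LINUX` output quoted earlier in the same message.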
>>>> The question is: what does the /proc/meminfo output look like on your
>>>> system?
>>>>
>>>> Are some examples available, based on vmstat and the /proc/meminfo
>>>> data?
>>>>
>>>> Lutz
>>>>
>>>>
>>>> On 14.01.25 at 12:10, Jamie Burchell via This is the general
>>>> mailing list for monit wrote:
>>>>> Hi
>>>>>
>>>>>
>>>>>
>>>>> I currently use Monit to alert me if swap usage is over 20%. This works
>>>>> most of the time, but I currently have a particularly stubborn VM which
>>>>> appears to like to add data to swap and then not touch it. Using vmstat
>>>>> shows there are either no swap-in operations, or maybe the odd non-zero
>>>>> one, and no swap-outs. Is it possible to configure Monit to alert on
>>>>> actual swapping out rather than swap file usage, or am I barking up the
>>>>> wrong tree?
>>>>>
>>>>>
>>>>>
>>>>> Thanks in advance!
>>>>>
>>>>> Jamie
>>>>>
>>>>
>>>>
>>>
>>>
>>
>>
>
>


