RE: More intelligent swap monitoring?
From: Jamie Burchell
Subject: RE: More intelligent swap monitoring?
Date: Wed, 25 Jun 2025 23:19:58 +0100
Excellent. Thank you for doing this.
Kind regards,
Jamie
-----Original Message-----
From: monit-general-bounces+jamie=ib3.uk@nongnu.org
<monit-general-bounces+jamie=ib3.uk@nongnu.org> On Behalf Of Lutz Mader
Sent: 25 June 2025 20:08
To: This is the general mailing list for monit <monit-general@nongnu.org>
Subject: Re: More intelligent swap monitoring?
Hello Jamie,
I created an issue to request this feature, see
https://bitbucket.org/tildeslash/monit/issues/1132/additional-swap-monitoring
Nice to know that the changes fit your requirements,
Lutz
On 24.06.25 at 11:31, Jamie Burchell via This is the general
mailing list for monit wrote:
> Hello Lutz
>
> I just wanted to give feedback that I have been successfully running your
> test build with the enhanced swap file monitoring and it's working really
> well.
>
> It would be great to get this in a new release.
>
> Thanks
> Jamie
>
>
>
> Hello Jamie,
> no, something like this would match if si is above 100 for one
> cycle:
>
> if pagein > 100 pages then alert
> if pagein > 100 pages for 4 cycles then alert
>
> Or, with the second rule, only if it is continuously above 100 in each
> cycle for 4 cycles.
>
> Based on your data, you would get a match with this test rule:
>
> if pagein > 15 pages 4 times within 6 cycles then alert
>
> See
> https://mmonit.com/monit/documentation/monit.html#FAULT-TOLERANCE
>
> To increase the interval from 30s to 120s, the check should be done only
> every 4 cycles:
>
> check system $HOST
> every 4 cycles
> if pagein > 100 pages then alert
>
> Lutz
>
> p.s.
> Nice to know.
>
>> Not sure why the EPEL version doesn't need it, but I have it
>> working now.
>
> LIBNSL is not available in Red Hat/RHEL 9 environments either.
> But in general, I think Monit for Linux is built with LIBNSL.
>
> The pagein/pageout test seems to be useful.
> But the pagefault/pagehit/pagepurge test is not useful and not available
> in all environments.
>
>
> On 07.05.25 at 15:38, Jamie Burchell via This is the general
> mailing list for monit wrote:
>> Hi Lutz
>>
>> If I set monit to alert if pagein > 100 pages for 4 cycles, does this
>> mean:
>>
>> 1. Alert if the threshold of 100 is exceeded at each of 4 consecutive
>> 30-second interval checks
>> 2. Alert if there have been more than 100 in a 120-second period?
>>
>> With the config above, this system status triggered it, so I'm assuming
>> option 2.
>>
>> $ vmstat 30
>>
>> procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
>>  r  b   swpd    free buff   cache  si  so   bi   bo    in    cs us sy id wa st
>>  1  0 836140 2360564    0 2250752   0   0   20    6     1     3  3  2 94  0  0
>>  0  0 836140 2339544    0 2254452  20   0  141   22  1325  1915  5  2 93  0  0
>>  0  0 836140 2352480    0 2254520  12   0   12   26   763   936  2  2 96  0  0
>>  0  0 835884 2348424    0 2254584  18   0   18   12   963  1234  2  2 96  0  0
>>  0  0 835628 2344712    0 2254652   9   0    9   24  1419  1925  4  4 92  0  0
>>  0  0 834092 2307980    0 2256760  68   0  135   18  1625  2140  5  4 91  0  0
>>  0  0 833580 2273028    0 2258300  25   0   57   34  1219  1640  4  3 93  0  0
>>
>> Thanks
>> Jamie
>>
>> -----Original Message-----
>> From: monit-general-bounces+jamie=ib3.uk@nongnu.org
>> <monit-general-bounces+jamie=ib3.uk@nongnu.org> On Behalf Of Lutz Mader
>> Sent: 02 May 2025 16:45
>> To: This is the general mailing list for monit <monit-general@nongnu.org>
>> Subject: Re: More intelligent swap monitoring?
>>
>> Hello Jamie,
>> this is a simple tgz package; feel free to unpack it and copy the
>> bin/monit file to a suitable location. But keep in mind that this is
>> only a test package, based on Monit 5.35.0 for Linux x86_64.
>>
>>> That's brilliant - thank you very much.
>>
>> The official fix may become available with 5.36.0.
>>
>> The data should be similar to the data from "vmstat -s" and "vmstat 30"
>> (see your monitrc file, option "set daemon 30").
>>
>> Keep in mind,
>> this is for testing/validation purposes only,
>> Lutz
>>
>>
>>
>> On 02.05.25 at 15:14, Jamie Burchell via This is the general
>> mailing list for monit wrote:
>>> Hi Lutz
>>>
>>> That's brilliant - thank you very much.
>>>
>>> I'm currently using version 5.33.0 from EPEL (Rocky Linux 9). How
>>> should I replace/install the test package?
>>>
>>> Thanks in advance
>>> Jamie
>>>
>>> -----Original Message-----
>>> From: monit-general-bounces+jamie=ib3.uk@nongnu.org
>>> <monit-general-bounces+jamie=ib3.uk@nongnu.org> On Behalf Of Lutz Mader
>>> Sent: 02 May 2025 02:11
>>> To: This is the general mailing list for monit
>>> <monit-general@nongnu.org>
>>> Subject: Re: More intelligent swap monitoring?
>>>
>>> Sorry for the late reply, Jamie.
>>>
>>> Based on your suggestion, I added a new test to "check system":
>>>
>>> check system $HOST
>>> # if memory usage > 75% then alert
>>> # if swap usage > 25% then alert
>>> if pagein > 10 pages then alert
>>> if pageout > 20 pages then alert
>>> if pagefault > 50 pages then alert
>>>
>>> A test package is available from
>>> https://bitbucket.org/lutzmad/monit/downloads/monit-vmstat-suse12-x64.tar.gz
>>>
>>> Let me know if this fixes your problem,
>>> Lutz
>>>
>>> Appendix:
>>> ~/bin/monit status slesbuild
>>> Monit 5.35.0 uptime: 20m
>>>
>>> System 'slesbuild'
>>> status OK
>>> monitoring status Monitored
>>> monitoring mode active
>>> on reboot start
>>> load average [0.00] [0.00] [0.05]
>>> cpu 0.4%usr 3.3%sys 0.0%nice 2.1%iowait
>>> 0.0%hardirq 0.0%softirq 0.0%steal 0.0%guest 0.0%guestnice
>>> memory usage 492.2 MB [27.4%]
>>> swap usage 8.0 MB [0.4%]
>>> pagein count 0 [58]
>>> pageout count 0 [2050]
>>> uptime 5h 0m
>>> boot time Thu, 01 May 2025 21:02:12
>>> filedescriptors 3264 [1.8% of 180992 limit]
>>> data collected Fri, 02 May 2025 02:03:04
>>>
>>>
>>> On 17.01.25 at 11:51, Jamie Burchell via This is the general
>>> mailing list for monit wrote:
>>>> Hello
>>>>
>>>> I have reduced the amount of memory one of the services was consuming,
>>>> which has abated the problem for now. However, there's still some swap
>>>> being used, so perhaps in a few days' time the problem will come up again.
>>>>
>>>> Here's the output of /proc/meminfo, as requested:
>>>>
>>>> MemTotal: 7868472 kB
>>>> MemFree: 671568 kB
>>>> MemAvailable: 3261880 kB
>>>> Buffers: 0 kB
>>>> Cached: 893768 kB
>>>> SwapCached: 47448 kB
>>>> Active: 2661256 kB
>>>> Inactive: 1320396 kB
>>>> Active(anon): 2400964 kB
>>>> Inactive(anon): 800724 kB
>>>> Active(file): 260292 kB
>>>> Inactive(file): 519672 kB
>>>> Unevictable: 3072 kB
>>>> Mlocked: 0 kB
>>>> SwapTotal: 4194300 kB
>>>> SwapFree: 3923228 kB
>>>> Zswap: 0 kB
>>>> Zswapped: 0 kB
>>>> Dirty: 32 kB
>>>> Writeback: 0 kB
>>>> AnonPages: 3003564 kB
>>>> Mapped: 169140 kB
>>>> Shmem: 113804 kB
>>>> KReclaimable: 2124156 kB
>>>> Slab: 2540272 kB
>>>> SReclaimable: 2124156 kB
>>>> SUnreclaim: 416116 kB
>>>> KernelStack: 13712 kB
>>>> PageTables: 77444 kB
>>>> SecPageTables: 0 kB
>>>> NFS_Unstable: 0 kB
>>>> Bounce: 0 kB
>>>> WritebackTmp: 0 kB
>>>> CommitLimit: 8128536 kB
>>>> Committed_AS: 12257916 kB
>>>> VmallocTotal: 34359738367 kB
>>>> VmallocUsed: 31016 kB
>>>> VmallocChunk: 0 kB
>>>> Percpu: 1840 kB
>>>> HardwareCorrupted: 0 kB
>>>> AnonHugePages: 1378304 kB
>>>> ShmemHugePages: 0 kB
>>>> ShmemPmdMapped: 0 kB
>>>> FileHugePages: 0 kB
>>>> FilePmdMapped: 0 kB
>>>> CmaTotal: 0 kB
>>>> CmaFree: 0 kB
>>>> Unaccepted: 0 kB
>>>> HugePages_Total: 0
>>>> HugePages_Free: 0
>>>> HugePages_Rsvd: 0
>>>> HugePages_Surp: 0
>>>> Hugepagesize: 2048 kB
>>>> Hugetlb: 0 kB
>>>> DirectMap4k: 118624 kB
>>>> DirectMap2M: 8269824 kB
>>>>
>>>> Regards
>>>> Jamie
>>>>
>>>>
>>>> --
>>>>
>>>>
>>>> -----Original Message-----
>>>> From: monit-general-bounces+jamie=ib3.co.uk@nongnu.org
>>>> <monit-general-bounces+jamie=ib3.co.uk@nongnu.org> On Behalf Of Lutz Mader
>>>> Sent: 15 January 2025 20:37
>>>> To: This is the general mailing list for monit
>>>> <monit-general@nongnu.org>
>>>> Subject: Re: More intelligent swap monitoring?
>>>>
>>>> Hello,
>>>> I have no useful examples; on my system vmstat shows si=0 and so=0 only.
>>>> On a Linux system, the values calculated from /proc/meminfo are consistent
>>>> with vmstat.
>>>>
>>>> And the system status information seems to be useful.
>>>>
>>>>> Is it possible to configure Monit to alert on actual swapping
>>>>> out rather than swap file usage, or am I barking up the wrong tree?
>>>>
>>>> You are right, Monit does not show the actual swap I/O (page in/out
>>>> data); its swap data is based on usage only.
>>>>
>>>> monit status LINUX
>>>> Monit 5.34.0 uptime: 49d 0h 43m
>>>>
>>>> System 'LINUX'
>>>> status OK
>>>> monitoring status Monitored
>>>> monitoring mode active
>>>> on reboot start
>>>> load average [7.30] [8.60] [12.77]
>>>> cpu 0.8%usr 0.3%sys 14.7%nice 0.0%iowait
>>>> 0.0%hardirq 0.0%softirq 0.0%steal 0.0%guest 0.0%guestnice
>>>> memory usage 42.6 GB [11.3%]
>>>> swap usage 10.4 MB [0.5%]
>>>> uptime 61d 19h 43m
>>>> boot time Thu, 14 Nov 2024 14:54:46
>>>> filedescriptors 16800 [0.2% of 6815744 limit]
>>>> data collected Wed, 15 Jan 2025 10:37:52
>>>>
>>>> The vmstat data.
>>>>
>>>> Swap
>>>> si: Amount of memory swapped in from disk (/s).
>>>> so: Amount of memory swapped to disk (/s).
>>>>
>>>> vmstat
>>>> procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
>>>>  r  b   swpd      free   buff    cache  si  so   bi   bo   in   cs us sy id wa st
>>>>  7  0  10624 273713168 811688 87684144   0   0   69   40    0    0 11  1 88  0  0
>>>>
>>>> vmstat -s
>>>> 395130516 K total memory
>>>> 120558732 K used memory
>>>> 90364732 K active memory
>>>> 11131200 K inactive memory
>>>> 274571784 K free memory
>>>> 811824 K buffer memory
>>>> 88984004 K swap cache
>>>> 2095100 K total swap
>>>> 10624 K used swap
>>>> 2084476 K free swap
>>>> 1313872869 non-nice user cpu ticks
>>>> 1427185498 nice user cpu ticks
>>>> 247910673 system cpu ticks
>>>> 22633904247 idle cpu ticks
>>>> 6802373 IO-wait cpu ticks
>>>> 0 IRQ cpu ticks
>>>> 2608560 softirq cpu ticks
>>>> 0 stolen cpu ticks
>>>> 17687866581 pages paged in
>>>> 10300491953 pages paged out
>>>> 1345 pages swapped in
>>>> 6655 pages swapped out
>>>> 3444460676 interrupts
>>>> 416044891 CPU context switches
>>>> 1731592487 boot time
>>>> 273618695 forks
>>>>
>>>> The values from /proc/meminfo:
>>>>
>>>> cat /proc/meminfo
>>>> MemTotal: 395130516 kB
>>>> MemFree: 275604548 kB
>>>> MemAvailable: 353323016 kB
>>>> Buffers: 811800 kB
>>>> Cached: 85849340 kB
>>>> SwapCached: 892 kB
>>>> Active: 89256636 kB
>>>> Inactive: 11124384 kB
>>>> Active(anon): 18733260 kB
>>>> Inactive(anon): 3398376 kB
>>>> Active(file): 70523376 kB
>>>> Inactive(file): 7726008 kB
>>>> Unevictable: 975348 kB
>>>> Mlocked: 975348 kB
>>>> SwapTotal: 2095100 kB
>>>> SwapFree: 2084476 kB
>>>> Dirty: 468 kB
>>>> Writeback: 0 kB
>>>> AnonPages: 14695184 kB
>>>> Mapped: 2965708 kB
>>>> Shmem: 8423756 kB
>>>> Slab: 8029800 kB
>>>> SReclaimable: 2699580 kB
>>>> SUnreclaim: 5330220 kB
>>>> KernelStack: 70320 kB
>>>> PageTables: 421232 kB
>>>> NFS_Unstable: 0 kB
>>>> Bounce: 0 kB
>>>> WritebackTmp: 0 kB
>>>> CommitLimit: 199660356 kB
>>>> Committed_AS: 26151808 kB
>>>> VmallocTotal: 34359738367 kB
>>>> VmallocUsed: 0 kB
>>>> VmallocChunk: 0 kB
>>>> HardwareCorrupted: 0 kB
>>>> AnonHugePages: 0 kB
>>>> ShmemHugePages: 0 kB
>>>> ShmemPmdMapped: 0 kB
>>>> HugePages_Total: 0
>>>> HugePages_Free: 0
>>>> HugePages_Rsvd: 0
>>>> HugePages_Surp: 0
>>>> Hugepagesize: 2048 kB
>>>> DirectMap4k: 37410524 kB
>>>> DirectMap2M: 286273536 kB
>>>> DirectMap1G: 80740352 kB
>>>>
>>>> The values are used to calculate the monit swap data.
>>>>
>>>> in src/process/sysdep_LINUX.c
>>>>
>>>> used_system_memory_sysdep(SystemInfo_T *si)
>>>>
>>>>         // Swap
>>>>         if (! (ptr = strstr(buf, "SwapTotal:")) || sscanf(ptr + 10, "%llu", &swap_total) != 1) {
>>>>                 Log_error("system statistic error -- cannot get swap total amount\n");
>>>>                 goto error;
>>>>         }
>>>>         if (! (ptr = strstr(buf, "SwapFree:")) || sscanf(ptr + 9, "%llu", &swap_free) != 1) {
>>>>                 Log_error("system statistic error -- cannot get swap free amount\n");
>>>>                 goto error;
>>>>         }
>>>>         si->swap.size = swap_total * 1024;
>>>>         si->swap.usage.bytes = (swap_total - swap_free) * 1024;
>>>>
>>>> The question is:
>>>> what does the /proc/meminfo output look like on your system?
>>>>
>>>> Do you have some examples, based on vmstat and the /proc/meminfo
>>>> data?
>>>>
>>>> Lutz
>>>>
>>>>
>>>> On 14.01.25 at 12:10, Jamie Burchell via This is the general
>>>> mailing list for monit wrote:
>>>>> Hi
>>>>>
>>>>>
>>>>>
>>>>> I currently use Monit to alert me if swap usage is above 20%. This works
>>>>> most of the time, but I currently have a particularly stubborn VM which
>>>>> appears to like to add data to swap and then not touch it. vmstat shows
>>>>> either no swap-ins or the odd non-zero one, and no swap-outs. Is it
>>>>> possible to configure Monit to alert on actual swapping out rather than
>>>>> swap file usage, or am I barking up the wrong tree?
>>>>>
>>>>>
>>>>>
>>>>> Thanks in advance!
>>>>>
>>>>> Jamie
>>>>>
>>>>
>>>>
>>>
>>>
>>
>>
>
>