bug-gnu-emacs

bug#43389: 28.0.50; Emacs memory leaks


From: Trevor Bentley
Subject: bug#43389: 28.0.50; Emacs memory leaks
Date: Wed, 11 Nov 2020 22:15:21 +0100

> Thanks. This trace doesn't show how many bytes were allocated, does it? Without that it is hard to judge whether these GnuTLS calls could be the culprit. Because the full trace shows other calls to malloc, for example this:

It doesn't show the size of the individual allocations, but it indirectly shows the size of the heap. Each brk() line like this one is the start of an entry:

    0.000000 brk(0x55f5ed93e000) = 0x55f5ed93e000

where the first field is the relative time since the last brk() call, and the argument in parentheses is the requested new end of the heap (an address, not a size). Subtracting the argument of the previous call from the argument of the current call shows how much the heap has been extended. In this capture, subtracting the first from the last shows that the heap grew by 8,683,520 bytes, and summing the relative timestamps shows that this happened in 90.71 seconds. It's growing at about 100KB/sec at this point.
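The subtraction and summing described above can be sketched in a few lines of Python. This is only a rough illustration, assuming each entry starts with a line in the "relative-time brk(address) = address" shape shown above. The function name heap_growth and the sample addresses are made up for the example and are not taken from the actual capture.

```python
import re

# Matches lines shaped like:  0.000000 brk(0x55f5ed93e000) = 0x55f5ed93e000
BRK_LINE = re.compile(
    r"^\s*(\d+\.\d+)\s+brk\((0x[0-9a-fA-F]+)\)\s*=\s*(0x[0-9a-fA-F]+)"
)


def heap_growth(lines):
    """Return (bytes_grown, seconds_elapsed) between the first and last brk()."""
    breaks = []
    elapsed = 0.0
    for line in lines:
        match = BRK_LINE.match(line)
        if match is None:
            continue  # backtrace lines and other syscalls are skipped
        if breaks:
            # The first timestamp precedes the measured interval, so only
            # the later ones are summed.
            elapsed += float(match.group(1))
        # The return value is the program break actually in effect.
        breaks.append(int(match.group(3), 16))
    if len(breaks) < 2:
        return 0, 0.0
    return breaks[-1] - breaks[0], elapsed


if __name__ == "__main__":
    # Made-up sample data, not the real trace.
    sample = [
        "0.000000 brk(0x1000) = 0x1000",
        "1.500000 brk(0x3000) = 0x3000",
        "0.500000 brk(0x4000) = 0x4000",
    ]
    grown, seconds = heap_growth(sample)
    print(f"{grown} bytes in {seconds:.2f} s = {grown / seconds:.0f} bytes/s")
```

With the figures quoted above, 8,683,520 bytes over 90.71 seconds works out to roughly 95,700 bytes per second, which is where the "about 100KB/sec" estimate comes from.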

Also, keep in mind that this is brk(). There could have been any number of malloc() calls in between, zero or millions, but these are the ones that couldn't find any unused blocks and had to extend the heap.

> I'm not sure how Emacs could be the culprit here. If GnuTLS is the culprit (and as explained above, this is not certain at this point), perhaps upgrading to a newer GnuTLS version or reporting this to GnuTLS developers would allow some progress.

I think you are right, GnuTLS was probably a symptom, not a cause. I took a while to respond because I tried running emacs in Valgrind's Massif heap debugging tool, and it took forever. Some results are in now, and it looks like GnuTLS wasn't present in the leak this time around.

First of all, if you aren't familiar with Massif (as I wasn't), it captures occasional snapshots of the whole heap and all allocations, and lets you dump a tree-view of those allocations later with the "ms_print" tool. The timestamps are fairly useless, as they are in "number of instructions executed." Here are three files from my investigation:

The raw massif output:

http://trevorbentley.com/massif.out.3364630

The *full* tree output:

http://trevorbentley.com/ms_print.3364630.txt

The tree output showing only entries above 10% usage:

http://trevorbentley.com/ms_print.thresh10.3364630.txt

What you can see from the handy ASCII graph at the top is that memory usage was chugging along, growing upwards for a couple of days, and then spiked very quickly up to just over 4GB over a few hours.
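The same growth curve can be pulled out of the raw massif output without going through ms_print. Below is a minimal sketch, assuming the usual massif.out layout in which each snapshot is a run of key=value lines (snapshot=, time=, mem_heap_B=, mem_heap_extra_B=, mem_stacks_B=). The function name massif_snapshots and the sample numbers are invented for illustration and do not come from the files linked above.

```python
# Per-snapshot fields that hold plain integers in a massif.out file.
NUMERIC_KEYS = ("time", "mem_heap_B", "mem_heap_extra_B", "mem_stacks_B")


def massif_snapshots(lines):
    """Collect the numeric header fields of every snapshot into dicts."""
    snapshots = []
    current = None
    for raw in lines:
        line = raw.strip()
        if line.startswith("snapshot="):
            # A new snapshot begins; later key=value lines belong to it.
            current = {"snapshot": int(line.split("=", 1)[1])}
            snapshots.append(current)
        elif current is not None:
            key, sep, value = line.partition("=")
            if sep and key in NUMERIC_KEYS:
                current[key] = int(value)
    return snapshots


if __name__ == "__main__":
    # Made-up sample in the assumed format, not real data.
    sample = """\
time_unit: i
#-----------
snapshot=0
#-----------
time=0
mem_heap_B=0
mem_heap_extra_B=0
mem_stacks_B=0
heap_tree=empty
#-----------
snapshot=1
#-----------
time=1000
mem_heap_B=4096
mem_heap_extra_B=16
mem_stacks_B=0
heap_tree=empty
""".splitlines()
    snaps = massif_snapshots(sample)
    peak = max(snaps, key=lambda s: s.get("mem_heap_B", 0))
    print(f"peak heap: {peak['mem_heap_B']} bytes at snapshot {peak['snapshot']}")
```

Printing mem_heap_B against the snapshot number gives the same shape as the ASCII graph, and makes it easier to see at which snapshot the spike begins.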

If you scroll down to the very last checkpoint (the 10% threshold file is better for this), you can see where most of the memory is used. The sums are very large, but they come from different sources: 1.7GB from lisp_align_malloc (nearly all from Fcons), 1.4GB from lmalloc (half from allocate_vector_block), and 700MB from lrealloc (mostly from enlarge_buffer_text).

There were no large buffers open, but there were long-lived network sockets and plenty of timers. I didn't check, but I'd say the largest buffer was up to a couple of megabytes, since emacs-slack logs fairly heavily.

I'm not sure what to make of this, really. It seems like a general, sudden-onset, intense craving for more memory while not particularly doing much. I could blindly suggest extreme memory fragmentation problems, but that doesn't seem very likely.

It's trivial to reproduce, but takes 3-5 days, so not exactly handy to debug. Let me know if you have any requests for the next iteration before I kill it. It's running in Valgrind again.

Thanks,

-Trevor




