
Re: [bug-gawk] Memory leak


From: Andrew J. Schorr
Subject: Re: [bug-gawk] Memory leak
Date: Tue, 28 Mar 2017 10:04:17 -0400
User-agent: Mutt/1.5.21 (2010-09-15)

On Tue, Mar 28, 2017 at 08:26:08AM -0400, Andrew J. Schorr wrote:
> How can you be sure that it's related to the sort function? When you look
> at the memory usage, how many bytes per record are being used? How does
> that compare to the size of the actual record that you are storing in the
> array?

Using the very small file that you sent me, I see maxrss of 1736k when I load
only a single record, vs maxrss of 1884k when I load all 344 records.
So adding 343 records consumes an additional 148k of memory. That comes
to 442 bytes per record. The average record size is 267 bytes, so that
comes to 175 bytes of overhead per record. That is not a crazy number for
gawk's array implementation. What do your calculations show?
I don't see much difference when I comment out the PROCINFO["sorted_in"]
line; do you?
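
In case it helps you repeat the measurement, here is a rough sketch of one
way to watch the peak RSS from inside gawk itself. It assumes a Linux /proc
filesystem (VmHWM in /proc/self/status) and uses a simplified tab_store loop
rather than your exact script; the PROCINFO["sorted_in"] line is shown
commented out so you can toggle it:

   # Sketch only: Linux-specific; reads the peak RSS (VmHWM) from /proc.
   function peak_rss(    line, parts, f, kb) {
       f = "/proc/self/status"
       while ((getline line < f) > 0)
           if (line ~ /^VmHWM:/) {
               split(line, parts)     # e.g. "VmHWM:    1884 kB"
               kb = parts[2]
           }
       close(f)
       return kb
   }
   BEGIN {
       # PROCINFO["sorted_in"] = "@ind_num_asc"   # the line under discussion
       base = peak_rss()
   }
   { tab_store[NR] = $0 }                         # simplified stand-in for your loop
   END {
       delta = peak_rss() - base
       printf "%d records, maxrss delta %d kB, %.0f bytes/record\n",
              NR, delta, delta * 1024 / NR
   }

Run it as, say, "gawk -f measure.awk smallfile" and then again on the larger
file to see how the per-record cost scales.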

Also, you are actually saving a larger record:
   sort_sec_key_1 = " " FIELD11
   tab_store[nb_tab_store] = sort_sec_key_1 OFS $0

On average, field 11 is 33 bytes. So if we add that, plus the leading space
and the OFS separator, the average saved value size is 302 bytes. Then we have
to add a byte for string termination, so it's 303 bytes per record.
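
If you want to check that average directly, something like the following
would do it; note this assumes FIELD11 is essentially $11 and that OFS is the
default single space, which may not match your script exactly:

   # Hypothetical check: average length of the value actually stored,
   # assuming FIELD11 ~ $11 and the default single-space OFS.
   gawk '{ total += length(" " $11 OFS $0) + 1; n++ }   # +1 for the terminating NUL
         END { printf "avg stored bytes/record: %.1f\n", total / n }' smallfile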

Note: a gawk NODE uses 88 bytes, and a BUCKET uses 48 bytes, so that's 136
bytes right there. If you add those to the record size, you get 439 bytes.
So everything seems to be in the right ballpark based on this very small
sample dataset, although I think maybe it should use only half a BUCKET
per array entry.
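
For a larger dataset you could predict the footprint the same way, using the
88-byte NODE and 48-byte BUCKET figures above, and compare the result against
the observed maxrss delta (e.g. from /usr/bin/time -v). Again, just a sketch
with the same $11 assumption:

   # Sketch: predicted tab_store footprint = stored string + NUL + NODE + BUCKET.
   gawk '{ bytes += length(" " $11 OFS $0) + 1 + 88 + 48 }
         END { printf "predicted: %.0f kB for %d records\n", bytes / 1024, NR }' bigfile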

What do you see when you perform similar calculations using a larger
dataset?

Regards,
Andy


