bug-gawk

Re: gsub() is very slow in gawk 5.1.0


From: Ed Morton
Subject: Re: gsub() is very slow in gawk 5.1.0
Date: Thu, 15 Jul 2021 02:31:32 -0500



On 7/15/2021 1:41 AM, arnold@skeeve.com wrote:
Hi Ed.

Ed Morton <mortoneccc@comcast.net> wrote:

I just tried the same script on my Mac using BSD awk 20200816 and it
only took 1.4 seconds to run. Unfortunately I can't install gawk or any
other awk on that machine to test with but I 100% believe the 2 other
people who posted at https://stackoverflow.com/a/68371463/1745001 saying
gawk 5.1.0 on their Macs took 23.5 secs and almost 30 secs respectively.

Once again, you have to compare apples to apples. Part of it is
definitely related to how much RAM you have. I bet that Mac of
yours has 32 Gig or more on it.

On my personal 8 Gig system, I had to kill all other awks.  My work laptop
(Ubuntu 18.04) has 16 Gig. Here's the data:

$ cat t2.awk
BEGIN {
        s=sprintf("%*s",1000000000,""); gsub(/ /,"x",s)
}

$ ./nawk --version
awk version 20210215

$ time ./nawk -f t2.awk

real    2m2.270s
user    0m12.061s
sys     1m50.162s

$ time ./gawk -f t2.awk

real    3m8.238s
user    3m6.167s
sys     0m1.856s

Gawk is 50% slower than nawk, but not 10 or 15 times slower.
The gawk regex routines are much more heavy-weight than what's
in nawk.  And no, I can't substitute in some other regex library.

Interestingly:

$ (export LC_ALL=C ; time ./gawk -f t2.awk)

real    2m30.100s
user    2m28.561s
sys     0m1.484s

So we see that gawk is comparable to nawk when told to not
worry about multibyte locales.

I think we can put this to rest now.

Thanks,

Arnold


That's fine, thanks for taking a look at it.

    Ed.
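
For anyone who wants to repeat the comparison without needing a gigabyte of RAM, here is a scaled-down sketch of the t2.awk test quoted above. The helper name run_gsub and the length parameter are assumptions added for illustration and are not from the thread; the awk program is the same sprintf/gsub pair, with the length passed in via -v and a sanity-check print added. It assumes the awk under test accepts "%*s" in sprintf, as gawk and nawk do in the runs above.

```shell
#!/bin/sh
# Scaled-down sketch of t2.awk: the length is a parameter instead of 1000000000.
# run_gsub is a hypothetical helper name, not something from the thread.
run_gsub() {
    # $1 = awk binary to test (e.g. ./gawk or ./nawk), $2 = string length
    "$1" -v n="$2" 'BEGIN {
        s = sprintf("%*s", n, "")   # build a string of n spaces
        c = gsub(/ /, "x", s)       # replace every space; c is the replacement count
        print c, length(s)          # sanity check: both values should equal n
    }'
}

# Example run against whatever "awk" is first on PATH (an assumption).
run_gsub awk 1000000
```

In bash, where time is a shell keyword and can wrap a function, something like "time run_gsub ./gawk 100000000" and "(export LC_ALL=C ; time run_gsub ./gawk 100000000)" should give a rough version of the locale comparison above. Absolute timings will depend heavily on available RAM, as noted in the thread.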

