bug-gawk

Re: gsub() is very slow in gawk 5.1.0


From: Ed Morton
Subject: Re: gsub() is very slow in gawk 5.1.0
Date: Wed, 14 Jul 2021 23:59:13 -0500
User-agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101 Thunderbird/78.11.0

On 7/14/2021 11:29 PM, arnold@skeeve.com wrote:
> Hi.
>
> Ed Morton <mortoneccc@comcast.net> wrote:
>
>> On an online forum someone asked how to generate a string of 100,000,000
>> "x"s. They had tried this in a BEGIN section:
>>
>>      for(i=1;i<=100000000;i++) s = s "x"
>>
>> and wanted to know if there was a better approach.
>
> There isn't. Particularly with gawk, which optimizes the
>
>         s = s x
>
> case to use the C realloc. The GLIBC realloc in particular seems to
> be extremely fast.
>
> With every other awk I've tried, I've had to kill the run on this simple
> program.
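For anyone who wants to reproduce this, here is the loop from the original question wrapped in a small shell function. It is a sketch, not code from the thread: the function name gen_loop and the -v n variable are hypothetical, added so the count can be tried small before going up to 100000000. Per Arnold's reply, gawk optimizes the s = s "x" step to use the C realloc(), which is why it finishes where other awks may not.

```shell
#!/bin/sh
# gen_loop N: print a string of N "x" characters built one character
# at a time, as in the original question. gen_loop is a hypothetical
# helper name; the thread used a literal 100000000 instead of n.
gen_loop() {
    awk -v n="$1" 'BEGIN {
        s = ""
        for (i = 1; i <= n; i++)
            s = s "x"   # the "s = s x" case gawk optimizes with realloc()
        print s
    }'
}

# Start small; raise the count once the awk in use is known to cope.
gen_loop 5
```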

>> Someone suggested:
>>
>>      s=sprintf("%*s",100000000,""); gsub(/ /,"x",s)
>>
>> which is what I'd have suggested too, but on testing it they
>> found that the sprintf+gsub approach was slower than the loop in gawk.
>
> This is not at all surprising.  gsub is doing a regex match each time
> through the string to find the beginning and end of the text to be replaced.
> You're doing 100 million matches, and full regex matching isn't as
> fast as simple character comparisons (which is what tr is doing).
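The sprintf+gsub version and a tr(1) pipeline can be set side by side as shell functions that should print the same string. Again this is a sketch: gen_gsub and gen_tr are hypothetical names, and the exact tr command behind the timings snipped from this thread isn't shown, so the pipeline below is an assumption. The awk side relies on sprintf() accepting a "*" field width, as the suggestion in the thread does.

```shell
#!/bin/sh
# gen_gsub N: make a string of N blanks with sprintf("%*s"), then
# replace every blank with "x" using gsub(). gsub() locates each blank
# with a regex match, so this performs N matches.
gen_gsub() {
    awk -v n="$1" 'BEGIN {
        s = sprintf("%*s", n, "")   # n blanks
        gsub(/ /, "x", s)           # regex-driven replacement
        print s
    }'
}

# gen_tr N: same output with no regex involved; tr only maps single
# characters (blank -> x). The format is built as "%Ns" so it does not
# depend on the shell's printf supporting "*" widths.
gen_tr() {
    printf "%${1}s" '' | tr ' ' 'x'
    echo    # trailing newline to match awk's print
}

gen_gsub 5
gen_tr 5
```

At the thread's count of 100000000, prefixing each call with the shell's time command gives a rough comparison for whichever awk is installed.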

> I won't respond to the rest of your timings, as you compare different
> awks on different systems, some based on hearsay from others, to boot.
> In particular, citing the MacOS awk from 2007 is a waste of time; it's
> 14 years out of date, and the current version of Unix awk is trivially
> available.

I just tried the same sprintf+gsub script on my Mac using BSD awk 20200816 and it took only 1.4 seconds to run. Unfortunately, I can't install gawk or any other awk on that machine to test with, but I 100% believe the two other people who posted at https://stackoverflow.com/a/68371463/1745001 that gawk 5.1.0 on their Macs took 23.5 secs and almost 30 secs, respectively.

> gsub is a heavyweight tool, and that's all there is to it.

I get that to some extent, but that's a huge difference in gsub() performance between the BSD and GNU awks: under 2 secs vs. about 30 secs.

    Ed.

> Thanks,
>
> Arnold



