bug-gawk

Re: gsub() is very slow in gawk 5.1.0


From: arnold
Subject: Re: gsub() is very slow in gawk 5.1.0
Date: Wed, 14 Jul 2021 22:29:52 -0600
User-agent: Heirloom mailx 12.5 7/5/10

Hi.

Ed Morton <mortoneccc@comcast.net> wrote:

> On an online forum someone asked how to generate a string of 100,000,000 
> "x"s. They had tried this in a BEGIN section:
>
>     for(i=1;i<=100000000;i++) s = s "x"
>
> and wanted to know if there was a better approach.

There isn't. Particularly with gawk, which optimizes the

        s = s x

case to use the C realloc. The GLIBC realloc in particular seems to
be extremely fast.

With every other awk I've tried, I've had to kill the run of this simple
program.
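
For anyone who wants to try the loop without waiting on a slow awk, here
is a minimal sketch scaled down to 10,000 iterations. The reduced count
and the shell wrapper are assumptions made for illustration; they are not
part of the original report.

```sh
# Scaled-down version of the quoted loop: 10,000 iterations instead of
# 100,000,000, so it runs quickly whether or not the awk optimizes it.
len=$(awk 'BEGIN {
    n = 10000                       # reduced count, chosen for illustration
    for (i = 1; i <= n; i++)
        s = s "x"                   # the s = s x pattern discussed above
    print length(s)                 # length of the finished string
}')
echo "$len"
```

Raising n back toward 100000000 is what separates an awk that grows the
string in place from one that copies it on every iteration.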

> Someone suggested:
>
>     s=sprintf("%*s",100000000,""); gsub(/ /,"x",s)
>
> which is what I'd have also suggested, but upon testing that they 
> found that the sprintf+gsub approach was slower than the loop in gawk 

This is not at all surprising.  gsub is doing a regex match each time
through the string to find the beginning and end of the text to be replaced.
You're doing 100 million matches, and full regex matching isn't as
fast as simple character comparisons (which is what tr is doing).

I won't address the rest of your timings, as you compare different
awks on different systems, some based on hearsay from others, to boot.
In particular, citing the MacOS awk from 2007 is a waste of time; it's
14 years out of date, and the current version of Unix awk is trivially
available.

gsub is a heavy-weight tool, and that's all there is to it.

Thanks,

Arnold


