bug-gawk

Re: gsub() is very slow in gawk 5.1.0


From: Ed Morton
Subject: Re: gsub() is very slow in gawk 5.1.0
Date: Wed, 14 Jul 2021 22:24:07 -0500
User-agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101 Thunderbird/78.11.0

On 7/14/2021 8:20 AM, Ed Morton wrote:
On an online forum someone asked how to generate a string of 100,000,000 "x"s. They had tried this in a BEGIN section:

   for(i=1;i<=100000000;i++) s = s "x"

and wanted to know if there was a better approach. Someone suggested:

   s=sprintf("%*s",100000000,""); gsub(/ /,"x",s)

which is what I'd have suggested too. Upon testing, though, they found that the sprintf+gsub approach was slower than the loop in gawk 5.1.0. I couldn't reproduce that exactly on cygwin, but I can confirm that the sprintf+gsub solution is much slower than I expected:

   $ time awk 'BEGIN{for(i=1;i<=100000000;i++) s = s "x"}'

   real    1m19.439s
   user    0m28.562s
   sys     0m50.811s

   $ time awk 'BEGIN{s=sprintf("%*s",100000000,""); gsub(/ /,"x",s)}'

   real    0m36.604s
   user    0m36.093s
   sys     0m0.390s

If I remove the gsub() then it runs in half a second:

   $ time awk 'BEGIN{s=sprintf("%*s",100000000,"")}'

   real    0m0.423s
   user    0m0.171s
   sys     0m0.202s

so the gsub() itself is taking over 36 seconds to run. Someone else ran the script on a Mac with BSD awk 20070501 and got:

   $ time awk 'BEGIN {s = sprintf("%*s", 100000000, ""); gsub(/ /, "x", s)}'

   real    0m1.744s
   user    0m1.645s
   sys     0m0.098s

i.e. it ran in under 2 seconds. Yet another person said the gawk solution took 23.5 seconds on their Mac.

So, something is causing gsub() in gawk 5.1.0 to run very slowly for this case.
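
On the original question of a better way to build the string: one alternative that avoids both the 100,000,000-iteration loop and gsub() is repeated doubling, which needs only about log2(n) concatenations. What follows is a minimal sketch, not something from the forum thread and not timed here; the names repeat, rep, count and unit are hypothetical, and it assumes the unit string contains no backslashes (awk's -v option processes escape sequences).

```shell
# Hypothetical helper: print COUNT copies of UNIT, built by repeated doubling.
# Usage: repeat COUNT UNIT
repeat() {
    awk -v count="$1" -v unit="$2" '
    # rep() consumes n bit by bit; chunk always holds unit repeated 2^k times.
    function rep(n, chunk,    s) {
        s = ""
        while (n > 0) {
            if (n % 2) s = s chunk       # this bit is set: append the chunk
            n = int(n / 2)
            if (n > 0) chunk = chunk chunk   # double only if more bits remain
        }
        return s
    }
    BEGIN { printf "%s", rep(count, unit) }'
}

repeat 10 x; echo    # prints xxxxxxxxxx
```

For the case in this thread the call would be `repeat 100000000 x`; whether that beats the other approaches on a given awk would need to be measured.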

    Ed.


FWIW:

   $ time awk 'BEGIN{s=sprintf("%*s",100000000,""); print s}' | sed 's/ /x/g' >/dev/null

   real    0m40.100s
   user    0m39.608s
   sys     0m0.421s

so GNU sed is apparently just as slow. `tr` is fast as you'd expect but I know that's apples to oranges:

   $ time awk 'BEGIN{s=sprintf("%*s",100000000,""); print s}' | tr ' ' 'x' >/dev/null

   real    0m0.889s
   user    0m0.452s
   sys     0m0.577s
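
Although tr does no regexp matching, for this particular input it should produce the same string as gsub(). A small sketch to confirm that at a reduced size follows; the size n=1000 and the variable names via_gsub and via_tr are hypothetical, and it assumes an awk whose sprintf() supports the `*` width, as the commands above do.

```shell
n=1000   # hypothetical small size so the check runs instantly

# Same two pipelines as above, but capturing the output instead of timing it.
via_gsub=$(awk -v n="$n" 'BEGIN { s = sprintf("%*s", n, ""); gsub(/ /, "x", s); print s }')
via_tr=$(awk -v n="$n" 'BEGIN { s = sprintf("%*s", n, ""); print s }' | tr ' ' 'x')

if [ "$via_gsub" = "$via_tr" ]; then
    echo same
else
    echo different
fi
```

If the two agree, the timing gap reflects how the work is done rather than a difference in the result.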

Regards,

    Ed.

