gsub() is very slow in gawk 5.1.0

From: Ed Morton
Subject: gsub() is very slow in gawk 5.1.0
Date: Wed, 14 Jul 2021 08:20:57 -0500

On an online forum someone asked how to generate a string of 100,000,000 "x"s. They had tried this in a BEGIN section:

   for(i=1;i<=100000000;i++) s = s "x"

and wanted to know if there was a better approach. Someone suggested:

   s=sprintf("%*s",100000000,""); gsub(/ /,"x",s)

which is what I'd have suggested too. However, upon testing it they found that the sprintf+gsub approach was slower than the loop in gawk 5.1.0. I couldn't reproduce that exactly on cygwin, but I can confirm that the sprintf+gsub solution is much slower than I expected:

   $ time awk 'BEGIN{for(i=1;i<=100000000;i++) s = s "x"}'

   real    1m19.439s
   user    0m28.562s
   sys     0m50.811s

   $ time awk 'BEGIN{s=sprintf("%*s",100000000,""); gsub(/ /,"x",s)}'

   real    0m36.604s
   user    0m36.093s
   sys     0m0.390s

If I remove the gsub() then it runs in half a second:

   $ time awk 'BEGIN{s=sprintf("%*s",100000000,"")}'

   real    0m0.423s
   user    0m0.171s
   sys     0m0.202s

so the gsub() itself is taking over 36 seconds to run. Someone else ran the script on a Mac with BSD awk 20070501 and got:

   $ time awk 'BEGIN {s = sprintf("%*s", 100000000, ""); gsub(/ /, "x", s)}'

   real    0m1.744s
   user    0m1.645s
   sys     0m0.098s

i.e. it ran in under 2 seconds, while yet another person said the gawk solution took 23.5 seconds on their Mac.

So, something is causing gsub() in gawk 5.1.0 to run very slowly for this case.
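
In case it helps with reproducing this at other sizes, here is a minimal sketch of the same sprintf+gsub test wrapped in a shell function. The name fill_x and the variable n are just placeholders I made up for illustration. The function prints the number of substitutions gsub() reports and the final string length, so the result can be checked separately from the timing:

```shell
# fill_x is a hypothetical helper name (not part of gawk).
# It builds a string of N spaces with sprintf(), replaces each space
# with "x" using gsub(), and prints the substitution count and length.
fill_x() {
    awk -v n="$1" 'BEGIN {
        s = sprintf("%*s", n, "")      # string of n spaces
        count = gsub(/ /, "x", s)      # gsub() returns the number of replacements
        print count, length(s)
    }'
}

# Small run as a sanity check; both numbers should equal the requested size.
fill_x 1000000
```

In bash, running "time fill_x N" for a few increasing values of N should show whether the gsub() time grows roughly in proportion to the string length or faster than that. I haven't included timings for that here.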


