On an online forum someone asked how to generate a string of
100,000,000 "x"s. They had tried this in a BEGIN section:
for(i=1;i<=100000000;i++) s = s "x"
and wanted to know if there was a better approach. Someone suggested:
s=sprintf("%*s",100000000,""); gsub(/ /,"x",s)
which is what I'd have suggested too. But when they tested it, they
found that the sprintf+gsub approach was slower than the loop in gawk
5.1.0. I couldn't reproduce that exactly on cygwin, but I can confirm
that the sprintf+gsub solution is much slower than I expected:
$ time awk 'BEGIN{for(i=1;i<=100000000;i++) s = s "x"}'
real 1m19.439s
user 0m28.562s
sys 0m50.811s
$ time awk 'BEGIN{s=sprintf("%*s",100000000,""); gsub(/ /,"x",s)}'
real 0m36.604s
user 0m36.093s
sys 0m0.390s
If I remove the gsub() then it runs in under half a second:
$ time awk 'BEGIN{s=sprintf("%*s",100000000,"")}'
real 0m0.423s
user 0m0.171s
sys 0m0.202s
so the gsub() itself is taking over 36 seconds to run. Someone else
ran the script on a Mac with BSD awk 20070501 and got:
$ time awk 'BEGIN {s = sprintf("%*s", 100000000, ""); gsub(/ /, "x", s)}'
real 0m1.744s
user 0m1.645s
sys 0m0.098s
i.e. it ran in under 2 seconds, while yet another person said the gawk
solution took 23.5 seconds on their Mac.
So something is causing gsub() in gawk 5.1.0 to run very slowly
for this case.
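For anyone wanting to check that the two approaches really do build the
same string before timing them, here is a small sketch. It assumes a
POSIX shell and an awk that supports "%*s" in sprintf() (gawk and the
BSD awk above both do). The variable names n, loop_out and gsub_out are
my own, and n is kept tiny so it finishes instantly:

```shell
# n is deliberately small here; the thread used 100000000.
n=10

# Approach 1: append one "x" per loop iteration.
loop_out=$(awk -v n="$n" 'BEGIN{for(i=1;i<=n;i++) s = s "x"; print s}')

# Approach 2: pad an empty string out to n spaces, then replace each space.
gsub_out=$(awk -v n="$n" 'BEGIN{s=sprintf("%*s",n,""); gsub(/ /,"x",s); print s}')

# Both lines should be identical strings of n "x" characters.
echo "$loop_out"
echo "$gsub_out"
```

Both lines printed should be xxxxxxxxxx. Raising n to 100000000 and
dropping the print statements gets back to the commands timed above.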
Ed.