Re: gsub() is very slow in gawk 5.1.0

bug-gawk

From:	Ed Morton
Subject:	Re: gsub() is very slow in gawk 5.1.0
Date:	Wed, 14 Jul 2021 22:24:07 -0500
User-agent:	Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101 Thunderbird/78.11.0

On 7/14/2021 8:20 AM, Ed Morton wrote:

On an online forum someone asked how to generate a string of 100,000,000 "x"s. They had tried this in a BEGIN section:
   for(i=1;i<=100000000;i++) s = s "x"

and wanted to know if there was a better approach. Someone suggested:

   s=sprintf("%*s",100000000,""); gsub(/ /,"x",s)
which is what I'd have suggested too, but upon testing it they found that the sprintf+gsub approach was slower than the loop in gawk 5.1.0. While I couldn't reproduce that exactly on cygwin, I can confirm that the sprintf+gsub solution is much slower than I expected:
   $ time awk 'BEGIN{for(i=1;i<=100000000;i++) s = s "x"}'

   real    1m19.439s
   user    0m28.562s
   sys     0m50.811s

   $ time awk 'BEGIN{s=sprintf("%*s",100000000,""); gsub(/ /,"x",s)}'

   real    0m36.604s
   user    0m36.093s
   sys     0m0.390s

If I remove the gsub() then it runs in half a second:

   $ time awk 'BEGIN{s=sprintf("%*s",100000000,"")}'

   real    0m0.423s
   user    0m0.171s
   sys     0m0.202s
so the gsub() itself is taking over 36 seconds to run. Someone else ran the script on a Mac with BSD awk 20070501 and got:
   $ time awk 'BEGIN {s = sprintf("%*s", 100000000, ""); gsub(/ /, "x", s)}'

   real    0m1.744s
   user    0m1.645s
   sys     0m0.098s
i.e. it ran in under 2 seconds, and yet another person said the gawk solution took 23.5 seconds on their Mac.
So something is causing gsub() in gawk 5.1.0 to run very slowly for this case.
    Ed.


FWIW:

   $ time awk 'BEGIN{s=sprintf("%*s",100000000,""); print s}' | sed 's/ /x/g' >/dev/null

   real    0m40.100s
   user    0m39.608s
   sys     0m0.421s

so GNU sed is apparently just as slow. `tr` is fast, as you'd expect, but I know that's apples to oranges:


   $ time awk 'BEGIN{s=sprintf("%*s",100000000,""); print s}' | tr ' ' 'x' >/dev/null

   real    0m0.889s
   user    0m0.452s
   sys     0m0.577s
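
For the original question of building the string at all, one alternative that avoids both the 100,000,000-iteration loop and the regexp machinery is repeated doubling. The following is only a minimal sketch under my own assumptions, not something proposed or timed above; the function name repeat_x and the variable names are hypothetical, and I make no claim about how it compares in speed on any particular awk:

```shell
# repeat_x N: print a line of N "x" characters, built by repeated doubling.
# Hypothetical helper for illustration; not taken from the forum thread.
repeat_x() {
    awk -v n="$1" 'BEGIN {
        s = ""        # result accumulated so far
        chunk = "x"   # run of "x" whose length is the current power of two
        # Walk the binary digits of n from least to most significant:
        # append chunk when the current bit is set, then double chunk.
        while (n > 0) {
            if (n % 2 == 1) s = s chunk
            n = int(n / 2)
            if (n > 0) chunk = chunk chunk
        }
        print s
    }'
}
```

Called as `repeat_x 100000000 >/dev/null`, this needs about 27 loop iterations and a few dozen concatenations instead of 100,000,000 of either, and it never invokes the regexp matcher.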

Regards,

    Ed.
