help-gawk
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: How to Generate a Long String of the Same Character


From: Wolfgang Laun
Subject: Re: How to Generate a Long String of the Same Character
Date: Tue, 20 Jul 2021 11:49:05 +0200

On Mon, 19 Jul 2021 at 19:34, Neil R. Ormos <ormos-gnulists17@ormos.org>
wrote:

> Wolfgang Laun wrote:
>
> > The results for the four versions:
> >  *rec*   0m1,436s
> >  *dbl*   0m2.322s
> >  *rpt*  0m13.543s
> >  *sub*  0m27.290s
>
> I was a little surprised that the recursive
> algorithm was so much faster in Wolfgang's tests.
>
Why? gawk programs execute on a virtual machine with a CIS and a couple of
stacks. A function call isn't much worse than a goto.
It is mainly the number of interpreter instructions that counts (e.g., h =
h h x; is better than h = h h; h = h x;)

function neil(n, s,      l, s0l){
>   l=1;
>   s0l=length(s);
>   while (l*2<=n) {
>     l=l+l;
>     s=s s;
>     };
>   if (l<n) s=s substr(s, 1, (n-l)*s0l);
>   return s;
>   };
>
> You need to add
    if( n == 0 ) return "";
as the first instruction in neil. You can try to optimize one 2+l from the
loop. But I still get results where your non-recursive function is somewhat
slower than my recursive one.

I have noticed that gawk 5.1.0 appears to execute both versions a little
faster than 5.0.1, but the difference remains. I can send you all the
details about my environment but I don't think that this would tell you
anything noteworthy.

(I have never been looking at gawk internals before, so all of the
statements below are quite unreliable.)

A somewhat enlightening procedure is to read a dump of the interpreter
code. *rec *results in 32 VM instructions whereas *neil *results in 49. The
salient numbers are the number of instructions executed for each
iteration. *rec
*loops over 22 or 24 VM instructions, the while in *neil *loops over 14
instructions; out of the remainder of 35 instructions 26 are executed with
each call. Iteration count in *rec *is one less for 2^(n-1)+1 to 2^n-1.

Some instructions are remarkably "heavy", length() for a string is one.

Timing the single call
    x = srep( 300000000, "abc" );
also shows that *rec *is faster.

/usr/bin/time is very unreliable. I have done all runs on a machine where a
browser but no other program (except demons) is running. I don't know what
emacs or eclipse do when they are just sitting there, enjoying their idle
time. But if they affect the results of /usr/bin/time, it should not be
partisan.

All of this doesn't really explain why the results differ in this
irrational way.

Cheers
Wolfgang


reply via email to

[Prev in Thread] Current Thread [Next in Thread]