Re: Help-gawk Digest, Vol 1, Issue 5


From: J Naman
Subject: Re: Help-gawk Digest, Vol 1, Issue 5
Date: Tue, 20 Jul 2021 13:15:58 -0400

My benchmark of the long-string functions (cnt=300,000):

            str='a'   str='abcde'   Prev result
srep_rec    1.374s    1.544s         1.436s
srep_dbl    0.671s    0.546s         2.322s
srep_rpt    3.120s    7.239s        13.543s   (*)
srep_sub    2.465s    2.574s        27.290s   (*)

(*) my runs of rpt and sub used cnt2=30,000, i.e. 10.0% of the cnt above.

I cannot explain why dbl takes half the time of rec on my computer.
I cannot explain why rpt takes 3x as long as sub here, versus 1/2 in the previous results.
* NOTE: I got tired of waiting, so the rpt & sub counts are 1/10 of those for rec & dbl.
I cannot explain why rpt is 70x-ish slower than rec & dbl, versus 6x-ish in the previous results.
* Windows 7 64-bit; Gawk 5.1.0; Intel i7-4930K 3.40 GHz, 6 cores.

On Tue, Jul 20, 2021 at 12:04 PM <help-gawk-request@gnu.org> wrote:

> Today's Topics:
>
>    1. Re: How to Generate a Long String of the Same Character
>       (Wolfgang Laun)
>    2. Re: How to Generate a Long String of the Same Character
>       (Andrew J. Schorr)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Tue, 20 Jul 2021 11:49:05 +0200
> From: Wolfgang Laun <wolfgang.laun@gmail.com>
> To: "Neil R. Ormos" <ormos-gnulists17@ormos.org>
> Cc: Help Gawk List <help-gawk@gnu.org>
> Subject: Re: How to Generate a Long String of the Same Character
> Message-ID:
>         <CANaj1Lch8=dpwEccdyNQQO5A=gQLG7x3Ci7kd3F6RQ+zA4L55Q@mail.gmail.com>
> Content-Type: text/plain; charset="UTF-8"
>
> On Mon, 19 Jul 2021 at 19:34, Neil R. Ormos <ormos-gnulists17@ormos.org>
> wrote:
>
> > Wolfgang Laun wrote:
> >
> > > The results for the four versions:
> > >  *rec*   0m1.436s
> > >  *dbl*   0m2.322s
> > >  *rpt*  0m13.543s
> > >  *sub*  0m27.290s
> >
> > I was a little surprised that the recursive
> > algorithm was so much faster in Wolfgang's tests.
> >
> Why? gawk programs execute on a virtual machine with a CIS and a couple of
> stacks. A function call isn't much worse than a goto.
> It is mainly the number of interpreter instructions that counts (e.g., h =
> h h x; is better than h = h h; h = h x;)
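
To make the instruction-count point concrete, here is a minimal sketch of a recursive doubling function in which each level builds its result in a single concatenation expression rather than in two assignments. The function name rec_sketch and the file name rec_sketch.awk are made up for illustration; this is a reconstruction under those assumptions and not necessarily the rec version that was timed in this thread.

```sh
# rec_sketch.awk is a hypothetical file name; rec_sketch() is an illustrative
# reconstruction, not the code that was benchmarked in this thread.
cat > rec_sketch.awk <<'EOF'
# Return n copies of s; h is a local variable.
function rec_sketch(n, s,    h) {
    if (n <= 0) return ""
    if (n == 1) return s
    h = rec_sketch(int(n / 2), s)    # build half of the copies recursively
    # One concatenation expression per level, instead of h = h h; h = h s
    return (n % 2) ? h h s : h h
}
BEGIN { printf "%s\n", rec_sketch(n, s) }
EOF
awk -v n=5 -v s=ab -f rec_sketch.awk    # prints ababababab
```

The recursion depth grows with log2(n), so even very long strings need only a few dozen calls.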
>
> > function neil(n, s,      l, s0l){
> >   l=1;
> >   s0l=length(s);
> >   while (l*2<=n) {
> >     l=l+l;
> >     s=s s;
> >     };
> >   if (l<n) s=s substr(s, 1, (n-l)*s0l);
> >   return s;
> >   };
> >
> You need to add
>     if( n == 0 ) return "";
> as the first instruction in neil. You can try to optimize one 2*l from the
> loop. But I still get results where your non-recursive function is somewhat
> slower than my recursive one.
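
For reference, a sketch of neil with the suggested guard in place. Without the guard, a call with n == 0 skips the loop and returns s unchanged, that is, one copy instead of none. The file name neil_guarded.awk and the BEGIN driver are assumptions added only so the function can be run from the command line.

```sh
# neil_guarded.awk is a hypothetical file name; the BEGIN rule is only a driver.
cat > neil_guarded.awk <<'EOF'
function neil(n, s,      l, s0l) {
    if (n == 0) return ""            # guard: zero copies is the empty string
    l = 1                            # number of copies of the original s held so far
    s0l = length(s)                  # length of one copy
    while (l * 2 <= n) {             # double while that does not overshoot n
        l = l + l
        s = s s
    }
    # Top up with the missing (n - l) copies taken from the front of s
    if (l < n) s = s substr(s, 1, (n - l) * s0l)
    return s
}
BEGIN { printf "%s\n", neil(n, s) }
EOF
awk -v n=0 -v s=ab -f neil_guarded.awk    # prints an empty line
awk -v n=5 -v s=ab -f neil_guarded.awk    # prints ababababab
```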
>
> I have noticed that gawk 5.1.0 appears to execute both versions a little
> faster than 5.0.1, but the difference remains. I can send you all the
> details about my environment but I don't think that this would tell you
> anything noteworthy.
>
> (I have never looked at gawk internals before, so all of the
> statements below are quite unreliable.)
>
> A somewhat enlightening procedure is to read a dump of the interpreter
> code. *rec* results in 32 VM instructions whereas *neil* results in 49. The
> salient numbers are the number of instructions executed for each
> iteration. *rec* loops over 22 or 24 VM instructions; the while in *neil*
> loops over 14 instructions, and of the remaining 35 instructions 26 are
> executed with each call. The iteration count in *rec* is one less for
> 2^(n-1)+1 to 2^n-1.
>
> Some instructions are remarkably "heavy", length() for a string is one.
>
> Timing the single call
>     x = srep( 300000000, "abc" );
> also shows that *rec* is faster.
>
> /usr/bin/time is very unreliable. I have done all runs on a machine where a
> browser but no other program (except daemons) is running. I don't know what
> emacs or eclipse do when they are just sitting there, enjoying their idle
> time. But if they affect the results of /usr/bin/time, the effect should
> not favor either version.
>
> All of this doesn't really explain why the results differ in this
> irrational way.
>
> Cheers
> Wolfgang
>
>
> ------------------------------
>
> Message: 2
> Date: Tue, 20 Jul 2021 09:42:53 -0400
> From: "Andrew J. Schorr" <aschorr@telemetry-investments.com>
> To: Wolfgang Laun <wolfgang.laun@gmail.com>
> Cc: "Neil R. Ormos" <ormos-gnulists17@ormos.org>, Help Gawk List
>         <help-gawk@gnu.org>
> Subject: Re: How to Generate a Long String of the Same Character
> Message-ID: <20210720134253.GA7400@ti129.telemetry-investments.com>
> Content-Type: text/plain; charset=us-ascii
>
> On Tue, Jul 20, 2021 at 11:49:05AM +0200, Wolfgang Laun wrote:
> > Some instructions are remarkably "heavy", length() for a string is one.
>
> Are you in a multi-byte locale? Because if not, I'd expect length()
> to be very quick.
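
A quick way to see the locale dependence, sketched under the assumption that a UTF-8 locale named en_US.UTF-8 is installed (the name is an assumption; `locale -a` lists what is actually available). In the C locale length() can simply report the byte count, while in a multi-byte locale gawk has to work out how many characters those bytes form, which is where the extra cost would come from.

```sh
# "\303\251" is the UTF-8 encoding of U+00E9 written as octal escapes,
# so this script itself stays plain ASCII.

# Single-byte C locale: with gawk or mawk every byte counts as one character.
LC_ALL=C awk 'BEGIN { print length("\303\251") }'    # prints 2

# Assumption: en_US.UTF-8 exists on this system. The result here depends on
# the awk implementation and on whether the locale is really available.
LC_ALL=en_US.UTF-8 awk 'BEGIN { print length("\303\251") }'
```

If the second command reports a smaller number than the first, the awk in use is decoding multi-byte characters in that locale.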
>
> Regards,
> Andy
>
>
>

