bug-gawk


From: Eli Zaretskii
Subject: Re: [EXTERNAL] Re: Performance issues using GAWK 3.1.6 ->from Win 2008 to Win 2016
Date: Tue, 15 Jun 2021 20:20:21 +0300

> From: "Koleti, Haritha" <Haritha.Koleti@pseg.com>
> CC: "wolfgang.laun@gmail.com" <wolfgang.laun@gmail.com>,
>         "bug-gawk@gnu.org" <bug-gawk@gnu.org>,
>         "Pereira, Ricardo" <Ricardo_D.Pereira@pseg.com>,
>         "Pirane, Marco" <Marco.Pirane@pseg.com>
> Date: Tue, 15 Jun 2021 16:58:28 +0000
> 
> Ed, these inefficient scripts ran in ~10 minutes on 2008.  Do you think
> that, to address this performance (>90 mins on 2016), we have to
> change all >100 AWK scripts?
> 
> If there is any other way that you can think of, that would be great.

I think we all understand that changing many scripts is quite some
work.  But trying to investigate a strange problem with scripts and
files we don't have on 2 systems to which we have no access is even
more work.

Your script reads 5,000 lines for each of the 195,000 lines of input,
which means roughly 1 billion lines all in all.  On my system, which
is Windows XP SP3, gawk 5.1.0 takes about 0.2 sec to read a 1-million
line file.  It takes 2 sec to read a 70,000 line file while reading a
second 70,000 line file for each line of the first file (for the grand
total of 5 billion lines).  So it is a mystery to me how your script
can take 10 min, let alone 90 min, to read that data and perform
some trivial assignments.
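
For anyone who wants to repeat this kind of measurement, here is a
minimal sketch of the nested-read pattern described above, not the
actual script or test files, which we do not have.  It assumes a
POSIX shell with mktemp available (e.g. MSYS or Cygwin on Windows);
the function name nested_read_count and the generated file contents
are made up for illustration.  The line counts in the paragraph above
follow the same arithmetic: 5,000 x 195,000 = 975,000,000 and
70,000 x 70,000 = 4,900,000,000.

```shell
#!/bin/sh
# Sketch of the nested-read pattern: for each line of an outer file,
# re-read a whole inner file.  File names and sizes are hypothetical.

nested_read_count() {
    n=$1
    outer=$(mktemp) || return 1
    inner=$(mktemp) || return 1

    # Generate two n-line test files.
    awk -v n="$n" 'BEGIN { for (i = 1; i <= n; i++) print "line " i }' > "$outer"
    cp "$outer" "$inner"

    # For every outer line, read the inner file from start to finish.
    awk -v inner="$inner" '
        {
            while ((getline line < inner) > 0)
                total++
            close(inner)    # reopen from the top on the next outer line
        }
        END { print total + 0 }
    ' "$outer"

    rm -f "$outer" "$inner"
}

# 200 x 200 = 40000 lines read in total; a much larger argument
# (e.g. 70000) approximates the two-file test described above.
nested_read_count 200
```

Running the script under your shell's time facility on both systems,
with the same argument, would show whether plain line reading by
itself is slower on one of them.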

If we knew what could be the reasons for this strange regression, we'd
have told you long ago.  But we don't.  So some people offered
potential reasons and ways of testing them, and others suggested ways
to make your scripts run much faster.  This is all we can tell you
given the limited information we have about 2 systems we cannot
access.

So you now have several proposals for how to deal with your problem.
You need to decide which one(s) are the best one(s) for you, given
your resources and the time you are willing to invest in this issue.
