Ed,
This worked like a charm: it ran in under 1 minute. But we have hundreds of scripts.
It would really help if we could find the root cause of why this takes 10 minutes
on the old server versus 90 minutes on the new one.
Thanks
Haritha
*From:*Ed Morton <mortoneccc@comcast.net>
*Sent:* Tuesday, June 15, 2021 9:05 AM
*To:* Koleti, Haritha <Haritha.Koleti@pseg.com>; Eli Zaretskii
<eliz@gnu.org>; arnold@skeeve.com
*Cc:* wolfgang.laun@gmail.com; bug-gawk@gnu.org; Pereira, Ricardo
<Ricardo_D.Pereira@pseg.com>; Pirane, Marco <Marco.Pirane@pseg.com>
*Subject:* Re: [EXTERNAL] Re: Performance issues using GAWK 3.1.6
->from Win 2008 to Win 2016
David Kerns spotted a bug in that code (thanks), it should be:
    BEGIN {
        FS=","
        while ( (getline<f2) > 0 ) {
            map[$2] = $1
        }
    }
    {
        sattr = ( $2 in map ? map[$2] : "" )
        printf("%s,%s,%s,%s,%s,%s,\n",$1,$2,$3,$4,$5,sattr);
    }
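For anyone who wants to try the corrected script before pointing it at real data, here is a minimal, self-contained sketch. The sample rows and the temporary directory are made up for illustration (the thread does not show the real files), and it calls plain awk from a POSIX shell rather than gawk from a Windows batch file. The awk program itself is the corrected one above.

```shell
# Hypothetical sample data; the real CSV files are not shown in the thread.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Emp_Attr.csv: attribute in field 1, lookup key in field 2.
printf '%s\n' 'red,101' 'blue,102' > Emp_Attr.csv

# ParentChild.csv: five fields, lookup key in field 2.
printf '%s\n' 'a,101,c,d,e' 'f,999,h,i,j' 'k,102,m,n,o' > ParentChild.csv

cat > map_attr.awk <<'EOF'
BEGIN {
    FS = ","
    # Read the lookup file once and index it by its second field.
    while ( (getline < f2) > 0 ) {
        map[$2] = $1
    }
}
{
    # One array lookup per input line instead of one full file scan.
    sattr = ( $2 in map ? map[$2] : "" )
    printf("%s,%s,%s,%s,%s,%s,\n", $1, $2, $3, $4, $5, sattr)
}
EOF

awk -v f2=Emp_Attr.csv -f map_attr.awk ParentChild.csv > Map_Attr.csv
cat Map_Attr.csv
```

The row whose key (999) is missing from Emp_Attr.csv gets an empty last field, which matches what the original script produced when no line matched.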
On 6/15/2021 7:49 AM, Ed Morton wrote:
That script is enormously inefficient as it'll read the whole of
Emp_attr.csv once per line of ParentChild.csv. Try changing it to
(untested):
    BEGIN {
        FS=","
        while ( (getline<f2) > 0 ) {
            map[$2] = $1
        }
    }
    {
        sattr = ( $2 in map : map[$2] : "" )
        printf("%s,%s,%s,%s,%s,%s,\n",$1,$2,$3,$4,$5,sattr);
    }
and you should see a significant performance improvement (i.e.
orders of magnitude). The only potential problem would be if
Emp_attr.csv was too large to fit in memory.
Ed.
On 6/15/2021 7:31 AM, Koleti, Haritha via Bug reports and all
discussion about gawk. wrote:
Here are the two scripts that are called from the script below.
Emp_Attr.awk - I am not sending this one, as it runs fast.
Map_attr.awk -
    BEGIN {
        FS=",";
    }
    {
        t1=$2;
        t0=$1;
        t2=$3;
        t3=$4;
        t4=$5;
        sattr="";
        while( (getline<f2) > 0)
        {
            if ($2==t1)
            {
                sattr=$1;
            }
        }
        close(f2);
        printf("%s,%s,%s,%s,%s,%s,\n",t0,t1,t2,t3,t4,sattr);
    }
-----Original Message-----
From: Koleti, Haritha
Sent: Tuesday, June 15, 2021 7:49 AM
To: 'Eli Zaretskii' <eliz@gnu.org>; arnold@skeeve.com
Cc: wolfgang.laun@gmail.com; bug-gawk@gnu.org; Pereira, Ricardo
<Ricardo_D.Pereira@pseg.com>; Pirane, Marco <Marco.Pirane@pseg.com>
Subject: RE: [EXTERNAL] Re: Performance issues using GAWK 3.1.6 ->from
Win 2008 to Win 2016
Good Morning Eli,
This is a pretty straightforward script that maps the data between two
files. I am waiting on permission from our security team so that my
team (Ricardo, Marco) can send you the details.
But here is the script.
@ECHO ON
SET DRIVENAME=D:
SET ROOTPATH=D:\PCM_SCRIPT\test
%DRIVENAME%
CD %ROOTPATH%
REM This step is fast.
TYPE ParentChild.csv|gawk -f Emp_Attr.awk>Emp_Attr.csv
REM This is the step that takes the time.
TYPE ParentChild.csv|gawk -v f2=Emp_Attr.csv -f map_attr.awk>Map_Attr.csv
The complete script finishes in 10 minutes on the old server (Windows 2008,
Excel 2010). On the new server (Windows 2016, Excel 2016) it takes 90 minutes.
There is NO change in the volume of data in the two files.
Thanks
Haritha
-----Original Message-----
From: Eli Zaretskii <eliz@gnu.org>
Sent: Tuesday, June 15, 2021 7:30 AM
To: arnold@skeeve.com
Cc: wolfgang.laun@gmail.com; bug-gawk@gnu.org; Koleti, Haritha
<Haritha.Koleti@pseg.com>
Subject: Re: [EXTERNAL] Re: Performance issues using GAWK 3.1.6 ->from
Win 2008 to Win 2016
From: arnold@skeeve.com
Date: Tue, 15 Jun 2021 01:51:06 -0600
Cc: bug-gawk@gnu.org, Haritha.Koleti@pseg.com
Wolfgang Laun <wolfgang.laun@gmail.com> wrote:
The durations 10 min and 90 min suggest to me that a lot of i/o is
going on. I have experienced performance changes of a similar order
of magnitude due to changes in the default i/o buffer size.
-W
This is an interesting idea. Eli, what if you supply a binary built
with the following patch?
How does this theory explain the difference between the two Windows versions?
They both use the same value of the "optimal" buffer size.
I'd rather see in the script how much I/O it really does, and take it
from there. Suppose that it turns out the script invokes other programs a lot,
or does a lot of computations: then the investigation should go in some other
direction, right?