bug-gawk
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [bug-gawk] Is the speed of array index checking same as string compa


From: Andrew J. Schorr
Subject: Re: [bug-gawk] Is the speed of array index checking same as string comparison?
Date: Tue, 27 Nov 2018 08:46:22 -0500
User-agent: Mutt/1.5.21 (2010-09-15)

On Mon, Nov 26, 2018 at 05:03:50PM -0600, Peng Yu wrote:
> The above test case has a bug --- the baseline was not correct.
> 
> Here is the correct one. It seems that there is only a constant factor
> (~40%) slow-down no matter how large the hash size is. Is it?
> 
> ns=(
>         10
>         100
>         1000
>         10000
>         100000
>         1000000
>         10000000
>         100000000
>         1000000000
> )
> m=100000000
> for n in "address@hidden"
> do
>         time1=$(( time awk -v n="$n" -v m="$m" 'BEGIN {
> for(i=1;i<=n;++i) d[i]=""; for(i=1;i<=m;++i) {} }' ) 2>&1)
>         time2=$(( time awk -v n="$n" -v m="$m" 'BEGIN {
> for(i=1;i<=n;++i) d[i]=""; d["NA"]=1; for(i=1;i<=m;++i) ("NA" in d) }'
> ) 2>&1)
>         time3=$(( time awk -v n="$n" -v m="$m" 'BEGIN {
> for(i=1;i<=n;++i) d[i]=""; x="NA"; for(i=1;i<=m;++i) x=="NA" }' )
> 2>&1)
>         printf '%s\t%s\t%s\t%s\t%s\n' "$n" "$time1" "$time2" "$time3"
> "$(bc -l <<< "($time2-$time1)/($time3-$time1)")"
> done
> 10      3.695   7.461   6.350   1.41845574387947269303
> 100     3.630   7.420   6.297   1.42107236595425571803
> 1000    3.694   7.431   6.345   1.40965673330818559034
> 10000   3.686   7.424   6.344   1.40632054176072234762
> 100000  3.693   7.440   6.347   1.41183119819140919366
> 1000000 3.776   7.576   6.496   1.39705882352941176470
> 10000000        5.692   9.302   8.230   1.42237982663514578408
> 100000000       23.647  26.845  25.913  1.41129744042365401588
> 1000000000      201.134 204.302 203.471 1.35558408215661103979

There are 3 different array back-end implementations used in gawk to optimize
performance: cint (consecutive positive integer indices), int (integer
indices), and str (string indices). So your measurements are not generally
valid. In this case, I think a cint array is created, but then there's a
lookaside array for strings that are later inserted.  So I fear that you are
not measuring what you think you are measuring.

Regards,
Andy



reply via email to

[Prev in Thread] Current Thread [Next in Thread]