bug-gawk
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [bug-gawk] Is the speed of array index checking same as string compa


From: Peng Yu
Subject: Re: [bug-gawk] Is the speed of array index checking same as string comparison?
Date: Mon, 26 Nov 2018 16:17:11 -0600

> Awk arrays are hash tables.  As they fill up, access will slow down.
> That said, gawk goes to a lot of trouble to try to keep access fast,
> with a small number of collisions per bucket on average.  I suspect you'd
> have to REALLY load an array before you'd notice a big difference
> (millions of elements on modern machines).

Here is my test case. It seems that only when the hash size reaches a
billion, it can cause more than 80% slow-down. Before that, the
slowdown is only 10%~20% which is quite acceptable in practice. Isn't
it?

ns=(
    10000
    100000
    1000000
    10000000
    100000000
    1000000000
)
m=100000000
for n in "address@hidden"
do
    time1=$(( time awk -v n="$n" 'BEGIN { for(i=1;i<=n;++i) d[i]=""}' ) 2>&1)
    time2=$(( time awk -v n="$n" -v m="$m" 'BEGIN { for(i=1;i<=n;++i)
d[i]=""; d["NA"]=1; for(i=1;i<=m;++i) ("NA" in d) }') 2>&1)
    time3=$(( time awk -v n="$n" -v m="$m" 'BEGIN { for(i=1;i<=n;++i)
d[i]=""; x="NA"; for(i=1;i<=m;++i) x=="NA" }' ) 2>&1)
    printf '%s\t%s\t%s\t%s\t%s\n' "$n" "$time1" "$time2" "$time3"
"$(bc -l <<< "($time2-$time1)/($time3-$time1)")"
done
10000    0.016    7.430    6.313    1.17738605685246942988
100000    0.045    7.404    6.354    1.16642891107941036614
1000000    0.220    7.581    6.471    1.17757158854583266677
10000000    1.938    9.298    8.170    1.18100128369704749679
100000000    19.468    26.804    25.726    1.17225950782997762863
1000000000    200.543    205.224    203.093    1.83568627450980392156

-- 
Regards,
Peng



reply via email to

[Prev in Thread] Current Thread [Next in Thread]