bug-gawk
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Slowness in bignum mode ( gmp | gawk -M ) when doubling extremely la


From: Jason C. Kwan
Subject: Re: Slowness in bignum mode ( gmp | gawk -M ) when doubling extremely large inputs
Date: Mon, 25 Apr 2022 05:49:40 +0000 (UTC)

Speaking of unnecessary conversions, I have these findings for an extreme edge 
case task : 
- trying to square  [ 7 x 10^3^17 ]  , 
as in 7 followed by 129,140,163 zeros (0) behind it in base-10 decimal 
representation
Method 1 : directly spelling out the formula as is,            letting gawk + 
gmp select its most optimized internal representation of it
Method 2 : using sprintf( ) to create an ASCII string            representaton 
of the big integer, then squaring it
Method 3 : recognizing the fact that this value in question only has 1          
  significant digit in decimal, and be able to significantly            
reducing the computational magnitude by only having           to square digits 
that are significant in decimal            i.e. 7 x 7 = 49                      
while simply double-padding the            trailing zeros to account for the 
rest of it. 
Using identical gawk -M configurations, as shown in full code below,the speed 
differences are drastic - 
—  98.98 secs method-1— 106.49 secs method-2—   1.54 secs method-3
Obviously this is an extreme edge case, but I've also noticed gawk/gmp, in more 
than a handful of situations,
     "attempts to do math"
when none is called for.
Unfortunately, there's a sizable chorus in the community who possess immutable 
notions that string-ops are horrifically expensive across the board, and should 
be avoided at all costs. 
This code snippet showcased how the exact same gawk configuration - 
- same C locale,- same GMP flag,- same byte-mode flag (even though that's 
slightly redundant)
can see major speed ups when routing certain tasks via string-ops instead of 
strictly, and potentially unnecessarily, adhering to GMP limbs.
You've mentioned high big-O() penalties with roundtrip conversion - this 
trailing-zero scenario is exactly where nearly all the penaltiescould be 
avoided, and simply be limited by the speed of substr() or array splitting 
instead. 
On top of 2x before, and this squaring case, modulo-3 is another :
using that exact same 7 E 10^3^17, 
the built-in modulo ( % ) operator took 
     8.507 secs 
for a task that only requires 
     0.356 secs
As you can see, the 2-lines of code for mod3() function isn't doing anything 
fancy at all other than leveraging what was taught waaaay back in grade school :
     -  that counting digits alone         already suffice for mod-%-3 
     -  and that the exact same set decimal mod-%-3 rules are        equally 
applicable to %3 against inputs in bases 4, 7, 10, 13, 16(hex), 19, ….)
it stays in the realm of string-ops for as long as it could before finally 
having to do 1-single modulo operation against an integer likely less than 
2^53, material and meaningful time savings could be realized.
My own version of that mod3() function uses approx 75 digits as the cut-off 
between whether it's more advantages to do it numerically versus 
gsub-stitutionally. Your mileage may vary. 


     out9:  302 B 0:00:00 [ 904 B/s] [ 904 B/s] [<=>               ]( LC_ALL=C 
gawk -p- -Mbe ; )  0.31s user 0.03s system 97% cpu 0.356 total     1 1     2 # 
gawk profile, created Mon Apr 25 01:26:53 2022     3     4 # BEGIN rule(s)     
5      6 BEGIN {     7      1   print mod3(sprintf("7%0*.f", 3 ^ 17, 0))     8 
}     9    10    11 # Functions, listed alphabetically    12    13      1  
function mod3(_)    14 {    15      1   gsub("[0369CFXcfx]+", "", _)
    16      1   return (length(_) + (gsub("[258BEbe]", "", _))) % 3    17 }
     out9:  243 B 0:00:08 [28.6 B/s] [28.6 B/s] [<=>               ]( LC_ALL=C 
gawk -p- -Mbe ; )  8.14s user 0.35s system 99% cpu 8.507 total     1 1     2 # 
gawk profile, created Mon Apr 25 01:27:03 2022     3     4 # BEGIN rule(s)     
5     6 BEGIN {     7      1   print mod3_builtin(sprintf("7%0*.f", 3 ^ 17, 0)) 
    8 }     9    10    11 # Functions, listed alphabetically    12    13      1 
 function mod3_builtin(_)    14 {    15      1   return (_ % 3)
    16 }


You can run all that code without [ gawk -M ] flag, and the output would be the 
same.
up to u whether you deem there's any merit for me to speak with gmp team on 
this.
j




echo ; sleep 2; 
( time ( LC_ALL=C gawk -Mbe 'BEGIN { print ( 7 * 10^3^17 ) ^ 2 }' ) | pvE9 )| 
xxh128sum | lgp3

  out9: 246MiB 0:01:38 [2.49MiB/s] [2.49MiB/s] [<=> ]
( LC_ALL=C gawk -Mbe 'BEGIN { print ( 7 * 10^3^17 ) ^ 2 }'; ) 95.19s user 3.73s 
system 99% cpu 1:38.98 total
e2ecfaa1bf7ddc743d5f25c89a2a59a1 stdin 
echo ; sleep 2; 
( time ( LC_ALL=C gawk -Mbe 'BEGIN{ print sprintf("7%0*.f", 3^17, 0)^2 }' ) | 
pvE9 )| xxh128sum | lgp3 ; echo ;

  out9: 246MiB 0:01:46 [2.31MiB/s] [2.31MiB/s] [<=> ]
( LC_ALL=C gawk -Mbe 'BEGIN{ print sprintf("7%0*.f", 3^17, 0)^2 }'; ) 102.37s 
user 4.06s system 99% cpu 1:46.49 total
e2ecfaa1bf7ddc743d5f25c89a2a59a1 stdin

echo; ( time ( LC_ALL=C gawk -Mbe '
BEGIN { print _x2(sprintf("7%0*.f", 3 ^ 17, 0))}
function _x2(_, __, ___, ____, _____, ______){ if (substr(_, (__ += __ = __ ~ 
__) ^ ++__, __) == "") { return (_ * _) } else if (sub("[0]+000$", "=&", _)) { 
return (___ = index(_, "=")) < (++__ + __--) ? int(+(substr(_, -__ < +__, 
--___) ^ --__)) (_ = substr(_, ___ + __)) _ : _x2(______[(! +substr(_ = 
______[split(_, ______, "[=]+")], -__ < +__, __))]) (_) _ } return int(_ * _)}
 ' ) | pvE9 )| xxh128sum | lgp3 

  out9: 246MiB 0:00:01 [ 161MiB/s] [ 161MiB/s] [<=> ]
( LC_ALL=C gawk -Mbe ; ) 1.40s user 0.10s system 97% cpu 1.543 total
e2ecfaa1bf7ddc743d5f25c89a2a59a1 stdin



On Thursday, January 27, 2022, 03:56:46 PM EST, Wolfgang Laun 
<wolfgang.laun@gmail.com> wrote: 





Function _2x_ implements a special case of the addition of two 
multiple-precision numbers. If some application (frequently) needs doubling of 
numbers with millions of digits it might be useful. (In almost 60 years of 
programming in a wide range of application areas I've never come across this 
particular task.)

It is not surprising that some specific operations can be implemented 
significantly faster by specific code, faster than the same algorithm using the 
GnuMP implementation used in gawk. print $0+$0 requires two, maybe three, 
conversions between string and MP binary, each at the cost of O(n) where n is 
the number of digits. So this requires 3O(n) or 4O(n), due to another O(n) for 
the addition itself. Doing the duplication on a string of digits just requires 
O(n). Expect a reduction to 33% or even 25%, my ballpark figure.

Wolfgang 



On Thu, 27 Jan 2022 at 18:41, Jason C. Kwan <jasonckwan@yahoo.com> wrote:
> I’ll go modify the code for your convenience , but I didn’t make the code 
> that way to give u a hard time - I write code like that for my personal use 
> too. 
> 
> I don’t know how to make that function more concise and still process 
> doubling 1-billion-digit integers with same precision as gmp.
> 
> did you think I had Fun and joy taking time out to hash out a fully working 
> proof of concept to illustrate my point about the performance difference , in 
> hopes of starting a candid discussion instead of just saying “it’s slow” and 
> sound like generic whining ? 
> 
> the proof of concept function is designed to be ran *outside* of gmp mode -
> 
> You want gmp gawk code it’s 
> 
> gawk -Mbe ‘{ print $0+$0 }’
> 
> that’s it - the point of proof of concept isn’t to illustrate the performance 
> of gmp itself - it serves as simply an example of how it’s feasible to use 
> base gawk without gmp bignum mode, perform certain tasks with same accuracy , 
> at much faster speeds. 
> 
> I wasn’t saying  “incorporate my code” then giving you that - it’s to raise 
> awareness of how come it’s even feasible , with all the overhead associated 
> with interpreted scripting, to perform the same task significantly faster 
> than natively built in ones.  Interpreted scripting, by definition, should be 
> constrained by the performance of natively built in features. I’m not talking 
> about a 5% 10% margin of error thing - I wouldn’t even say anything and risk 
> being labeled as a time waster if it’s a small double digit gain.
> 
>  It’s somewhat paradoxical to have the scripts go 275-400% speed of the 
> underlying binary. 
> 
> I perfectly believe both gawk and gmp teams are giving us state of the art 
> software, that’s why I’m guessing it maybe a I/O issue between gawk and gmp 
> that might be unnecessarily slowing things down. 
> 
> but don’t worry about the next time - if I see something, I won’t waste your 
> precious time by saying something. 
> 
> Regards,
> Jason K
> 
>> Andrew J. Schorr <aschorr@telemetry-investments.com>於2022年1月27日 08:24寫道:
>> 
>> Hi,
>> 
>> As Arnold subtly pointed out, this is a bug reporting list, not a forum for
>> discussing obfuscated code challenges.  If you believe there's a gawk
>> performance problem, please provide a short, comprehensible program that
>> demonstrates the issue.
>> 
>> Thanks,
>> Andy
>> 
>>> On Thu, Jan 27, 2022 at 08:13:58AM -0500, Jason C. Kwan wrote:
>>> Have you tried even just straight up pasting it and run it as is ? when 
>>> interpreted scripting for the same platform is generating some 65% 
>>> reduction in execution time versus the built in compiled binary from C code 
>>> source, don’t you think it should at least warrant a quick look to see if 
>>> there are potential I/O bottlenecks between gawk and the GMP add-on module 
>>> it uses ?
>>> 
>>> i already used LC_ALL=C locale, per the bug reporting page, as you could 
>>> see below. I even let gawk -M go second so if there were any gains from 
>>> caching, it would be gawk -M that benefited from it. Maybe it’s an ARM 
>>> thing, being that I’m on m1max instead of x64
>>> 
>>> I only sent here first because I’m somewhat certain the GMP side would 
>>> instantly send me back here, seeing that I couldn’t isolate out whether the 
>>> potential bottlenecks exists on gawk side or gmp side, and I haven’t 
>>> written in C since 2002. I even tried reading the gawk 5.1.1 source code to 
>>> see if I could identify anything that could be help, but haven’t been lucky.
>>> 
>>> Do u want me to read through the gmp source first ?
>>> 
>>> Regards,
>>> Jason K
>>> 
>>>> arnold@skeeve.com於2022年1月27日 01:42寫道:
>>>> 
>>>> Hello.
>>>> 
>>>> Thank you for taking the time to report an issue.
>>>> 
>>>> Please read the instructions for bug reporting at
>>>> https://www.gnu.org/software/gawk/manual/html_node/Bugs.html.
>>>> It was updated recently, please reread it if you haven't looked at
>>>> it in a long time.
>>>> 
>>>> I'm afraid that your "proof of concept" code is so unreadable that
>>>> I won't even try to determine what it's doing.
>>>> 
>>>> With respect to GMP performance, I suggest that you report an
>>>> issue directly to the GMP developers.  The gawk code that uses
>>>> GMP isn't going to change.  For this reason, I don't think you
>>>> need to bother redoing your example code, unless you want to send
>>>> it to the GMP developers.
>>>> 
>>>> Thanks,
>>>> 
>>>> Arnold
>>>> 
>>>> "Jason C. Kwan" via "Bug reports only for gawk." <bug-gawk@gnu.org> wrote:
>>>> 
>>>>> Hi GAWK team,
>>>>> Not sure if I should be reporting this to the gawk team or the GnuMP team 
>>>>> - it's neither a bug or nor new feature per se, but merely a performance 
>>>>> one - I've noticed on that for extremely large inputs, say, integers with 
>>>>> more than 5 million digits, the bignum gawk -M mode could be somewhat 
>>>>> slow, with possible room for speed improvement (proof-of-concept attached 
>>>>> below). my gawk version information : 
>>>>> GNU Awk 5.1.1, API: 3.1 (GNU MPFR 4.1.0, GNU MP 6.2.1)
>>>>> Darwin MacBook-Pro.local 21.2.0 Darwin Kernel Version 21.2.0: Sun Nov 28 
>>>>> 20:28:41 PST 2021; root:xnu-8019.61.5~1/RELEASE_ARM64_T6000 arm64
>>>>> On my test-case of 71.5 million digits, it's possible to to save 63.7% of 
>>>>> time , even in regular gawk, when compared to gawk -M  :
>>>>> ==================
>>>>> gawk -be '{ print length($0) }' test.txt71591842
>>>>> test command ::
>>>>> echo; time ( pv -q < test.txt | LC_ALL=C gawkmx -b -e '{ print _2x_($0) 
>>>>> }' ) | xxh128sum ; echo; time (pv -q < test.txt | LC_ALL=C gawk -M -b -e 
>>>>> '{ print $0+$0 }' ) | xxh128sum ; echo
>>>>> a56c2d2302d9ea8751f810b848e6f354  stdin( pv -q < test.txt | LC_ALL=C 
>>>>> LC_ALL=C gawk -e "${mfx}" -b) 8.16s user 0.60s system 99% cpu 8.789 
>>>>> totalxxh128sum  0.01s user 0.00s system 0% cpu 8.789 total
>>>>> a56c2d2302d9ea8751f810b848e6f354  stdin( pv -q < test.txt | LC_ALL=C gawk 
>>>>> -M -b -e ; )  23.26s user 0.96s system 99% cpu 24.240 totalxxh128sum  
>>>>> 0.01s user 0.00s system 0% cpu 24.239 total
>>>>> =====================
>>>>> In another test case of slightly over 275 milion digits, the time savings 
>>>>> are 71.7% : 
>>>>> fc2231bdff375b7870586d8dffc0841c  stdin( pv -q < 
>>>>> jwengowengonoewgnwoegn.txt | LC_ALL=C gawk -b -e "${mfx}" -e ; )  27.25s 
>>>>> user 6.48s system 98% cpu 34.350 totalxxh128sum  0.04s user 0.02s system 
>>>>> 0% cpu 34.349 total
>>>>> fc2231bdff375b7870586d8dffc0841c  stdin( pv -q < 
>>>>> jwengowengonoewgnwoegn.txt | LC_ALL=C gawk -M -b -e ; )  116.58s user 
>>>>> 4.78s system 99% cpu 2:01.42 totalxxh128sum  0.04s user 0.02s system 0% 
>>>>> cpu 2:01.42 total
>>>>> ====================
>>>>> Attached below is the full proof-of-concept code for function _2x_( ) to 
>>>>> demostrate that the time savings are very much concrete and possible, not 
>>>>> simply theoretical. The test file, being 70MB+, is a big large for email, 
>>>>> but basically any file using ASCII digits 0-9 to represent any integer 
>>>>> over 5 million digits will do. The speed difference isn't noticeable for 
>>>>> smaller inputs, and for inputs fewer than 7 digits, most likely it would 
>>>>> be slower than gawk -M. 
>>>>> I tried maximizing portability of the proof-of-concept function by 
>>>>> refraining from any gawk-specific extensions of awk - this same code has 
>>>>> also been tested in mawk 1.3.4, mawk 1.9.9.6, and macos 12.1 awk/nawk. 
>>>>> It's entirely self-contained, performs no recursion, has no external 
>>>>> dependencies, no bit-wise ops, and doesn't include any advanced/fancy 
>>>>> math - just straight up grade-school long-form addition, doubling them 
>>>>> 15-digits per chunk, and 2 chunks per while loop. The carrying part is 
>>>>> performed by gsub( ) prior to the while( ) loop, and thus, eliminating 
>>>>> the need to track them along the way. Performance scales linearly at the 
>>>>> log-base-10 level.
>>>>> ( I didn't include any copyright notice or credits to a priori since I 
>>>>> don't think grade school addition is something copyrightable)
>>>>> Obviously this is simply awk scripting code and can't be directly 
>>>>> incorporated into the C code base - I'm merely raising awareness of the 
>>>>> issue.
>>>>> Thanks for your time.
>>>>> Jason
>>>>> ===================================================================================
>>>>> ( I can reformat the code for readability if you prefer )  : 
>>>>> function _2x_(__,_,_______,____,_________,      
>>>>> ___________,__________,____________,              
>>>>> ________,_____,______,___) {    if(__!~/[1-9]/) {        return +_ }    
>>>>> ___=(__~/^[-]/)    sub("^[+-]?["(-"")"]*","",__)    if 
>>>>> (length(__)<(_____=((_+=++_)^_^_-!!_))+_) {        if 
>>>>> (_________^((_____+(_~____))^(_____-_)<+__)) {            return 
>>>>> (-!-"")^___*(+__+__)    } }    ___=substr("-",_~"",___)    if 
>>>>> (__!~"[5-9]") {        gsub(/4/,"8",__)-gsub(/3/,"6",__)        
>>>>> gsub(/2/,"4",__)-gsub(/1/,"2",__)        return (___)__    }; 
>>>>> _______=____=________="";    
>>>>> __=(_____=substr(________=(_++^_--+_)^_____,_))(__)    
>>>>> gsub(/./,".",_____)         sub("("(_____)")+$","_&",__)    
>>>>> __=substr(__,index(__,"_")+!+"")    ________+=+________;_="";    if 
>>>>> ((gsub(_____,"&_",__)+gsub(/[_][5-9]/,".5&",__)*(+""))%2) {        
>>>>> _=(":")(________+__+__)        __=substr(__,index(__,"_")+!+"")    }    
>>>>> for(_____ in ______) { }    if (______[_____=split(__,______,/[_]/)]=="") 
>>>>> {        delete ______[_____--] };__=____~____;    
>>>>> __________=____________*=\    ____________=___________=_________=32;    
>>>>> while(__<_____) { if(!--__________){        
>>>>> __________=____________;_______=(_______)_;_="";        
>>>>> if(!--___________){        
>>>>> ___________=_________;____=(____)_______;_______="";        } }        
>>>>> _=(_)(":")(+______[__++]*2+________)\             
>>>>> (":")(+______[__++]*2+________) }    ____=(____)(_______)(_)\         
>>>>> (__==_____?(":")(________+2*______[_____]):"")    
>>>>> gsub(/[:]+[^:]/,"",____)          sub(/^0*/,"",____); return (___)____ }
>>> 
> 
> 
> 


-- 
Wolfgang Laun



reply via email to

[Prev in Thread] Current Thread [Next in Thread]