bug-gawk

Re: Slowness in bignum mode ( gmp | gawk -M ) when doubling extremely large inputs


From: Jason C. Kwan
Subject: Re: Slowness in bignum mode ( gmp | gawk -M ) when doubling extremely large inputs
Date: Tue, 26 Apr 2022 00:37:23 -0400

And Arnold,

seriously,

if you spend less time on your attitude problems, you might also notice
that linked tables under SYMTAB can easily be made to fall out of sync
with the actual underlying table.

Take FUNCTAB, for instance:

Alter the value of an existing cell under SYMTAB["FUNCTAB"], or create a
new index:value pair, and you can easily make SYMTAB["FUNCTAB"] and
FUNCTAB itself go out of sync.

> Jason C. Kwan <jasonckwan@yahoo.com>於2022年4月25日 17:50寫道:
> 
> I’m only talking to Wolfgang 
> 
> who the fuck wants to talk to a loser filled to the brim with a sense of 
> entitlement like you
> 
> 
>> arnold@skeeve.com於2022年4月25日 03:32寫道:
>> 
>> Hello.
>> 
>> If you wish, you may discuss these issues directly with the GMP team.
>> 
>> I have neither the skill nor the time nor the desire to try to add
>> special case handling of this nature to gawk.
>> 
>> In general, I suspect that gawk is entirely the wrong tool for these
>> kinds of calculations.
>> 
>> Arnold
>> 
>> "Jason C. Kwan" via "Bug reports only for gawk." <bug-gawk@gnu.org> wrote:
>> 
>>> Speaking of unnecessary conversions, I have these findings for an extreme
>>> edge-case task: trying to square [ 7 x 10^3^17 ], i.e. a 7 followed by
>>> 129,140,163 zeros (0) in base-10 decimal representation.
>>>
>>> Method 1 : directly spelling out the formula as is, letting gawk + gmp
>>> select its most optimized internal representation of it
>>> Method 2 : using sprintf( ) to create an ASCII string representation of
>>> the big integer, then squaring it
>>> Method 3 : recognizing that this value has only 1 significant digit in
>>> decimal, and significantly reducing the computational magnitude by
>>> squaring only the digits that are significant in decimal ( i.e.
>>> 7 x 7 = 49 ) while simply double-padding the trailing zeros to account
>>> for the rest of it
>>>
>>> Using identical gawk -M configurations, as shown in the full code below,
>>> the speed differences are drastic:
>>>  -  98.98 secs : method 1
>>>  - 106.49 secs : method 2
>>>  -   1.54 secs : method 3
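[ The method-3 trick can be sketched in a few readable lines of plain awk.
This is a minimal illustration, not the benchmarked code; the helper name
square_tz and the use of match()/RSTART are my own, and it assumes the
significant prefix squares exactly within a double (prefix below 2^26 or
so). ]

```shell
awk '
# square a decimal string that ends in many zeros:
# (p * 10^k)^2 == p^2 followed by 2k zeros
function square_tz(s,    p, z) {
    p = s; z = ""
    if (match(s, /0+$/)) {            # peel off the run of trailing zeros
        p = substr(s, 1, RSTART - 1)
        z = substr(s, RSTART)
    }
    return (p * p) z z                # square the prefix, double-pad the zeros
}
BEGIN {
    print square_tz("7000")           # 7^2 = 49, followed by 2*3 zeros
    print square_tz("1200")           # 12^2 = 144, followed by 2*2 zeros
}'
```

Only the short significant prefix is ever treated as a number; the zeros
stay in string land, which is where the speedup comes from.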
>>> Obviously this is an extreme edge case, but I've also noticed that
>>> gawk/gmp, in more than a handful of situations,
>>>     "attempts to do math"
>>> when none is called for.
>>>
>>> Unfortunately, there's a sizable chorus in the community who hold the
>>> immutable notion that string ops are horrifically expensive across the
>>> board and should be avoided at all costs.
>>>
>>> This code snippet showcases how the exact same gawk configuration -
>>>  - same C locale,
>>>  - same GMP flag,
>>>  - same byte-mode flag (even though that's slightly redundant)
>>> can see major speed-ups when routing certain tasks via string ops instead
>>> of strictly, and potentially unnecessarily, adhering to GMP limbs.
>>>
>>> You've mentioned high big-O() penalties with round-trip conversion - this
>>> trailing-zero scenario is exactly where nearly all the penalties could be
>>> avoided, limited only by the speed of substr() or array splitting instead.
>>> On top of the 2x case before, and this squaring case, modulo-3 is another:
>>> using that exact same 7 x 10^3^17, the built-in modulo ( % ) operator took
>>>     8.507 secs
>>> for a task that only requires
>>>     0.356 secs
>>>
>>> As you can see, the 2 lines of code in the mod3() function aren't doing
>>> anything fancy at all, other than leveraging what was taught waaaay back
>>> in grade school:
>>>  - counting digits alone already suffices for mod-3, and
>>>  - the exact same set of decimal mod-3 rules applies equally to % 3
>>>    against inputs in bases 4, 7, 10, 13, 16 (hex), 19, ...
>>> It stays in the realm of string ops for as long as it can before finally
>>> doing 1 single modulo operation against an integer likely less than 2^53,
>>> so material and meaningful time savings can be realized.
>>>
>>> My own version of that mod3() function uses approx. 75 digits as the
>>> cut-off between whether it's more advantageous to do it numerically or
>>> gsub-stitutionally. Your mileage may vary.
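[ For reference, the digit-counting trick boils down to the following.
This is a decimal-only sketch (the quoted profile additionally strips hex
digits); here it is cross-checked against the native % operator on small
inputs, where both must agree. ]

```shell
awk '
# mod 3 of a decimal string without converting the whole number:
# digits 0/3/6/9 contribute 0, digits 1/4/7 contribute 1, digits 2/5/8 contribute 2
function mod3(s) {
    gsub(/[0369]/, "", s)                          # drop digits divisible by 3
    return (length(s) + gsub(/[258]/, "", s)) % 3  # count the rest; 2-mod-3 digits count twice
}
BEGIN {
    for (n = 0; n < 1000; n++)
        if (mod3(n "") != n % 3) { print "mismatch at", n; exit 1 }
    print "ok"
}'
```

length() counts every surviving digit once, and the second gsub() adds one
more for each digit congruent to 2, so the sum is the digit sum mod 3.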
>>> 
>>> 
>>>     out9:  302 B 0:00:00 [ 904 B/s] [ 904 B/s] [<=>               ]
>>> ( LC_ALL=C gawk -p- -Mbe ; )  0.31s user 0.03s system 97% cpu 0.356 total
>>>
>>>     # gawk profile, created Mon Apr 25 01:26:53 2022
>>>
>>>     # BEGIN rule(s)
>>>
>>>     BEGIN {
>>>          1   print mod3(sprintf("7%0*.f", 3 ^ 17, 0))
>>>     }
>>>
>>>     # Functions, listed alphabetically
>>>
>>>          1  function mod3(_)
>>>     {
>>>          1   gsub("[0369CFXcfx]+", "", _)
>>>          1   return (length(_) + (gsub("[258BEbe]", "", _))) % 3
>>>     }
>>>
>>>     out9:  243 B 0:00:08 [28.6 B/s] [28.6 B/s] [<=>               ]
>>> ( LC_ALL=C gawk -p- -Mbe ; )  8.14s user 0.35s system 99% cpu 8.507 total
>>>
>>>     # gawk profile, created Mon Apr 25 01:27:03 2022
>>>
>>>     # BEGIN rule(s)
>>>
>>>     BEGIN {
>>>          1   print mod3_builtin(sprintf("7%0*.f", 3 ^ 17, 0))
>>>     }
>>>
>>>     # Functions, listed alphabetically
>>>
>>>          1  function mod3_builtin(_)
>>>     {
>>>          1   return (_ % 3)
>>>     }
>>> 
>>> 
>>> You can run all that code without the [ gawk -M ] flag and the output
>>> would be the same.
>>>
>>> It's up to you whether you deem there's any merit in me speaking with the
>>> gmp team on this.
>>>
>>> j
>>> 
>>> 
>>> 
>>> 
>>> echo ; sleep 2; 
>>> ( time ( LC_ALL=C gawk -Mbe 'BEGIN { print ( 7 * 10^3^17 ) ^ 2 }' ) | pvE9 
>>> )| xxh128sum | lgp3
>>> 
>>>  out9: 246MiB 0:01:38 [2.49MiB/s] [2.49MiB/s] [<=> ]
>>> ( LC_ALL=C gawk -Mbe 'BEGIN { print ( 7 * 10^3^17 ) ^ 2 }'; ) 95.19s user 
>>> 3.73s system 99% cpu 1:38.98 total
>>> e2ecfaa1bf7ddc743d5f25c89a2a59a1 stdin 
>>> echo ; sleep 2; 
>>> ( time ( LC_ALL=C gawk -Mbe 'BEGIN{ print sprintf("7%0*.f", 3^17, 0)^2 }' ) 
>>> | pvE9 )| xxh128sum | lgp3 ; echo ;
>>> 
>>>  out9: 246MiB 0:01:46 [2.31MiB/s] [2.31MiB/s] [<=> ]
>>> ( LC_ALL=C gawk -Mbe 'BEGIN{ print sprintf("7%0*.f", 3^17, 0)^2 }'; ) 
>>> 102.37s user 4.06s system 99% cpu 1:46.49 total
>>> e2ecfaa1bf7ddc743d5f25c89a2a59a1 stdin
>>> 
>>> echo; ( time ( LC_ALL=C gawk -Mbe '
>>> BEGIN { print _x2(sprintf("7%0*.f", 3 ^ 17, 0)) }
>>> function _x2(_, __, ___, ____, _____, ______)
>>> {
>>>     if (substr(_, (__ += __ = __ ~ __) ^ ++__, __) == "") {
>>>         return (_ * _)
>>>     } else if (sub("[0]+000$", "=&", _)) {
>>>         return (___ = index(_, "=")) < (++__ + __--) \
>>>             ? int(+(substr(_, -__ < +__, --___) ^ --__)) (_ = substr(_, ___ + __)) _ \
>>>             : _x2(______[(! +substr(_ = ______[split(_, ______, "[=]+")], -__ < +__, __))]) (_) _
>>>     }
>>>     return int(_ * _)
>>> }
>>> ' ) | pvE9 )| xxh128sum | lgp3
>>> 
>>>  out9: 246MiB 0:00:01 [ 161MiB/s] [ 161MiB/s] [<=> ]
>>> ( LC_ALL=C gawk -Mbe ; ) 1.40s user 0.10s system 97% cpu 1.543 total
>>> e2ecfaa1bf7ddc743d5f25c89a2a59a1 stdin
>>> 
>>> 
>>> 
>>>> On Thursday, January 27, 2022, 03:56:46 PM EST, Wolfgang Laun 
>>>> <wolfgang.laun@gmail.com> wrote: 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> Function _2x_ implements a special case of the addition of two 
>>> multiple-precision numbers. If some application (frequently) needs doubling 
>>> of numbers with millions of digits it might be useful. (In almost 60 years 
>>> of programming in a wide range of application areas I've never come across 
>>> this particular task.)
>>> 
>>> It is not surprising that some specific operations can be implemented
>>> significantly faster by special-purpose code than by the same algorithm
>>> going through the GnuMP implementation used in gawk. print $0+$0 requires
>>> two, maybe three, conversions between string and MP binary, each at a
>>> cost of O(n), where n is the number of digits. So this requires 3*O(n) or
>>> 4*O(n), due to another O(n) for the addition itself. Doing the doubling
>>> directly on a string of digits requires just O(n). Expect a reduction to
>>> 33% or even 25% - my ballpark figure.
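[ As a concrete illustration of the O(n) string approach Wolfgang
describes - a hypothetical sketch of mine, not Jason's _2x_ code - here is
chunked grade-school doubling in plain awk. The 9-digit chunk width keeps
every intermediate product exact in a double. ]

```shell
awk '
# double a non-negative decimal string of arbitrary length, one O(n) pass
function double_str(s,    i, chunk, carry, out) {
    while (length(s) % 9) s = "0" s          # left-pad to a multiple of 9 digits
    carry = 0; out = ""
    for (i = length(s) - 8; i >= 1; i -= 9) {
        chunk = substr(s, i, 9) * 2 + carry  # at most 1,999,999,999: exact in a double
        carry = int(chunk / 1000000000)      # carry into the next chunk is 0 or 1
        out = sprintf("%09d", chunk % 1000000000) out
    }
    out = (carry ? carry : "") out           # prepend a final carry, if any
    sub(/^0+/, "", out)                      # drop the padding zeros again
    return (out == "") ? "0" : out
}
BEGIN {
    print double_str("999999999999999999")   # 2 * (10^18 - 1)
    print double_str("500000000")            # carry out of the top chunk
}'
```

Each chunk is converted once and touched once, so the whole pass is one
O(n) scan, matching the cost analysis above.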
>>> 
>>> Wolfgang 
>>> 
>>> 
>>> 
>>>> On Thu, 27 Jan 2022 at 18:41, Jason C. Kwan <jasonckwan@yahoo.com> wrote:
>>>> I'll go modify the code for your convenience, but I didn't make the code
>>>> that way to give you a hard time - I write code like that for my
>>>> personal use too.
>>>>
>>>> I don't know how to make that function more concise and still have it
>>>> double 1-billion-digit integers with the same precision as gmp.
>>>> 
>>>> Did you think I had fun and joy taking time out to hash out a fully
>>>> working proof of concept to illustrate my point about the performance
>>>> difference, in hopes of starting a candid discussion, instead of just
>>>> saying "it's slow" and sounding like generic whining?
>>>>
>>>> The proof-of-concept function is designed to be run *outside* of gmp
>>>> mode.
>>>>
>>>> If you want gmp gawk code, it's
>>>>
>>>> gawk -Mbe '{ print $0+$0 }'
>>>>
>>>> and that's it - the point of the proof of concept isn't to illustrate
>>>> the performance of gmp itself - it simply serves as an example of how
>>>> it's feasible to use base gawk, without gmp bignum mode, to perform
>>>> certain tasks with the same accuracy at much faster speeds.
>>>> 
>>>> I wasn't saying "incorporate my code" and then handing you that - it's
>>>> to raise awareness of how it's even feasible, with all the overhead
>>>> associated with interpreted scripting, to perform the same task
>>>> significantly faster than the natively built-in ones. Interpreted
>>>> scripting, by definition, should be constrained by the performance of
>>>> natively built-in features. I'm not talking about a 5-10%
>>>> margin-of-error thing - I wouldn't even say anything and risk being
>>>> labeled a time waster if it were a small double-digit gain.
>>>>
>>>> It's somewhat paradoxical to have scripts run at 275-400% of the speed
>>>> of the underlying binary.
>>>>
>>>> I fully believe both the gawk and gmp teams are giving us
>>>> state-of-the-art software; that's why I'm guessing it may be an I/O
>>>> issue between gawk and gmp that is unnecessarily slowing things down.
>>>> 
>>>> but don’t worry about the next time - if I see something, I won’t waste 
>>>> your precious time by saying something. 
>>>> 
>>>> Regards,
>>>> Jason K
>>>> 
>>>>> Andrew J. Schorr <aschorr@telemetry-investments.com>於2022年1月27日 08:24寫道:
>>>>> 
>>>>> Hi,
>>>>> 
>>>>> As Arnold subtly pointed out, this is a bug reporting list, not a forum 
>>>>> for
>>>>> discussing obfuscated code challenges.  If you believe there's a gawk
>>>>> performance problem, please provide a short, comprehensible program that
>>>>> demonstrates the issue.
>>>>> 
>>>>> Thanks,
>>>>> Andy
>>>>> 
>>>>>> On Thu, Jan 27, 2022 at 08:13:58AM -0500, Jason C. Kwan wrote:
>>>>>> Have you tried even just straight-up pasting it and running it as is?
>>>>>> When interpreted scripting for the same platform generates some 65%
>>>>>> reduction in execution time versus the compiled binary built from C
>>>>>> source, don't you think it should at least warrant a quick look to
>>>>>> see if there are potential I/O bottlenecks between gawk and the GMP
>>>>>> add-on module it uses?
>>>>>>
>>>>>> I already used the LC_ALL=C locale, per the bug-reporting page, as you
>>>>>> can see below. I even let gawk -M go second, so if there were any
>>>>>> gains from caching, it would be gawk -M that benefited from them.
>>>>>> Maybe it's an ARM thing, since I'm on an M1 Max instead of x64.
>>>>>>
>>>>>> I only sent it here first because I'm fairly certain the GMP side
>>>>>> would instantly send me back here, seeing that I couldn't isolate
>>>>>> whether the potential bottleneck exists on the gawk side or the gmp
>>>>>> side, and I haven't written C since 2002. I even tried reading the
>>>>>> gawk 5.1.1 source code to see if I could identify anything that could
>>>>>> help, but haven't been lucky.
>>>>>>
>>>>>> Do you want me to read through the gmp source first?
>>>>>> 
>>>>>> Regards,
>>>>>> Jason K
>>>>>> 
>>>>>>> arnold@skeeve.com於2022年1月27日 01:42寫道:
>>>>>>> 
>>>>>>> Hello.
>>>>>>> 
>>>>>>> Thank you for taking the time to report an issue.
>>>>>>> 
>>>>>>> Please read the instructions for bug reporting at
>>>>>>> https://www.gnu.org/software/gawk/manual/html_node/Bugs.html.
>>>>>>> It was updated recently, please reread it if you haven't looked at
>>>>>>> it in a long time.
>>>>>>> 
>>>>>>> I'm afraid that your "proof of concept" code is so unreadable that
>>>>>>> I won't even try to determine what it's doing.
>>>>>>> 
>>>>>>> With respect to GMP performance, I suggest that you report an
>>>>>>> issue directly to the GMP developers.  The gawk code that uses
>>>>>>> GMP isn't going to change.  For this reason, I don't think you
>>>>>>> need to bother redoing your example code, unless you want to send
>>>>>>> it to the GMP developers.
>>>>>>> 
>>>>>>> Thanks,
>>>>>>> 
>>>>>>> Arnold
>>>>>>> 
>>>>>>> "Jason C. Kwan" via "Bug reports only for gawk." <bug-gawk@gnu.org> 
>>>>>>> wrote:
>>>>>>> 
>>>>>>>> Hi GAWK team,
>>>>>>>>
>>>>>>>> Not sure if I should be reporting this to the gawk team or the GnuMP
>>>>>>>> team - it's neither a bug nor a new feature per se, but merely a
>>>>>>>> performance issue - I've noticed that for extremely large inputs,
>>>>>>>> say, integers with more than 5 million digits, the bignum gawk -M
>>>>>>>> mode can be somewhat slow, with possible room for speed improvement
>>>>>>>> (proof of concept attached below). My gawk version information:
>>>>>>>>
>>>>>>>> GNU Awk 5.1.1, API: 3.1 (GNU MPFR 4.1.0, GNU MP 6.2.1)
>>>>>>>> Darwin MacBook-Pro.local 21.2.0 Darwin Kernel Version 21.2.0: Sun
>>>>>>>> Nov 28 20:28:41 PST 2021; root:xnu-8019.61.5~1/RELEASE_ARM64_T6000
>>>>>>>> arm64
>>>>>>>>
>>>>>>>> On my test case of 71.5 million digits, it's possible to save 63.7%
>>>>>>>> of the time, even in regular gawk, when compared to gawk -M :
>>>>>>>> ==================
>>>>>>>> gawk -be '{ print length($0) }' test.txt
>>>>>>>> 71591842
>>>>>>>>
>>>>>>>> test command ::
>>>>>>>> echo; time ( pv -q < test.txt | LC_ALL=C gawkmx -b -e '{ print
>>>>>>>> _2x_($0) }' ) | xxh128sum ; echo; time ( pv -q < test.txt |
>>>>>>>> LC_ALL=C gawk -M -b -e '{ print $0+$0 }' ) | xxh128sum ; echo
>>>>>>>>
>>>>>>>> a56c2d2302d9ea8751f810b848e6f354  stdin
>>>>>>>> ( pv -q < test.txt | LC_ALL=C LC_ALL=C gawk -e "${mfx}" -b )  8.16s
>>>>>>>> user 0.60s system 99% cpu 8.789 total
>>>>>>>> xxh128sum  0.01s user 0.00s system 0% cpu 8.789 total
>>>>>>>>
>>>>>>>> a56c2d2302d9ea8751f810b848e6f354  stdin
>>>>>>>> ( pv -q < test.txt | LC_ALL=C gawk -M -b -e ; )  23.26s user 0.96s
>>>>>>>> system 99% cpu 24.240 total
>>>>>>>> xxh128sum  0.01s user 0.00s system 0% cpu 24.239 total
>>>>>>>> =====================
>>>>>>>> In another test case of slightly over 275 million digits, the time
>>>>>>>> savings are 71.7% :
>>>>>>>>
>>>>>>>> fc2231bdff375b7870586d8dffc0841c  stdin
>>>>>>>> ( pv -q < jwengowengonoewgnwoegn.txt | LC_ALL=C gawk -b -e "${mfx}"
>>>>>>>> -e ; )  27.25s user 6.48s system 98% cpu 34.350 total
>>>>>>>> xxh128sum  0.04s user 0.02s system 0% cpu 34.349 total
>>>>>>>>
>>>>>>>> fc2231bdff375b7870586d8dffc0841c  stdin
>>>>>>>> ( pv -q < jwengowengonoewgnwoegn.txt | LC_ALL=C gawk -M -b -e ; )
>>>>>>>> 116.58s user 4.78s system 99% cpu 2:01.42 total
>>>>>>>> xxh128sum  0.04s user 0.02s system 0% cpu 2:01.42 total
>>>>>>>> ====================
>>>>>>>> Attached below is the full proof-of-concept code for the function
>>>>>>>> _2x_( ), to demonstrate that the time savings are very much concrete
>>>>>>>> and possible, not simply theoretical. The test file, being 70 MB+,
>>>>>>>> is a bit large for email, but basically any file using ASCII digits
>>>>>>>> 0-9 to represent an integer over 5 million digits will do. The speed
>>>>>>>> difference isn't noticeable for smaller inputs, and for inputs of
>>>>>>>> fewer than 7 digits it would most likely be slower than gawk -M.
>>>>>>>>
>>>>>>>> I tried to maximize portability of the proof-of-concept function by
>>>>>>>> refraining from any gawk-specific extensions of awk - this same code
>>>>>>>> has also been tested in mawk 1.3.4, mawk 1.9.9.6, and macOS 12.1
>>>>>>>> awk/nawk.
>>>>>>>>
>>>>>>>> It's entirely self-contained, performs no recursion, has no external
>>>>>>>> dependencies, no bit-wise ops, and doesn't include any
>>>>>>>> advanced/fancy math - just straight-up grade-school long-form
>>>>>>>> addition, doubling 15 digits per chunk and 2 chunks per while loop.
>>>>>>>> The carrying is performed by gsub( ) prior to the while( ) loop,
>>>>>>>> thus eliminating the need to track carries along the way.
>>>>>>>> Performance scales linearly at the log-base-10 level.
>>>>>>>>
>>>>>>>> ( I didn't include any copyright notice or credits since I don't
>>>>>>>> think grade-school addition is something copyrightable. )
>>>>>>>>
>>>>>>>> Obviously this is simply awk scripting code and can't be directly
>>>>>>>> incorporated into the C code base - I'm merely raising awareness of
>>>>>>>> the issue.
>>>>>>>>
>>>>>>>> Thanks for your time.
>>>>>>>> Jason
>>>>>>>> ===================================================================================
>>>>>>>> ( I can reformat the code for readability if you prefer )  : 
>>>>>>>> function _2x_(__,_,_______,____,_________,___________,__________,
>>>>>>>>               ____________,________,_____,______,___)
>>>>>>>> {
>>>>>>>>     if (__ !~ /[1-9]/) {
>>>>>>>>         return +_
>>>>>>>>     }
>>>>>>>>     ___ = (__ ~ /^[-]/)
>>>>>>>>     sub("^[+-]?[" (-"") "]*", "", __)
>>>>>>>>     if (length(__) < (_____ = ((_ += ++_)^_^_ - !!_)) + _) {
>>>>>>>>         if (_________^((_____ + (_ ~ ____))^(_____ - _) < +__)) {
>>>>>>>>             return (-!-"")^___ * (+__ + __)
>>>>>>>>         }
>>>>>>>>     }
>>>>>>>>     ___ = substr("-", _ ~ "", ___)
>>>>>>>>     if (__ !~ "[5-9]") {
>>>>>>>>         gsub(/4/, "8", __) - gsub(/3/, "6", __)
>>>>>>>>         gsub(/2/, "4", __) - gsub(/1/, "2", __)
>>>>>>>>         return (___) __
>>>>>>>>     }
>>>>>>>>     _______ = ____ = ________ = ""
>>>>>>>>     __ = (_____ = substr(________ = (_++^_-- + _)^_____, _)) (__)
>>>>>>>>     gsub(/./, ".", _____)
>>>>>>>>     sub("(" (_____) ")+$", "_&", __)
>>>>>>>>     __ = substr(__, index(__, "_") + !+"")
>>>>>>>>     ________ += +________; _ = ""
>>>>>>>>     if ((gsub(_____, "&_", __) + gsub(/[_][5-9]/, ".5&", __) * (+"")) % 2) {
>>>>>>>>         _ = (":") (________ + __ + __)
>>>>>>>>         __ = substr(__, index(__, "_") + !+"")
>>>>>>>>     }
>>>>>>>>     for (_____ in ______) { }
>>>>>>>>     if (______[_____ = split(__, ______, /[_]/)] == "") {
>>>>>>>>         delete ______[_____--]
>>>>>>>>     }
>>>>>>>>     __ = ____ ~ ____
>>>>>>>>     __________ = ____________ *= ____________ = ___________ = _________ = 32
>>>>>>>>     while (__ < _____) {
>>>>>>>>         if (!--__________) {
>>>>>>>>             __________ = ____________; _______ = (_______) _; _ = ""
>>>>>>>>             if (!--___________) {
>>>>>>>>                 ___________ = _________; ____ = (____) _______; _______ = ""
>>>>>>>>             }
>>>>>>>>         }
>>>>>>>>         _ = (_) (":") (+______[__++]*2 + ________) \
>>>>>>>>             (":") (+______[__++]*2 + ________)
>>>>>>>>     }
>>>>>>>>     ____ = (____) (_______) (_) \
>>>>>>>>            (__ == _____ ? (":") (________ + 2*______[_____]) : "")
>>>>>>>>     gsub(/[:]+[^:]/, "", ____)
>>>>>>>>     sub(/^0*/, "", ____)
>>>>>>>>     return (___) ____
>>>>>>>> }
>>>>>> 
>>>> 
>>>> 
>>>> 
>>> 
>>> 
>>> -- 
>>> Wolfgang Laun
>>> 
>>> 


