bug-gawk
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Slowness in bignum mode ( gmp | gawk -M ) when doubling extremely la


From: arnold
Subject: Re: Slowness in bignum mode ( gmp | gawk -M ) when doubling extremely large inputs
Date: Mon, 25 Apr 2022 01:32:19 -0600
User-agent: Heirloom mailx 12.5 7/5/10

Hello.

If you wish, you may discuss these issues directly with the GMP team.

I have neither the skill nor the time nor the desire to try to add
special case handling of this nature to gawk.

In general, I suspect that gawk is entirely the wrong tool for these
kinds of calculations.

Arnold

"Jason C. Kwan" via "Bug reports only for gawk." <bug-gawk@gnu.org> wrote:

> Speaking of unnecessary conversions, I have these findings for an extreme 
> edge case task : 
> - trying to square  [ 7 x 10^3^17 ]  , 
> as in 7 followed by 129,140,163 zeros (0) behind it in base-10 decimal 
> representation
> Method 1 : directly spelling out the formula as is,            letting gawk + 
> gmp select its most optimized internal representation of it
> Method 2 : using sprintf( ) to create an ASCII string            
> representaton of the big integer, then squaring it
> Method 3 : recognizing the fact that this value in question only has 1        
>     significant digit in decimal, and be able to significantly            
> reducing the computational magnitude by only having           to square 
> digits that are significant in decimal            i.e. 7 x 7 = 49             
>          while simply double-padding the            trailing zeros to account 
> for the rest of it. 
> Using identical gawk -M configurations, as shown in full code below,the speed 
> differences are drastic - 
> —  98.98 secs method-1— 106.49 secs method-2—   1.54 secs method-3
> Obviously this is an extreme edge case, but I've also noticed gawk/gmp, in 
> more than a handful of situations,
>      "attempts to do math"
> when none is called for.
> Unfortunately, there's a sizable chorus in the community who possess 
> immutable notions that string-ops are horrifically expensive across the 
> board, and should be avoided at all costs. 
> This code snippet showcased how the exact same gawk configuration - 
> - same C locale,- same GMP flag,- same byte-mode flag (even though that's 
> slightly redundant)
> can see major speed ups when routing certain tasks via string-ops instead of 
> strictly, and potentially unnecessarily, adhering to GMP limbs.
> You've mentioned high big-O() penalties with roundtrip conversion - this 
> trailing-zero scenario is exactly where nearly all the penaltiescould be 
> avoided, and simply be limited by the speed of substr() or array splitting 
> instead. 
> On top of 2x before, and this squaring case, modulo-3 is another :
> using that exact same 7 E 10^3^17, 
> the built-in modulo ( % ) operator took 
>      8.507 secs 
> for a task that only requires 
>      0.356 secs
> As you can see, the 2-lines of code for mod3() function isn't doing anything 
> fancy at all other than leveraging what was taught waaaay back in grade 
> school :
>      -  that counting digits alone         already suffice for mod-%-3 
>      -  and that the exact same set decimal mod-%-3 rules are        equally 
> applicable to %3 against inputs in bases 4, 7, 10, 13, 16(hex), 19, ….)
> it stays in the realm of string-ops for as long as it could before finally 
> having to do 1-single modulo operation against an integer likely less than 
> 2^53, material and meaningful time savings could be realized.
> My own version of that mod3() function uses approx 75 digits as the cut-off 
> between whether it's more advantages to do it numerically versus 
> gsub-stitutionally. Your mileage may vary. 
>
>
>      out9:  302 B 0:00:00 [ 904 B/s] [ 904 B/s] [<=>               ]( 
> LC_ALL=C gawk -p- -Mbe ; )  0.31s user 0.03s system 97% cpu 0.356 total     1 
> 1     2 # gawk profile, created Mon Apr 25 01:26:53 2022     3     4 # BEGIN 
> rule(s)     5      6 BEGIN {     7      1   print mod3(sprintf("7%0*.f", 3 ^ 
> 17, 0))     8 }     9    10    11 # Functions, listed alphabetically    12    
> 13      1  function mod3(_)    14 {    15      1   gsub("[0369CFXcfx]+", "", 
> _)
>     16      1   return (length(_) + (gsub("[258BEbe]", "", _))) % 3    17 }
>      out9:  243 B 0:00:08 [28.6 B/s] [28.6 B/s] [<=>               ]( 
> LC_ALL=C gawk -p- -Mbe ; )  8.14s user 0.35s system 99% cpu 8.507 total     1 
> 1     2 # gawk profile, created Mon Apr 25 01:27:03 2022     3     4 # BEGIN 
> rule(s)     5     6 BEGIN {     7      1   print 
> mod3_builtin(sprintf("7%0*.f", 3 ^ 17, 0))     8 }     9    10    11 # 
> Functions, listed alphabetically    12    13      1  function mod3_builtin(_) 
>    14 {    15      1   return (_ % 3)
>     16 }
>
>
> You can run all that code without [ gawk -M ] flag, and the output would be 
> the same.
> up to u whether you deem there's any merit for me to speak with gmp team on 
> this.
> j
>
>
>
>
> echo ; sleep 2; 
> ( time ( LC_ALL=C gawk -Mbe 'BEGIN { print ( 7 * 10^3^17 ) ^ 2 }' ) | pvE9 )| 
> xxh128sum | lgp3
>
>   out9: 246MiB 0:01:38 [2.49MiB/s] [2.49MiB/s] [<=> ]
> ( LC_ALL=C gawk -Mbe 'BEGIN { print ( 7 * 10^3^17 ) ^ 2 }'; ) 95.19s user 
> 3.73s system 99% cpu 1:38.98 total
> e2ecfaa1bf7ddc743d5f25c89a2a59a1 stdin 
> echo ; sleep 2; 
> ( time ( LC_ALL=C gawk -Mbe 'BEGIN{ print sprintf("7%0*.f", 3^17, 0)^2 }' ) | 
> pvE9 )| xxh128sum | lgp3 ; echo ;
>
>   out9: 246MiB 0:01:46 [2.31MiB/s] [2.31MiB/s] [<=> ]
> ( LC_ALL=C gawk -Mbe 'BEGIN{ print sprintf("7%0*.f", 3^17, 0)^2 }'; ) 102.37s 
> user 4.06s system 99% cpu 1:46.49 total
> e2ecfaa1bf7ddc743d5f25c89a2a59a1 stdin
>
> echo; ( time ( LC_ALL=C gawk -Mbe '
> BEGIN { print _x2(sprintf("7%0*.f", 3 ^ 17, 0))}
> function _x2(_, __, ___, ____, _____, ______){ if (substr(_, (__ += __ = __ ~ 
> __) ^ ++__, __) == "") { return (_ * _) } else if (sub("[0]+000$", "=&", _)) 
> { return (___ = index(_, "=")) < (++__ + __--) ? int(+(substr(_, -__ < +__, 
> --___) ^ --__)) (_ = substr(_, ___ + __)) _ : _x2(______[(! +substr(_ = 
> ______[split(_, ______, "[=]+")], -__ < +__, __))]) (_) _ } return int(_ * _)}
>  ' ) | pvE9 )| xxh128sum | lgp3 
>
>   out9: 246MiB 0:00:01 [ 161MiB/s] [ 161MiB/s] [<=> ]
> ( LC_ALL=C gawk -Mbe ; ) 1.40s user 0.10s system 97% cpu 1.543 total
> e2ecfaa1bf7ddc743d5f25c89a2a59a1 stdin
>
>
>
> On Thursday, January 27, 2022, 03:56:46 PM EST, Wolfgang Laun 
> <wolfgang.laun@gmail.com> wrote: 
>
>
>
>
>
> Function _2x_ implements a special case of the addition of two 
> multiple-precision numbers. If some application (frequently) needs doubling 
> of numbers with millions of digits it might be useful. (In almost 60 years of 
> programming in a wide range of application areas I've never come across this 
> particular task.)
>
> It is not surprising that some specific operations can be implemented 
> significantly faster by specific code, faster than the same algorithm using 
> the GnuMP implementation used in gawk. print $0+$0 requires two, maybe three, 
> conversions between string and MP binary, each at the cost of O(n) where n is 
> the number of digits. So this requires 3O(n) or 4O(n), due to another O(n) 
> for the addition itself. Doing the duplication on a string of digits just 
> requires O(n). Expect a reduction to 33% or even 25%, my ballpark figure.
>
> Wolfgang 
>
>
>
> On Thu, 27 Jan 2022 at 18:41, Jason C. Kwan <jasonckwan@yahoo.com> wrote:
> > I’ll go modify the code for your convenience , but I didn’t make the code 
> > that way to give u a hard time - I write code like that for my personal use 
> > too. 
> > 
> > I don’t know how to make that function more concise and still process 
> > doubling 1-billion-digit integers with same precision as gmp.
> > 
> > did you think I had Fun and joy taking time out to hash out a fully working 
> > proof of concept to illustrate my point about the performance difference , 
> > in hopes of starting a candid discussion instead of just saying “it’s slow” 
> > and sound like generic whining ? 
> > 
> > the proof of concept function is designed to be ran *outside* of gmp mode -
> > 
> > You want gmp gawk code it’s 
> > 
> > gawk -Mbe ‘{ print $0+$0 }’
> > 
> > that’s it - the point of proof of concept isn’t to illustrate the 
> > performance of gmp itself - it serves as simply an example of how it’s 
> > feasible to use base gawk without gmp bignum mode, perform certain tasks 
> > with same accuracy , at much faster speeds. 
> > 
> > I wasn’t saying  “incorporate my code” then giving you that - it’s to raise 
> > awareness of how come it’s even feasible , with all the overhead associated 
> > with interpreted scripting, to perform the same task significantly faster 
> > than natively built in ones.  Interpreted scripting, by definition, should 
> > be constrained by the performance of natively built in features. I’m not 
> > talking about a 5% 10% margin of error thing - I wouldn’t even say anything 
> > and risk being labeled as a time waster if it’s a small double digit gain.
> > 
> >  It’s somewhat paradoxical to have the scripts go 275-400% speed of the 
> > underlying binary. 
> > 
> > I perfectly believe both gawk and gmp teams are giving us state of the art 
> > software, that’s why I’m guessing it maybe a I/O issue between gawk and gmp 
> > that might be unnecessarily slowing things down. 
> > 
> > but don’t worry about the next time - if I see something, I won’t waste 
> > your precious time by saying something. 
> > 
> > Regards,
> > Jason K
> > 
> >> Andrew J. Schorr <aschorr@telemetry-investments.com>於2022年1月27日 08:24寫道:
> >> 
> >> Hi,
> >> 
> >> As Arnold subtly pointed out, this is a bug reporting list, not a forum for
> >> discussing obfuscated code challenges.  If you believe there's a gawk
> >> performance problem, please provide a short, comprehensible program that
> >> demonstrates the issue.
> >> 
> >> Thanks,
> >> Andy
> >> 
> >>> On Thu, Jan 27, 2022 at 08:13:58AM -0500, Jason C. Kwan wrote:
> >>> Have you tried even just straight up pasting it and run it as is ? when 
> >>> interpreted scripting for the same platform is generating some 65% 
> >>> reduction in execution time versus the built in compiled binary from C 
> >>> code source, don’t you think it should at least warrant a quick look to 
> >>> see if there are potential I/O bottlenecks between gawk and the GMP 
> >>> add-on module it uses ?
> >>> 
> >>> i already used LC_ALL=C locale, per the bug reporting page, as you could 
> >>> see below. I even let gawk -M go second so if there were any gains from 
> >>> caching, it would be gawk -M that benefited from it. Maybe it’s an ARM 
> >>> thing, being that I’m on m1max instead of x64
> >>> 
> >>> I only sent here first because I’m somewhat certain the GMP side would 
> >>> instantly send me back here, seeing that I couldn’t isolate out whether 
> >>> the potential bottlenecks exists on gawk side or gmp side, and I haven’t 
> >>> written in C since 2002. I even tried reading the gawk 5.1.1 source code 
> >>> to see if I could identify anything that could be help, but haven’t been 
> >>> lucky.
> >>> 
> >>> Do u want me to read through the gmp source first ?
> >>> 
> >>> Regards,
> >>> Jason K
> >>> 
> >>>> arnold@skeeve.com於2022年1月27日 01:42寫道:
> >>>> 
> >>>> Hello.
> >>>> 
> >>>> Thank you for taking the time to report an issue.
> >>>> 
> >>>> Please read the instructions for bug reporting at
> >>>> https://www.gnu.org/software/gawk/manual/html_node/Bugs.html.
> >>>> It was updated recently, please reread it if you haven't looked at
> >>>> it in a long time.
> >>>> 
> >>>> I'm afraid that your "proof of concept" code is so unreadable that
> >>>> I won't even try to determine what it's doing.
> >>>> 
> >>>> With respect to GMP performance, I suggest that you report an
> >>>> issue directly to the GMP developers.  The gawk code that uses
> >>>> GMP isn't going to change.  For this reason, I don't think you
> >>>> need to bother redoing your example code, unless you want to send
> >>>> it to the GMP developers.
> >>>> 
> >>>> Thanks,
> >>>> 
> >>>> Arnold
> >>>> 
> >>>> "Jason C. Kwan" via "Bug reports only for gawk." <bug-gawk@gnu.org> 
> >>>> wrote:
> >>>> 
> >>>>> Hi GAWK team,
> >>>>> Not sure if I should be reporting this to the gawk team or the GnuMP 
> >>>>> team - it's neither a bug or nor new feature per se, but merely a 
> >>>>> performance one - I've noticed on that for extremely large inputs, say, 
> >>>>> integers with more than 5 million digits, the bignum gawk -M mode could 
> >>>>> be somewhat slow, with possible room for speed improvement 
> >>>>> (proof-of-concept attached below). my gawk version information : 
> >>>>> GNU Awk 5.1.1, API: 3.1 (GNU MPFR 4.1.0, GNU MP 6.2.1)
> >>>>> Darwin MacBook-Pro.local 21.2.0 Darwin Kernel Version 21.2.0: Sun Nov 
> >>>>> 28 20:28:41 PST 2021; root:xnu-8019.61.5~1/RELEASE_ARM64_T6000 arm64
> >>>>> On my test-case of 71.5 million digits, it's possible to to save 63.7% 
> >>>>> of time , even in regular gawk, when compared to gawk -M  :
> >>>>> ==================
> >>>>> gawk -be '{ print length($0) }' test.txt71591842
> >>>>> test command ::
> >>>>> echo; time ( pv -q < test.txt | LC_ALL=C gawkmx -b -e '{ print _2x_($0) 
> >>>>> }' ) | xxh128sum ; echo; time (pv -q < test.txt | LC_ALL=C gawk -M -b 
> >>>>> -e '{ print $0+$0 }' ) | xxh128sum ; echo
> >>>>> a56c2d2302d9ea8751f810b848e6f354  stdin( pv -q < test.txt | LC_ALL=C 
> >>>>> LC_ALL=C gawk -e "${mfx}" -b) 8.16s user 0.60s system 99% cpu 8.789 
> >>>>> totalxxh128sum  0.01s user 0.00s system 0% cpu 8.789 total
> >>>>> a56c2d2302d9ea8751f810b848e6f354  stdin( pv -q < test.txt | LC_ALL=C 
> >>>>> gawk -M -b -e ; )  23.26s user 0.96s system 99% cpu 24.240 
> >>>>> totalxxh128sum  0.01s user 0.00s system 0% cpu 24.239 total
> >>>>> =====================
> >>>>> In another test case of slightly over 275 milion digits, the time 
> >>>>> savings are 71.7% : 
> >>>>> fc2231bdff375b7870586d8dffc0841c  stdin( pv -q < 
> >>>>> jwengowengonoewgnwoegn.txt | LC_ALL=C gawk -b -e "${mfx}" -e ; )  
> >>>>> 27.25s user 6.48s system 98% cpu 34.350 totalxxh128sum  0.04s user 
> >>>>> 0.02s system 0% cpu 34.349 total
> >>>>> fc2231bdff375b7870586d8dffc0841c  stdin( pv -q < 
> >>>>> jwengowengonoewgnwoegn.txt | LC_ALL=C gawk -M -b -e ; )  116.58s user 
> >>>>> 4.78s system 99% cpu 2:01.42 totalxxh128sum  0.04s user 0.02s system 0% 
> >>>>> cpu 2:01.42 total
> >>>>> ====================
> >>>>> Attached below is the full proof-of-concept code for function _2x_( ) 
> >>>>> to demostrate that the time savings are very much concrete and 
> >>>>> possible, not simply theoretical. The test file, being 70MB+, is a big 
> >>>>> large for email, but basically any file using ASCII digits 0-9 to 
> >>>>> represent any integer over 5 million digits will do. The speed 
> >>>>> difference isn't noticeable for smaller inputs, and for inputs fewer 
> >>>>> than 7 digits, most likely it would be slower than gawk -M. 
> >>>>> I tried maximizing portability of the proof-of-concept function by 
> >>>>> refraining from any gawk-specific extensions of awk - this same code 
> >>>>> has also been tested in mawk 1.3.4, mawk 1.9.9.6, and macos 12.1 
> >>>>> awk/nawk. 
> >>>>> It's entirely self-contained, performs no recursion, has no external 
> >>>>> dependencies, no bit-wise ops, and doesn't include any advanced/fancy 
> >>>>> math - just straight up grade-school long-form addition, doubling them 
> >>>>> 15-digits per chunk, and 2 chunks per while loop. The carrying part is 
> >>>>> performed by gsub( ) prior to the while( ) loop, and thus, eliminating 
> >>>>> the need to track them along the way. Performance scales linearly at 
> >>>>> the log-base-10 level.
> >>>>> ( I didn't include any copyright notice or credits to a priori since I 
> >>>>> don't think grade school addition is something copyrightable)
> >>>>> Obviously this is simply awk scripting code and can't be directly 
> >>>>> incorporated into the C code base - I'm merely raising awareness of the 
> >>>>> issue.
> >>>>> Thanks for your time.
> >>>>> Jason
> >>>>> ===================================================================================
> >>>>> ( I can reformat the code for readability if you prefer )  : 
> >>>>> function _2x_(__,_,_______,____,_________,      
> >>>>> ___________,__________,____________,              
> >>>>> ________,_____,______,___) {    if(__!~/[1-9]/) {        return +_ }    
> >>>>> ___=(__~/^[-]/)    sub("^[+-]?["(-"")"]*","",__)    if 
> >>>>> (length(__)<(_____=((_+=++_)^_^_-!!_))+_) {        if 
> >>>>> (_________^((_____+(_~____))^(_____-_)<+__)) {            return 
> >>>>> (-!-"")^___*(+__+__)    } }    ___=substr("-",_~"",___)    if 
> >>>>> (__!~"[5-9]") {        gsub(/4/,"8",__)-gsub(/3/,"6",__)        
> >>>>> gsub(/2/,"4",__)-gsub(/1/,"2",__)        return (___)__    }; 
> >>>>> _______=____=________="";    
> >>>>> __=(_____=substr(________=(_++^_--+_)^_____,_))(__)    
> >>>>> gsub(/./,".",_____)         sub("("(_____)")+$","_&",__)    
> >>>>> __=substr(__,index(__,"_")+!+"")    ________+=+________;_="";    if 
> >>>>> ((gsub(_____,"&_",__)+gsub(/[_][5-9]/,".5&",__)*(+""))%2) {        
> >>>>> _=(":")(________+__+__)        __=substr(__,index(__,"_")+!+"")    }    
> >>>>> for(_____ in ______) { }    if 
> >>>>> (______[_____=split(__,______,/[_]/)]=="") {        delete 
> >>>>> ______[_____--] };__=____~____;    __________=____________*=\    
> >>>>> ____________=___________=_________=32;    while(__<_____) { 
> >>>>> if(!--__________){        
> >>>>> __________=____________;_______=(_______)_;_="";        
> >>>>> if(!--___________){        
> >>>>> ___________=_________;____=(____)_______;_______="";        } }        
> >>>>> _=(_)(":")(+______[__++]*2+________)\             
> >>>>> (":")(+______[__++]*2+________) }    ____=(____)(_______)(_)\         
> >>>>> (__==_____?(":")(________+2*______[_____]):"")    
> >>>>> gsub(/[:]+[^:]/,"",____)          sub(/^0*/,"",____); return (___)____ }
> >>> 
> > 
> > 
> > 
>
>
> -- 
> Wolfgang Laun
>
>



reply via email to

[Prev in Thread] Current Thread [Next in Thread]