From: arnold
Subject: Re: Slowness in bignum mode ( gmp | gawk -M ) when doubling extremely large inputs
Date: Mon, 25 Apr 2022 01:32:19 -0600
User-agent: Heirloom mailx 12.5 7/5/10
Hello.
If you wish, you may discuss these issues directly with the GMP team.
I have neither the skill nor the time nor the desire to try to add
special case handling of this nature to gawk.
In general, I suspect that gawk is entirely the wrong tool for these
kinds of calculations.
Arnold
"Jason C. Kwan" via "Bug reports only for gawk." <bug-gawk@gnu.org> wrote:
> Speaking of unnecessary conversions, I have these findings for an extreme
> edge case task :
> - trying to square [ 7 x 10^3^17 ] ,
> as in 7 followed by 129,140,163 zeros (0) behind it in base-10 decimal
> representation
> Method 1 : directly spelling out the formula as is, letting gawk +
>            gmp select its most optimized internal representation of it
> Method 2 : using sprintf( ) to create an ASCII string
>            representation of the big integer, then squaring it
> Method 3 : recognizing the fact that the value in question has only 1
>            significant digit in decimal, which significantly reduces the
>            computational magnitude : only the significant digit needs
>            squaring, i.e. 7 x 7 = 49, while the trailing zeros are simply
>            doubled in count to account for the rest of it.
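The method-3 identity can be sketched as a tiny standalone helper (`square_1sig` is my own hypothetical name, not code from this thread), assuming the input has exactly one significant digit: since (d * 10^k)^2 = d^2 * 10^(2k), squaring reduces to squaring one digit and doubling the zero count.

```shell
# Illustrative sketch, not the poster's code: square a number of the
# form d * 10^k by squaring the lone significant digit and doubling
# the count of trailing zeros -- pure string work, no bignum needed.
square_1sig() {
  awk -v n="$1" 'BEGIN {
    d = substr(n, 1, 1)          # the single significant digit
    zeros = length(n) - 1        # trailing zeros in the input
    out = d * d                  # (d * 10^k)^2 = d^2 * 10^(2k)
    for (i = 0; i < 2 * zeros; i++) out = out "0"
    print out
  }'
}

square_1sig 7000    # 7000^2 -> 49000000
```

Plain one-byte-locale awk handles this; no -M flag is involved at any point.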
> Using identical gawk -M configurations, as shown in full code below, the
> speed differences are drastic :
> - 98.98 secs : method 1
> - 106.49 secs : method 2
> - 1.54 secs : method 3
> Obviously this is an extreme edge case, but I've also noticed gawk/gmp, in
> more than a handful of situations, "attempting to do math" when none is
> called for.
> Unfortunately, there's a sizable chorus in the community who possess
> immutable notions that string-ops are horrifically expensive across the
> board, and should be avoided at all costs.
> This code snippet showcases how the exact same gawk configuration -
> - same C locale,
> - same GMP flag,
> - same byte-mode flag (even though that's slightly redundant)
> can see major speed-ups when routing certain tasks via string-ops instead of
> strictly, and potentially unnecessarily, adhering to GMP limbs.
> You've mentioned high big-O() penalties with roundtrip conversion - this
> trailing-zero scenario is exactly where nearly all the penalties could be
> avoided, being limited instead by the speed of substr() or array splitting.
> On top of the 2x case before, and this squaring case, modulo-3 is another :
> using that exact same 7 x 10^3^17,
> the built-in modulo ( % ) operator took
> 8.507 secs
> for a task that only requires
> 0.356 secs
> As you can see, the 2-line mod3() function isn't doing anything fancy at
> all other than leveraging what was taught waaaay back in grade school :
> - counting digits alone already suffices for mod-%-3
> - and the exact same decimal mod-%-3 rules are equally applicable
>   to %3 against inputs in bases 4, 7, 10, 13, 16(hex), 19, ...
> It stays in the realm of string-ops for as long as it can before finally
> having to do 1 single modulo operation against an integer likely less than
> 2^53, so material and meaningful time savings can be realized.
> My own version of that mod3() function uses approx 75 digits as the cut-off
> between whether it's more advantageous to do it numerically versus
> gsub-stitutionally. Your mileage may vary.
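That grade-school rule can be restated in plain awk (`mod3_str` is my own simplified, decimal-only sketch, not the poster's multi-base mod3): n mod 3 equals the digit sum mod 3, and each digit only matters as 0, 1, or 2 mod 3, so counting digits is enough.

```shell
# Sketch of the digit-counting mod-3 rule, decimal only:
# digits 0,3,6,9 contribute nothing; 1,4,7 contribute 1; 2,5,8 contribute 2.
mod3_str() {
  awk -v n="$1" 'BEGIN {
    gsub(/[0369]/, "", n)          # == 0 (mod 3): drop entirely
    twos = gsub(/[258]/, "", n)    # == 2 (mod 3): count them, then drop
    # what remains are the == 1 (mod 3) digits, counted via length()
    print (length(n) + 2 * twos) % 3
  }'
}

mod3_str 123456789    # digit sum is 45 -> 0
```

No numeric value larger than the digit counts themselves is ever formed, which is why the input length barely matters.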
>
>
> out9: 302 B 0:00:00 [ 904 B/s] [ 904 B/s] [<=> ]
> ( LC_ALL=C gawk -p- -Mbe ; ) 0.31s user 0.03s system 97% cpu 0.356 total
>
>     # gawk profile, created Mon Apr 25 01:26:53 2022
>
>     # BEGIN rule(s)
>
>     BEGIN {
>          1  print mod3(sprintf("7%0*.f", 3 ^ 17, 0))
>     }
>
>     # Functions, listed alphabetically
>
>          1  function mod3(_)
>     {
>          1  gsub("[0369CFXcfx]+", "", _)
>          1  return (length(_) + (gsub("[258BEbe]", "", _))) % 3
>     }
> out9: 243 B 0:00:08 [28.6 B/s] [28.6 B/s] [<=> ]
> ( LC_ALL=C gawk -p- -Mbe ; ) 8.14s user 0.35s system 99% cpu 8.507 total
>
>     # gawk profile, created Mon Apr 25 01:27:03 2022
>
>     # BEGIN rule(s)
>
>     BEGIN {
>          1  print mod3_builtin(sprintf("7%0*.f", 3 ^ 17, 0))
>     }
>
>     # Functions, listed alphabetically
>
>          1  function mod3_builtin(_)
>     {
>          1  return (_ % 3)
>     }
>
>
> You can run all that code without the [ gawk -M ] flag, and the output would
> be the same.
> Up to you whether you deem there's any merit in me speaking with the gmp
> team on this.
> j
>
>
>
>
> echo ; sleep 2;
> ( time ( LC_ALL=C gawk -Mbe 'BEGIN { print ( 7 * 10^3^17 ) ^ 2 }' ) | pvE9 )|
> xxh128sum | lgp3
>
> out9: 246MiB 0:01:38 [2.49MiB/s] [2.49MiB/s] [<=> ]
> ( LC_ALL=C gawk -Mbe 'BEGIN { print ( 7 * 10^3^17 ) ^ 2 }'; ) 95.19s user
> 3.73s system 99% cpu 1:38.98 total
> e2ecfaa1bf7ddc743d5f25c89a2a59a1 stdin
> echo ; sleep 2;
> ( time ( LC_ALL=C gawk -Mbe 'BEGIN{ print sprintf("7%0*.f", 3^17, 0)^2 }' ) |
> pvE9 )| xxh128sum | lgp3 ; echo ;
>
> out9: 246MiB 0:01:46 [2.31MiB/s] [2.31MiB/s] [<=> ]
> ( LC_ALL=C gawk -Mbe 'BEGIN{ print sprintf("7%0*.f", 3^17, 0)^2 }'; ) 102.37s
> user 4.06s system 99% cpu 1:46.49 total
> e2ecfaa1bf7ddc743d5f25c89a2a59a1 stdin
>
> echo; ( time ( LC_ALL=C gawk -Mbe '
> BEGIN { print _x2(sprintf("7%0*.f", 3 ^ 17, 0))}
> function _x2(_, __, ___, ____, _____, ______){ if (substr(_, (__ += __ = __ ~
> __) ^ ++__, __) == "") { return (_ * _) } else if (sub("[0]+000$", "=&", _))
> { return (___ = index(_, "=")) < (++__ + __--) ? int(+(substr(_, -__ < +__,
> --___) ^ --__)) (_ = substr(_, ___ + __)) _ : _x2(______[(! +substr(_ =
> ______[split(_, ______, "[=]+")], -__ < +__, __))]) (_) _ } return int(_ * _)}
> ' ) | pvE9 )| xxh128sum | lgp3
>
> out9: 246MiB 0:00:01 [ 161MiB/s] [ 161MiB/s] [<=> ]
> ( LC_ALL=C gawk -Mbe ; ) 1.40s user 0.10s system 97% cpu 1.543 total
> e2ecfaa1bf7ddc743d5f25c89a2a59a1 stdin
>
>
>
> On Thursday, January 27, 2022, 03:56:46 PM EST, Wolfgang Laun
> <wolfgang.laun@gmail.com> wrote:
>
>
>
>
>
> Function _2x_ implements a special case of the addition of two
> multiple-precision numbers. If some application (frequently) needs doubling
> of numbers with millions of digits it might be useful. (In almost 60 years of
> programming in a wide range of application areas I've never come across this
> particular task.)
>
> It is not surprising that some specific operations can be implemented
> significantly faster by specific code, faster than the same algorithm using
> the GnuMP implementation used in gawk. print $0+$0 requires two, maybe three,
> conversions between string and MP binary, each at a cost of O(n) where n is
> the number of digits; adding another O(n) for the addition itself gives
> 3*O(n) or 4*O(n) in total. Doing the duplication on a string of digits
> requires just one O(n) pass. Expect a reduction to 33% or even 25%, my
> ballpark figure.
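The single O(n) string pass Wolfgang describes can be sketched like so (`double_str` is my own toy digit-at-a-time version, not the chunked _2x_ from the thread):

```shell
# Toy O(n) doubling on the digit string itself: one right-to-left pass
# with a carry, versus the parse/add/print round trip of $0+$0.
double_str() {
  awk -v n="$1" 'BEGIN {
    carry = 0; out = ""
    for (i = length(n); i >= 1; i--) {
      d = 2 * substr(n, i, 1) + carry   # double one digit plus carry-in
      out = (d % 10) out                # low digit goes into the result
      carry = int(d / 10)               # carry-out is 0 or 1
    }
    if (carry) out = carry out          # leading carry, if any
    print out
  }'
}

double_str 999999    # -> 1999998
```

Per-character string building makes this toy slower in practice than chunked variants, but it shows why only one pass over the digits is required.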
>
> Wolfgang
>
>
>
> On Thu, 27 Jan 2022 at 18:41, Jason C. Kwan <jasonckwan@yahoo.com> wrote:
> > I'll go modify the code for your convenience, but I didn't make the code
> > that way to give you a hard time - I write code like that for my personal
> > use too.
> >
> > I don’t know how to make that function more concise and still process
> > doubling 1-billion-digit integers with same precision as gmp.
> >
> > did you think I had Fun and joy taking time out to hash out a fully working
> > proof of concept to illustrate my point about the performance difference ,
> > in hopes of starting a candid discussion instead of just saying “it’s slow”
> > and sound like generic whining ?
> >
> > the proof of concept function is designed to be run *outside* of gmp mode -
> >
> > If you want gmp gawk code, it's
> >
> > gawk -Mbe '{ print $0+$0 }'
> >
> > that's it - the point of the proof of concept isn't to illustrate the
> > performance of gmp itself - it serves simply as an example of how it's
> > feasible to use base gawk, without gmp bignum mode, to perform certain
> > tasks with the same accuracy at much faster speeds.
> >
> > I wasn't saying "incorporate my code" and then giving you that - it's to
> > raise awareness of why it's even feasible, with all the overhead associated
> > with interpreted scripting, to perform the same task significantly faster
> > than the natively built-in ones. Interpreted scripting, by definition,
> > should be constrained by the performance of natively built-in features. I'm
> > not talking about a 5%-10% margin-of-error thing - I wouldn't even say
> > anything and risk being labeled a time waster if it were a small
> > double-digit gain.
> >
> > It's somewhat paradoxical to have the scripts run at 275-400% of the speed
> > of the underlying binary.
> >
> > I fully believe both the gawk and gmp teams are giving us state-of-the-art
> > software; that's why I'm guessing it may be an I/O issue between gawk and
> > gmp that is unnecessarily slowing things down.
> >
> > but don’t worry about the next time - if I see something, I won’t waste
> > your precious time by saying something.
> >
> > Regards,
> > Jason K
> >
> >> On 2022-01-27 08:24, Andrew J. Schorr <aschorr@telemetry-investments.com> wrote:
> >>
> >> Hi,
> >>
> >> As Arnold subtly pointed out, this is a bug reporting list, not a forum for
> >> discussing obfuscated code challenges. If you believe there's a gawk
> >> performance problem, please provide a short, comprehensible program that
> >> demonstrates the issue.
> >>
> >> Thanks,
> >> Andy
> >>
> >>> On Thu, Jan 27, 2022 at 08:13:58AM -0500, Jason C. Kwan wrote:
> >>> Have you tried even just straight-up pasting it and running it as is? When
> >>> interpreted scripting for the same platform is generating some 65%
> >>> reduction in execution time versus the built-in compiled binary from C
> >>> source code, don't you think it should at least warrant a quick look to
> >>> see if there are potential I/O bottlenecks between gawk and the GMP
> >>> add-on module it uses?
> >>>
> >>> I already used the LC_ALL=C locale, per the bug reporting page, as you
> >>> can see below. I even let gawk -M go second, so if there were any gains
> >>> from caching, it would be gawk -M that benefited. Maybe it's an ARM
> >>> thing, being that I'm on an M1 Max instead of x64.
> >>>
> >>> I only sent here first because I'm somewhat certain the GMP side would
> >>> instantly send me back here, seeing that I couldn't isolate whether the
> >>> potential bottleneck exists on the gawk side or the gmp side, and I
> >>> haven't written C since 2002. I even tried reading the gawk 5.1.1 source
> >>> code to see if I could identify anything that could help, but haven't
> >>> been lucky.
> >>>
> >>> Do you want me to read through the gmp source first?
> >>>
> >>> Regards,
> >>> Jason K
> >>>
> >>>> On 2022-01-27 01:42, arnold@skeeve.com wrote:
> >>>>
> >>>> Hello.
> >>>>
> >>>> Thank you for taking the time to report an issue.
> >>>>
> >>>> Please read the instructions for bug reporting at
> >>>> https://www.gnu.org/software/gawk/manual/html_node/Bugs.html.
> >>>> It was updated recently, please reread it if you haven't looked at
> >>>> it in a long time.
> >>>>
> >>>> I'm afraid that your "proof of concept" code is so unreadable that
> >>>> I won't even try to determine what it's doing.
> >>>>
> >>>> With respect to GMP performance, I suggest that you report an
> >>>> issue directly to the GMP developers. The gawk code that uses
> >>>> GMP isn't going to change. For this reason, I don't think you
> >>>> need to bother redoing your example code, unless you want to send
> >>>> it to the GMP developers.
> >>>>
> >>>> Thanks,
> >>>>
> >>>> Arnold
> >>>>
> >>>> "Jason C. Kwan" via "Bug reports only for gawk." <bug-gawk@gnu.org>
> >>>> wrote:
> >>>>
> >>>>> Hi GAWK team,
> >>>>> Not sure if I should be reporting this to the gawk team or the GnuMP
> >>>>> team - it's neither a bug nor a new feature per se, merely a
> >>>>> performance issue - I've noticed that for extremely large inputs, say,
> >>>>> integers with more than 5 million digits, the bignum gawk -M mode can
> >>>>> be somewhat slow, with possible room for speed improvement
> >>>>> (proof-of-concept attached below). My gawk version information :
> >>>>> GNU Awk 5.1.1, API: 3.1 (GNU MPFR 4.1.0, GNU MP 6.2.1)
> >>>>> Darwin MacBook-Pro.local 21.2.0 Darwin Kernel Version 21.2.0: Sun Nov
> >>>>> 28 20:28:41 PST 2021; root:xnu-8019.61.5~1/RELEASE_ARM64_T6000 arm64
> >>>>> On my test case of 71.5 million digits, it's possible to save 63.7%
> >>>>> of the time, even in regular gawk, when compared to gawk -M :
> >>>>> ==================
> >>>>> gawk -be '{ print length($0) }' test.txt
> >>>>> 71591842
> >>>>> test command ::
> >>>>> echo; time ( pv -q < test.txt | LC_ALL=C gawkmx -b -e '{ print _2x_($0) }' ) | xxh128sum ; echo
> >>>>> time ( pv -q < test.txt | LC_ALL=C gawk -M -b -e '{ print $0+$0 }' ) | xxh128sum ; echo
> >>>>>
> >>>>> a56c2d2302d9ea8751f810b848e6f354 stdin
> >>>>> ( pv -q < test.txt | LC_ALL=C gawk -e "${mfx}" -b ) 8.16s user 0.60s system 99% cpu 8.789 total
> >>>>> xxh128sum 0.01s user 0.00s system 0% cpu 8.789 total
> >>>>>
> >>>>> a56c2d2302d9ea8751f810b848e6f354 stdin
> >>>>> ( pv -q < test.txt | LC_ALL=C gawk -M -b -e ; ) 23.26s user 0.96s system 99% cpu 24.240 total
> >>>>> xxh128sum 0.01s user 0.00s system 0% cpu 24.239 total
> >>>>> =====================
> >>>>> In another test case of slightly over 275 million digits, the time
> >>>>> savings are 71.7% :
> >>>>> fc2231bdff375b7870586d8dffc0841c stdin
> >>>>> ( pv -q < jwengowengonoewgnwoegn.txt | LC_ALL=C gawk -b -e "${mfx}" -e ; ) 27.25s user 6.48s system 98% cpu 34.350 total
> >>>>> xxh128sum 0.04s user 0.02s system 0% cpu 34.349 total
> >>>>> fc2231bdff375b7870586d8dffc0841c stdin
> >>>>> ( pv -q < jwengowengonoewgnwoegn.txt | LC_ALL=C gawk -M -b -e ; ) 116.58s user 4.78s system 99% cpu 2:01.42 total
> >>>>> xxh128sum 0.04s user 0.02s system 0% cpu 2:01.42 total
> >>>>> ====================
> >>>>> Attached below is the full proof-of-concept code for function _2x_( )
> >>>>> to demonstrate that the time savings are very much concrete and
> >>>>> attainable, not simply theoretical. The test file, being 70MB+, is a
> >>>>> bit large for email, but basically any file using ASCII digits 0-9 to
> >>>>> represent an integer over 5 million digits will do. The speed
> >>>>> difference isn't noticeable for smaller inputs, and for inputs fewer
> >>>>> than 7 digits, it would most likely be slower than gawk -M.
> >>>>> I tried maximizing portability of the proof-of-concept function by
> >>>>> refraining from any gawk-specific extensions of awk - this same code
> >>>>> has also been tested in mawk 1.3.4, mawk 1.9.9.6, and macos 12.1
> >>>>> awk/nawk.
> >>>>> It's entirely self-contained, performs no recursion, has no external
> >>>>> dependencies, no bit-wise ops, and doesn't include any advanced/fancy
> >>>>> math - just straight-up grade-school long-form addition, doubling
> >>>>> 15 digits per chunk and 2 chunks per while-loop pass. The carrying is
> >>>>> performed by gsub( ) prior to the while( ) loop, thus eliminating the
> >>>>> need to track carries along the way. Performance scales linearly in
> >>>>> the number of decimal digits, i.e. in log base 10 of the value.
> >>>>> ( I didn't include any copyright notice or credits, since I don't
> >>>>> think grade-school addition is something copyrightable )
> >>>>> Obviously this is simply awk scripting code and can't be directly
> >>>>> incorporated into the C code base - I'm merely raising awareness of the
> >>>>> issue.
> >>>>> Thanks for your time.
> >>>>> Jason
> >>>>> ===================================================================================
> >>>>> ( I can reformat the code for readability if you prefer ) :
> >>>>> function _2x_(__,_,_______,____,_________,
> >>>>> ___________,__________,____________,
> >>>>> ________,_____,______,___) { if(__!~/[1-9]/) { return +_ }
> >>>>> ___=(__~/^[-]/) sub("^[+-]?["(-"")"]*","",__) if
> >>>>> (length(__)<(_____=((_+=++_)^_^_-!!_))+_) { if
> >>>>> (_________^((_____+(_~____))^(_____-_)<+__)) { return
> >>>>> (-!-"")^___*(+__+__) } } ___=substr("-",_~"",___) if
> >>>>> (__!~"[5-9]") { gsub(/4/,"8",__)-gsub(/3/,"6",__)
> >>>>> gsub(/2/,"4",__)-gsub(/1/,"2",__) return (___)__ };
> >>>>> _______=____=________="";
> >>>>> __=(_____=substr(________=(_++^_--+_)^_____,_))(__)
> >>>>> gsub(/./,".",_____) sub("("(_____)")+$","_&",__)
> >>>>> __=substr(__,index(__,"_")+!+"") ________+=+________;_=""; if
> >>>>> ((gsub(_____,"&_",__)+gsub(/[_][5-9]/,".5&",__)*(+""))%2) {
> >>>>> _=(":")(________+__+__) __=substr(__,index(__,"_")+!+"") }
> >>>>> for(_____ in ______) { } if
> >>>>> (______[_____=split(__,______,/[_]/)]=="") { delete
> >>>>> ______[_____--] };__=____~____; __________=____________*=\
> >>>>> ____________=___________=_________=32; while(__<_____) {
> >>>>> if(!--__________){
> >>>>> __________=____________;_______=(_______)_;_="";
> >>>>> if(!--___________){
> >>>>> ___________=_________;____=(____)_______;_______=""; } }
> >>>>> _=(_)(":")(+______[__++]*2+________)\
> >>>>> (":")(+______[__++]*2+________) } ____=(____)(_______)(_)\
> >>>>> (__==_____?(":")(________+2*______[_____]):"")
> >>>>> gsub(/[:]+[^:]/,"",____) sub(/^0*/,"",____); return (___)____ }
> >>>
> >
> >
> >
>
>
> --
> Wolfgang Laun
>
>