Re: string-for-each vs. for-each+string->list performance
From: Linus Björnstam
Subject: Re: string-for-each vs. for-each+string->list performance
Date: Sat, 13 Jun 2020 08:41:17 +0200
User-agent: Cyrus-JMAP/3.3.0-dev0-525-ge8fa799-fm-20200609.001-ge8fa7990
Thanks for clearing that up.
I have an old implementation of large parts of srfi-13, if that would be of
interest. I don't know how much you want to change. The licence situation of
the reference implementation is weird, IIRC.
A starting point could be to replace all the higher-order functions, since
that would minimize the kind of performance problems discussed here.
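To illustrate what I mean (a hypothetical sketch, not the actual srfi-13
code; the name and signature are made up): a helper in the style of
`string-count` can be written as an inline named let, so the hot loop stays
in one Scheme procedure instead of funnelling every character through a
higher-order helper:

```scheme
;; Hypothetical sketch: a string-count-style helper written with an
;; inline named let.  The loop walks the string once, testing each
;; character with PRED and accumulating a count.
(define (string-count* pred str)
  (let ((len (string-length str)))
    (let loop ((i 0) (count 0))
      (if (= i len)
          count
          (loop (+ i 1)
                (if (pred (string-ref str i))
                    (+ count 1)
                    count))))))
```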
--
Linus Björnstam
On Fri, 12 Jun 2020, at 22:13, Ludovic Courtès wrote:
> Hi,
>
> Linus Björnstam <linus.internet@fastmail.se> skribis:
>
> > You can cut another 15-ish % from that loop by making an inline loop, btw
> >
> > (let loop ((pos 0))
> >   (when (< pos (string-length str))
> >     ...
> >     (loop (1+ pos))))
> >
> > I have been looking at the disassembly, even for simpler cases, but I
> > haven't been able to understand enough of it.
> >
> > BTW: string-for-each is in the default environment, and is probably the
> > same as the srfi-13 C implementation.
>
> ‘string-for-each’ in C (the default) is slower than its Scheme counterpart:
>
> --8<---------------cut here---------------start------------->8---
> scheme@(guile-user)> (define (sfe proc str)
>                        (define len (string-length str))
>                        (let loop ((i 0))
>                          (unless (= i len)
>                            (proc (string-ref str i))
>                            (loop (+ 1 i)))))
> scheme@(guile-user)> (define str (make-string 15000000))
> scheme@(guile-user)> ,t (sfe identity str)
> ;; 0.263725s real time, 0.263722s run time. 0.000000s spent in GC.
> scheme@(guile-user)> ,t (sfe identity str)
> ;; 0.259538s real time, 0.259529s run time. 0.000000s spent in GC.
> scheme@(guile-user)> ,t (string-for-each identity str)
> ;; 0.841632s real time, 0.841624s run time. 0.000000s spent in GC.
> scheme@(guile-user)> (version)
> $2 = "3.0.2"
> --8<---------------cut here---------------end--------------->8---
>
> In general we seem to pay a high price for leaving (calling a subr) and
> re-entering (via ‘scm_call_n’) the VM. This is especially acute here
> because there’s almost nothing happening in C, so we keep bouncing
> between Scheme and C.
>
> That’s another reason to start rewriting such primitives in Scheme and
> have the C functions just call out to Scheme.
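[A pure-Scheme version along those lines might look roughly like this --
a sketch with a made-up name, not the actual rewrite; it also handles the
multi-string case the real primitive supports, stopping at the shortest
string, which is an assumption on my part:]

```scheme
;; Hypothetical sketch of a Scheme-level string-for-each that a thin C
;; shim could delegate to, keeping the hot loop inside the VM.
(define (my-string-for-each proc str . rest)
  (if (null? rest)
      ;; Fast path: a single string, tight inline loop.
      (let ((len (string-length str)))
        (let loop ((i 0))
          (unless (= i len)
            (proc (string-ref str i))
            (loop (+ i 1)))))
      ;; Several strings: iterate up to the shortest one.
      (let* ((strs (cons str rest))
             (len (apply min (map string-length strs))))
        (let loop ((i 0))
          (unless (= i len)
            (apply proc (map (lambda (s) (string-ref s i)) strs))
            (loop (+ i 1)))))))
```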
>
> If we do:
>
> perf record guile -c '(string-for-each identity (make-string 15000000))'
>
> we get this profile:
>
> --8<---------------cut here---------------start------------->8---
> Overhead Command Shared Object Symbol
> 31.10% guile libguile-3.0.so.1.1.1 [.] vm_regular_engine
> 27.48% guile libguile-3.0.so.1.1.1 [.] scm_call_n
> 14.34% guile libguile-3.0.so.1.1.1 [.] scm_jit_enter_mcode
> 3.55% guile libguile-3.0.so.1.1.1 [.] scm_i_string_ref
> 3.37% guile libguile-3.0.so.1.1.1 [.] get_callee_vcode
> 2.34% guile libguile-3.0.so.1.1.1 [.] scm_call_1
> 2.31% guile libguile-3.0.so.1.1.1 [.] scm_string_for_each
> --8<---------------cut here---------------end--------------->8---
>
> Indeed, we get better performance when turning off JIT:
>
> --8<---------------cut here---------------start------------->8---
> $ GUILE_JIT_THRESHOLD=-1 time guile -c '(string-for-each identity
> (make-string 15000000))'
> 0.47user 0.00system 0:00.47elapsed 100%CPU (0avgtext+0avgdata
> 26396maxresident)k
> 0inputs+0outputs (0major+1583minor)pagefaults 0swaps
> $ GUILE_JIT_THRESHOLD=100 time guile -c '(string-for-each identity
> (make-string 15000000))'
> 0.83user 0.00system 0:00.83elapsed 100%CPU (0avgtext+0avgdata
> 26948maxresident)k
> 0inputs+0outputs (0major+1748minor)pagefaults 0swaps
> $ GUILE_JIT_THRESHOLD=0 time guile -c '(string-for-each identity
> (make-string 15000000))'
> 0.84user 0.00system 0:00.85elapsed 100%CPU (0avgtext+0avgdata
> 27324maxresident)k
> 0inputs+0outputs (0major+2548minor)pagefaults 0swaps
> --8<---------------cut here---------------end--------------->8---
>
> So it seems that we just keep firing the JIT machinery on every
> ‘scm_call_n’ for no benefit.
>
> That’s probably also the reason why ‘%after-gc-hunk’, ‘reap-pipes’, &
> co. always show high in statprof:
>
> https://lists.gnu.org/archive/html/guile-devel/2020-05/msg00019.html
>
> Thanks,
> Ludo’.