Re: string-for-each vs. for-each+string->list performance
From: Linus Björnstam
Subject: Re: string-for-each vs. for-each+string->list performance
Date: Sat, 13 Jun 2020 08:41:17 +0200
User-agent: Cyrus-JMAP/3.3.0-dev0-525-ge8fa799-fm-20200609.001-ge8fa7990
Thanks for clearing that up.
I have an old implementation of large parts of srfi-13, if that would be of
interest. I don't know how much you want to change. The licence situation of
the reference implementation is weird, IIRC.
A starting point could be to replace all the higher-order functions, since
that would minimize the kind of performance problems discussed here.
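To illustrate what I mean (a hypothetical sketch, not the actual srfi-13
code; the name and signature are made up): a helper in the style of
`string-count` can be written as an inline named let, so the hot loop stays
in one Scheme procedure instead of funnelling every character through a
higher-order helper:

```scheme
;; Hypothetical sketch: a string-count-style helper written with an
;; inline named let.  The loop walks the string once, testing each
;; character with PRED and accumulating a count.
(define (string-count* pred str)
  (let ((len (string-length str)))
    (let loop ((i 0) (count 0))
      (if (= i len)
          count
          (loop (+ i 1)
                (if (pred (string-ref str i))
                    (+ count 1)
                    count))))))
```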
--
Linus Björnstam
On Fri, 12 Jun 2020, at 22:13, Ludovic Courtès wrote:
> Hi,
>
> Linus Björnstam <linus.internet@fastmail.se> skribis:
>
> > You can cut another 15-ish % from that loop by making an inline loop, btw
> >
> > (let loop ((pos 0))
> >   (when (< pos (string-length str))
> >     ...
> >     (loop (1+ pos))))
> >
> > I have been looking at the disassembly, even for simpler cases, but I
> > haven't been able to understand enough of it.
> >
> > BTW: string-for-each is in the default environment, and is probably the
> > same as the srfi-13 C implementation.
>
> ‘string-for-each’ in C (the default) is slower than its Scheme counterpart:
>
> --8<---------------cut here---------------start------------->8---
> scheme@(guile-user)> (define (sfe proc str)
>                        (define len (string-length str))
>                        (let loop ((i 0))
>                          (unless (= i len)
>                            (proc (string-ref str i))
>                            (loop (+ 1 i)))))
> scheme@(guile-user)> (define str (make-string 15000000))
> scheme@(guile-user)> ,t (sfe identity str)
> ;; 0.263725s real time, 0.263722s run time. 0.000000s spent in GC.
> scheme@(guile-user)> ,t (sfe identity str)
> ;; 0.259538s real time, 0.259529s run time. 0.000000s spent in GC.
> scheme@(guile-user)> ,t (string-for-each identity str)
> ;; 0.841632s real time, 0.841624s run time. 0.000000s spent in GC.
> scheme@(guile-user)> (version)
> $2 = "3.0.2"
> --8<---------------cut here---------------end--------------->8---
>
> In general we seem to pay a high price for leaving (calling a subr) and
> re-entering (via ‘scm_call_n’) the VM. This is especially acute here
> because there’s almost nothing happening in C, so we keep bouncing
> between Scheme and C.
>
> That’s another reason to start rewriting such primitives in Scheme and
> have the C functions just call out to Scheme.
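[A pure-Scheme version along those lines might look roughly like this --
a sketch with a made-up name, not the actual rewrite; it also handles the
multi-string case the real primitive supports, stopping at the shortest
string, which is an assumption on my part:]

```scheme
;; Hypothetical sketch of a Scheme-level string-for-each that a thin C
;; shim could delegate to, keeping the hot loop inside the VM.
(define (my-string-for-each proc str . rest)
  (if (null? rest)
      ;; Fast path: a single string, tight inline loop.
      (let ((len (string-length str)))
        (let loop ((i 0))
          (unless (= i len)
            (proc (string-ref str i))
            (loop (+ i 1)))))
      ;; Several strings: iterate up to the shortest one.
      (let* ((strs (cons str rest))
             (len (apply min (map string-length strs))))
        (let loop ((i 0))
          (unless (= i len)
            (apply proc (map (lambda (s) (string-ref s i)) strs))
            (loop (+ i 1)))))))
```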
>
> If we do:
>
> perf record guile -c '(string-for-each identity (make-string 15000000))'
>
> we get this profile:
>
> --8<---------------cut here---------------start------------->8---
> Overhead Command Shared Object Symbol
> 31.10% guile libguile-3.0.so.1.1.1 [.] vm_regular_engine
> 27.48% guile libguile-3.0.so.1.1.1 [.] scm_call_n
> 14.34% guile libguile-3.0.so.1.1.1 [.] scm_jit_enter_mcode
> 3.55% guile libguile-3.0.so.1.1.1 [.] scm_i_string_ref
> 3.37% guile libguile-3.0.so.1.1.1 [.] get_callee_vcode
> 2.34% guile libguile-3.0.so.1.1.1 [.] scm_call_1
> 2.31% guile libguile-3.0.so.1.1.1 [.] scm_string_for_each
> --8<---------------cut here---------------end--------------->8---
>
> Indeed, we get better performance when turning off JIT:
>
> --8<---------------cut here---------------start------------->8---
> $ GUILE_JIT_THRESHOLD=-1 time guile -c '(string-for-each identity
> (make-string 15000000))'
> 0.47user 0.00system 0:00.47elapsed 100%CPU (0avgtext+0avgdata
> 26396maxresident)k
> 0inputs+0outputs (0major+1583minor)pagefaults 0swaps
> $ GUILE_JIT_THRESHOLD=100 time guile -c '(string-for-each identity
> (make-string 15000000))'
> 0.83user 0.00system 0:00.83elapsed 100%CPU (0avgtext+0avgdata
> 26948maxresident)k
> 0inputs+0outputs (0major+1748minor)pagefaults 0swaps
> $ GUILE_JIT_THRESHOLD=0 time guile -c '(string-for-each identity
> (make-string 15000000))'
> 0.84user 0.00system 0:00.85elapsed 100%CPU (0avgtext+0avgdata
> 27324maxresident)k
> 0inputs+0outputs (0major+2548minor)pagefaults 0swaps
> --8<---------------cut here---------------end--------------->8---
>
> So it seems that we just keep firing the JIT machinery on every
> ‘scm_call_n’ for no benefit.
>
> That’s probably also the reason why ‘%after-gc-hunk’, ‘reap-pipes’, &
> co. always show high in statprof:
>
> https://lists.gnu.org/archive/html/guile-devel/2020-05/msg00019.html
>
> Thanks,
> Ludo’.