[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [PATCH] Speed up srfi-69 by caching current min and max length based
From: |
Mario Domenech Goulart |
Subject: |
Re: [PATCH] Speed up srfi-69 by caching current min and max length based on load and *actually resizing* |
Date: |
Sun, 21 Mar 2021 18:29:32 +0100 |
Hi,
On Thu, 18 Mar 2021 11:18:52 +0100 Peter Bex <peter@more-magic.net> wrote:
> I'm sure you've all read Ben Hoyt's blog post from earlier this week,
> https://benhoyt.com/writings/count-words/
>
> Of course (duh) I tried to implement a version in CHICKEN. So did
> Vasilij and Mario, and we all came to the conclusion that our performance
> on that simple task is abysmal, we can't even outperform Python.
>
> So I had a look and came up with one somewhat simple improvement to
> srfi-69; every time you insert something into a hash table, it checks
> if it needs to be resized. This check involves a computation which is
> needlessly heavy; it multiplies the min and max load factor with the
> current length, calls "floor" and inexact->exact on the result, just to
> compare it with the new size.
>
> The attached patch moves these calculations to when the hash table is
> created or resized, so that it can just read it out when checking for
> resize. It is a bit ugly because now we have 13 slots in the hash table
> object, and keeping track of all the indexes becomes a bit cumbersome.
> This could be changed in a refactor if necessary.
>
> Then, I noticed that the resize check doesn't actually even resize it
> when needed! The argument passed to hash-table-resize! is "len", while
> I think it should be "newsiz". I've also fixed that in the patch.
>
> This improves performance on my machine by getting the benchmark down
> from 12 seconds to 4 seconds using Vasilij's code at http://ix.io/2Ths.
> This puts us at least in the same ball park as Python, using the "simple"
> version at least (which is comparable in code to Vasilij's).
Thanks, Peter. I've applied your patch and released srfi-69 0.4.2.
I could not reproduce the same performance improvements on my system
(CHICKEN 5.2.0, GCC 8.3.0, Linux, X86-64), but I did get some
improvement: the run time of the test program by Vasilij went from 8.8s
to 7.0s on my system.
I ran salmonella over all eggs that directly depend on srfi-69 and
everything seems to look ok.
Since such operation is not trivial, I'll document here the procedure to
test a patched egg with salmonella. The challenge here is to make
salmonella (or, rather, chicken-install) pick the patched egg, not the
upstream one.
What I did, basically, was:
1. Clone eggs-5-latest
$ git clone git://code.call-cc.org/eggs-5-latest
2. Use henrietta-cache-overlay
(https://github.com/mario-goulart/henrietta-cache-overlay) to convert
the directory layout in eggs-5-latest to a directory layout that
chicken-install understands:
$ henrietta-cache-overlay $PWD/eggs-5-latest $PWD/eggs
3. Patch srfi-69 with your changes in the eggs/srfi-69 directory (which
is a symlink to eggs-5-latest/srfi-69/<latest-version>)
4. Tweak the `location' field of setup.defaults to point to the `eggs'
directory created by henrieta-cache-overlay (step 2).
5. Determine the direct dependencies of srfi-69. To do that, I sloppily
parsed the .dot file generated by salmonella-html-report. By default
our salmonella machines don't keep those files, so I just fetched a
recent salmonella log file from a salmonella machine and ran
salmonella-html-report with the `--keep-dot-files' option:
$ wget
http://salmonella-linux-x86.call-cc.org/master-debugbuild/gcc/linux/x86/2021/03/19/salmonella.log.bz2
$ bzip2 -d salmonella.log.bz2
$ salmonella-html-report -keep-dot-files salmonella.log html
$ cd rev-dep-gaphs
$ grep ' -> _srfi_69;' srfi-69.dot | awk '{print $1}' | sed 's/^_//; s/_/-/g'
json memoize spiffy-request-vars awful-static-pages glls lmdb-ht
statistics srfi-171 object-evict shen callable-data-structures
postgresql srfi-209 webview condition-utils beaker medea inotify
pyffi protobuf srfi-99 git sql-de-lite yasos dataframe moremacros
s11n awful kiwi hyde http-client sqlite3 string-utils srfi-29 utf8
edn srfi-113 srfi-19 sdl2 lowdown fmt chicken-doc http-session iup
vlist comparse pwdb minissh stty concurrent-native-callbacks
levenshtein chicken-doc-admin
Those are the 52 eggs which directly depend on srfi-69.
6. Now that we know the eggs which depend on srfi-69, we can use
salmonella with the chicken installation whose setup.defaults has
been tweaked to point to a local directory of egg sources. In my
case I used salmonella-epidemy to speed things up a little:
$ salmonella-epidemy --keep-repo --instances=$(nproc) <eggs from step 5>
Since we tweaked setup.defaults to point to the egg source directory
where srfi-69 has been patched, salmonella will build all eggs with
the patched srfi-69.
7. salmonella-epidemy (step 6) generates a salmonella.log file, which
where we can see the results and compare with the ones we get from
the salmonella machines. The comparison I did manually (just checked
what was expected to fail).
$ salmonella-log-viewer salmonella.log | grep '^==='
I actually executed that on one of our salmonella machines, just to be
sure that all the system dependencies were properly installed, so that I
could compare the results obtained with the patched srfi-69 against the
ones with the upstream egg.
All the best.
Mario
--
http://parenteses.org/mario