guile-user

Limiting parallelism using futures, parallel forms and fibers


From: Zelphir Kaltstahl
Subject: Limiting parallelism using futures, parallel forms and fibers
Date: Wed, 8 Jan 2020 08:56:11 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Icedove/52.9.1

Hello Guile users!

I have been thinking about what I need in order to parallelize an
algorithm I am working on. Background: I am working on my decision tree
implementation
(https://notabug.org/ZelphirKaltstahl/guile-ml/src/wip-port-to-guile),
which currently uses only a single core. Training the model splits the
tree into subtrees, which is an opportunity to run things in parallel.
Subtrees can subsequently split again, which could again result in
parallel evaluation of their subtrees.

Since it might not always be desirable to use all available cores
(blocking the whole CPU), I would like to give the user of the
algorithm a way to limit the number of cores it uses, most likely via a
keyword argument defaulting to the number of cores. Looking at futures,
parallel forms and the fibers library, I have had the following
thoughts:

1. Fibers support a ~#:parallelism~ keyword, so the number of fibers
running in parallel can be set for ~run-fibers~, which creates a
scheduler. That scheduler might be reused later, possibly avoiding the
need to keep track of how many threads are in use. When using fibers,
it would probably be important to always use the same scheduler, as a
new scheduler would not know anything about other schedulers, and the
limit on parallelism might be exceeded. Schedulers that are controlled
by the initial scheduler are probably fine. I will need to re-read the
fibers manual on that.
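A minimal sketch of what I mean, assuming the fibers library is
installed; ~train~ is a placeholder for the real subtree-training
procedure, and ~run-fibers~ / ~spawn-fiber~ / channels follow the
fibers manual:

```scheme
(use-modules (fibers)
             (fibers channels))

(define (train subtree)
  ;; Placeholder for the real training work.
  (* subtree subtree))

(define (train-subtrees subtrees limit)
  ;; run-fibers creates a scheduler with at most LIMIT worker threads;
  ;; every fiber spawned inside it shares that limit.
  (run-fibers
   (lambda ()
     (let ((results (make-channel)))
       ;; One fiber per subtree; the scheduler multiplexes them onto
       ;; the limited pool of worker threads.
       (for-each (lambda (subtree)
                   (spawn-fiber
                    (lambda ()
                      (put-message results (train subtree)))))
                 subtrees)
       ;; Collect one result per subtree, in order of arrival.
       (map (lambda (_) (get-message results)) subtrees)))
   #:parallelism limit))
```

Results arrive in completion order here, so they may need sorting or
tagging if order matters.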

2. Among the parallel forms there are ~n-par-map~ and ~n-par-for-each~,
which let one specify the maximum number of threads running in parallel.
However, when calling these procedures recursively (in my case to
process the subtrees of a decision tree, which might split into subtrees
again), one does not know whether the other recursive calls have
finished and therefore cannot know how many threads are currently in
use. Calling one of these procedures again might run more threads than
were specified in an upper-level call.
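To illustrate the problem with a toy tree of nested lists (~leaf?~,
~fit-leaf~ and ~split~ are stand-ins for the real procedures):

```scheme
(use-modules (ice-9 threads))  ; provides n-par-map

;; Toy "tree": nested lists; a leaf is a number.
(define (leaf? node) (number? node))
(define (fit-leaf node) (* node node))  ; stand-in for real training work
(define (split node) node)              ; children of an internal node

(define (train-tree node max-threads)
  (if (leaf? node)
      (fit-leaf node)
      ;; Each call caps itself at MAX-THREADS, but the recursive calls
      ;; do not know about each other: two nested levels may run up to
      ;; MAX-THREADS * MAX-THREADS threads in total.
      (n-par-map max-threads
                 (lambda (subtree) (train-tree subtree max-threads))
                 (split node))))

(train-tree '((1 2) (3 4)) 4)  ; => ((1 4) (9 16))
```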

3. With futures, one cannot directly specify a maximum number of
futures to run. To control how many threads evaluate code at the same
time, one would have to build some construct around them that keeps
track of how many futures are running or could be running, and make
that work for the recursive creation of further futures as well.
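For reference, plain futures look like this (again with ~train~ as a
placeholder); there is no keyword here to shrink the worker pool, which
Guile sizes from the number of cores:

```scheme
(use-modules (ice-9 futures))

(define (train subtree)
  ;; Placeholder for the real training work.
  (* subtree subtree))

(define (train-subtrees subtrees)
  ;; Create all futures first, then touch them, so they can run in
  ;; parallel; touch returns the results in list order.
  (map touch
       (map (lambda (subtree)
              (make-future (lambda () (train subtree))))
            subtrees)))

(train-subtrees '(1 2 3))  ; => (1 4 9)
```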

4. One could also do something ugly and create a globally defined
active-thread counter, which requires locking, to keep track of the
number of threads or futures running in parallel.
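A hypothetical sketch of that global counter, as a counting semaphore
guarded by a mutex: when no slot is free, the work is simply done
sequentially in the current thread, so recursive calls can never
exceed the limit. All names here are made up for illustration:

```scheme
(use-modules (ice-9 futures)
             (ice-9 threads))

(define slots 4)                       ; user-specified limit
(define slots-mutex (make-mutex))

(define (try-acquire-slot!)
  (with-mutex slots-mutex
    (and (> slots 0)
         (begin (set! slots (- slots 1)) #t))))

(define (release-slot!)
  (with-mutex slots-mutex
    (set! slots (+ slots 1))))

(define (maybe-future thunk)
  ;; Returns either a future (slot acquired) or an already-computed
  ;; value; a real version should release the slot on non-local exit,
  ;; e.g. via dynamic-wind.
  (if (try-acquire-slot!)
      (make-future (lambda ()
                     (let ((result (thunk)))
                       (release-slot!)
                       result)))
      (thunk)))

(define (touch-maybe x)
  (if (future? x) (touch x) x))

(touch-maybe (maybe-future (lambda () (* 6 7))))  ; => 42
```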

5. I could estimate the maximum number of cores currently in use by
passing the tree depth as an argument to recursive calls and
calculating from it how many splits, and thus evaluations, might be
running in parallel at that point. However, this might be inaccurate:
some subtree might already be finished, and then I would not use the
full user-specified number of parallel evaluations.
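The arithmetic of that estimate, as a sketch: with binary splits, at
depth D there can be up to 2^D live subtree evaluations, so each call
may use at most floor(limit / 2^D) threads without any global counter:

```scheme
;; Pessimistic per-call thread budget derived from the tree depth,
;; assuming binary splits; no shared state needed.
(define (threads-at-depth total-limit depth)
  (max 1 (quotient total-limit (expt 2 depth))))

(threads-at-depth 8 0)  ; => 8
(threads-at-depth 8 1)  ; => 4  (two subtrees, four threads each)
(threads-at-depth 8 3)  ; => 1  (sequential below depth 3)
```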

So my questions are:

- Is there a default / recommended way to limit parallelism for
recursive calls to parallel forms?

- Is there a better way than a global counter with locking to limit the
number of futures created during recursive calls? I would very much
dislike having to resort to something like global state + mutex.

- What do you recommend in general to solve this?

Regards,
Zelphir



