Re: parallelization of ./configure compiler test processes
From: Thomas Jahns
Subject: Re: parallelization of ./configure compiler test processes
Date: Thu, 30 Mar 2023 01:05:36 +0200
Hello Danny,
I have spent some time thinking about improvements to autoconf configure scripts
(while waiting for builds to proceed). In my view, it is currently easier to seek
small efficiency gains that, taken together, can still improve run-time
substantially than it would be to parallelize the whole thing, because there is
so much untapped potential:
* Use bash builtin shell commands so the script forks and, especially, execs
less often. On modern systems, where data paths are fast compared to anything
that touches resource control (changing memory mappings, dropping caches,
etc.), syscalls can be a substantial source of slowdown. (A small sh
illustration follows this list.)
* Set TMPDIR so that temporary files never hit disk (point it at /dev/shm or
similar instead of /tmp).
* In the case of spack I've seen substantial improvements from running the
whole build in /dev/shm, and I think spack should at least mention that
possibility for systems with substantial amounts of RAM (and guess what kind of
systems many sites using spack just happen to have?).
* The gcc option -pipe is similarly helpful for getting later build phases to
start as soon as possible. (The TMPDIR and -pipe points are illustrated in the
second sketch below.)
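To illustrate the first point, here is a contrived sh fragment (the file name
is made up). Both variants compute the same two values, but the second runs
entirely inside the shell, with no child processes:

    file=/some/dir/libfoo.so.1
    # with external tools: the command substitutions fork subshells
    # and exec sed/expr
    base=`echo "$file" | sed 's|.*/||'`
    ver=`expr "$base" : '.*\.\([0-9]*\)$'`
    # with parameter expansion: same result, no child processes at all
    base=${file##*/}
    ver=${base##*.}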
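And a minimal sketch of the TMPDIR and -pipe points, assuming a tmpfs is
mounted at /dev/shm (the directory name is only an example):

    export TMPDIR=/dev/shm/$USER-build-tmp   # temporary files stay in RAM
    mkdir -p "$TMPDIR"
    export CFLAGS="-pipe ${CFLAGS-}"   # gcc passes data between stages
                                       # through pipes, not temp files
    ./configure && make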
I'm writing this because I feel that quite a few bright minds have gone for the
all-or-nothing goal of full parallelization, only to end up with something
that, if it was finished at all, never made it into general use, whereas
smaller, incremental improvements can be introduced with much less risk to
correctness.
And, especially in the context of a package manager that has almost full
control over what goes into the build, yet sits outside the source tree of
each package, I find it very useful to treat the whole machinery as open to
improvement.
Regarding parallelization of autoconf in particular, I think autoconf would
benefit greatly from first making the effects of each macro more explicit, i.e.
which variables end up being set, which files are appended to, etc. To my
knowledge this is mostly well documented for the human reader, but not
programmatically available in the M4 phase at all. E.g. if the script
generation "knew" that some test macro invocations only affect confdefs.h via
an atomic write, and that no macro affecting a shell variable of consequence
lies in between, those tests could indeed be performed safely in parallel, as
far as I can see.
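To make the idea concrete, here is a minimal hand-written sketch (not generated
autoconf code; the check names and file layout are invented): two header checks
that are assumed to be independent and to only contribute #defines run
concurrently, each in its own directory, and their results are merged serially
afterwards:

    run_check() {  # $1 = tag, $2 = header; works in its own subdirectory
      ( mkdir -p "par_$1" && cd "par_$1" || exit
        printf '#include <%s>\nint main (void) { return 0; }\n' "$2" > conftest.c
        if ${CC-cc} -c conftest.c > /dev/null 2>&1; then
          printf '#define HAVE_%s 1\n' "$1" > confdefs.frag
        fi )
    }
    run_check STDINT_H stdint.h &
    run_check UNISTD_H unistd.h &
    wait                                        # join the concurrent checks
    cat par_*/confdefs.frag >> confdefs.h 2> /dev/null   # serial merge

Whether such checks are in fact independent is exactly what per-macro metadata
would have to tell the script generator.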
Also, there is a discussion of this particular topic on this mailing list
started by Paul Eggert on June 14, 2022, Message-ID:
<b2d57714-3519-7929-7ddf-34c4ca774f5e@cs.ucla.edu>
Kind regards,
Thomas
> On Mar 29, 2023, at 22:12 , Danny McClanahan <dmcc2@hypnicjerk.ai> wrote:
>
> Hello autoconf,
>
> I work on a cross-platform package manager named spack (https://spack.io)
> which builds lots of gnu software from source and has fantastic support for
> autotools projects. Because spack provides a shell script `cc` to wrap the
> compiler, each invocation of `cc` for feature tests executed by `./configure`
> takes a little bit longer than normal, so configuring projects that
> necessarily have a lot of feature tests takes much longer in spack
> (particularly `gettext`, which we use as a benchmark in this change:
> https://github.com/spack/spack/pull/26259). However, we can fix that
> additional overhead ourselves without any changes in autoconf, by generating
> our `cc` wrapper instead of doing any logic in the shell script. The reason I
> messaged this board is because of a separate idea that the above situation
> made me start thinking about: *parallelizing feature test executions*, in
> order to speed up `./configure`.
>
> So a few questions:
> 1. Are there any intrinsic blockers to parallelizing the generated feature
> tests that execute in an autotools `./configure` script?
> - For example, I've been told that feature tests currently write to a
> single output file, which would get clobbered if we were to naively
> parallelize the test execution, but I was hoping that each test could be made
> to write to a temp file instead if that's true.
> 2. Which codebase (autoconf, automake, m4, ?) does the work of generating the
> script that executes tests in serial, and where in that codebase does this
> occur?
> - I've been perusing clones of the autoconf and automake codebases and
> I've been unable to locate the logic that actually executes each test in
> sequence.
> 3. How should we expose the option to execute tests in parallel?
> - In order to serve the purpose of improving `./configure` invocation
> performance, we would probably want to avoid requiring an `autoreconf` (spack
> avoids executing `autoreconf` wherever possible).
> - Possibly an option `autoreconf
> --experimental-also-generate-parallel-tests`, which would enable the end user
> to execute `./configure --experimental-execute-parallel-tests`?
>
> Please feel free to link me to any existing content/discussions on this if
> I've missed them, or redirect me to another mailing list. I'm usually pretty
> good at figuring things out on my own but have been having some difficulty
> getting started here.
>
> Thanks so much,
> Danny