Top-levels, modules, optimizability

guile-devel
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Top-levels, modules, optimizability

From:	"Taylan Ulrich Bayırlı /Kammer"
Subject:	Top-levels, modules, optimizability
Date:	Thu, 31 Oct 2013 22:52:44 +0100
I have a set of proposals for making Guile Scheme more optimizable.
My knowledge of Guile internals is lacking, but I hope these ideas
will be useful from a high-level perspective, at the very least to
ignite some thoughts.

If feedback is positive, I will try and allocate a good chunk of my
free time to actually trying to implement these, however long it takes
me; I'm aware the existing developers are already quite busy. :)


* The current situation

All top-levels, including non-exported ("private") ones, are
essentially global mutable variables, so no static optimization can be
done on references to them.

For some purposes this is very neat.  Especially during development
with Emacs/Geiser, the ability to just do C-M-x on any top-level is
invaluable in my opinion, so I desire we keep a compilation mode that
behaves as currently, even if the default behavior is changed or more
optional behaviors added.

I propose three changes/features, the first two of which give way to
*intra*-module optimizations only (i.e. on references a module makes
to its own top-levels), and the third to inter-module optimizations
(i.e. on references to module imports):

1. Immutable modules
2. Truly private top-levels
3. Static linking


* Immutable modules

A module can be compiled as an "immutable" one, meaning essentially
that `module-define!', `module-add!', `module-set!' and the like
cannot be used on it, and that variables imported from it are
immutable.  This means that static analysis of the module can tell
whether a top-level binding can ever be mutated; if not, relevant
optimizations can be done, like removing type-checks and inlining.

(I'm not sure if calling such modules "immutable" is an abuse of
terminology, since they can have mutable state; I will call them so
for simplicity.)

Such a module can still trivially export a setter-procedure for a
variable.  (Arguably one should use parameters for this anyway.)

While monkey-patching of individual bindings is made impossible for a
module compiled as such, there would be a mechanism to replace a
module at whole (e.g. with a newly compiled version), in effect
mutating all bindings it exports.

I think this is desirable as default behavior, giving a good trade-off
between optimizability and dynamicity.  The two following proposals
also essentially build upon this one.


* Truly private top-levels

If private top-levels were truly private, i.e. `module-variable',
`module-ref' and the like couldn't see them, some more optimizations
would be possible.  For instance a top-level whose every use in the
module is inlined could be removed entirely from the compiled image.

I don't know yet of any significant optimizations made possible by
this, however I think it's more intuitive anyway to have private
top-levels to be actually private, so I think this is also desirable
as default behavior.


* Static linking

There would be an option, while compiling a module, to link statically
to an imported module.  From the perspective of the immutable modules
idea, this means linking to the currently (during compilation) loaded
version of an immutable module.  This allows the (long-desired?) type
of optimization that is to inline calls to `car', `vector-ref', etc.,
or even eliminate them entirely when they're called on a constant.  On
the other hand, it means one has to recompile a module to make it see
changes in a module it had linked to statically during its previous
compilation; for this reason this should most certainly NOT be default
behavior, instead modules that want said optimizations would specify
explicitly that they want to do a "static-import" on certain modules.


* Appendix A: The implicit exports problem

It might be difficult for a static analyzer to determine whether a
variable can be mutated due to appearing in the body of any exported
macros.  (Or is it?  I'm ignorant here.)  In an initial version one
could stop trying to optimize references to any variables that appear
within macro bodies; a more sophisticated analyzer could find out
which of these occurrences give the possibility of the variable being
mutated.  (I think, other than appearing as an argument to set! (which
also applies outside macro bodies), the only such situation is when
the variable appears as an argument to an argument to the macro?)

Are procedural macros a problem here?  I think not; an analyzer just
needs to look into calls to `syntax' to see what variables from the
macro-definition scope (from the module that is being compiled) are
captured for the macro output.  `datum->syntax' necessarily gets its
`for-syntax' argument from the macro-caller scope if I'm not mistaken,
so it cannot result in bindings to the macro-definition scope.


* Appendix B: Detecting code that needs recompilation

For both static linking, and the existing problem of redefinitions of
macros not taking effect on already-compiled code, it would be useful
to automatically detect code that needs recompilation.  I think this
automatic detection would consist of three actions: recording the
linkage/usage of an object during compilation, looking up records
during recompilation of an object to see whether we're repeating a
compilation that previously linked/used an object but doesn't do so
this time, and looking up records during redefinition of an object to
see whether we're redefining an object that has had compile-time
linkers/users.  All three actions are inherently compile-time, so we
have no run-time overhead, though we have size overhead for the
records.  When compiling an immutable module, the check for each
linked/used object needs to happen once only, so compilation time
shouldn't suffer much either.

If such a system is implemented and works well, then the recompilation
of modules could perhaps be automated, making the static linking
situation transparent insofar compilation times are bearable.


* Appendix C: Red herrings, discarded thoughts

** Copying bindings on import

As an alternative to making module imports immutable, we could make
them privately owned, separate bindings, meaning that one can use
`set!' on them and have it take effect in the current scope but not
affect the binding in the module itself or other users of it
(essentially destroying the binding to the module).  The thought was
that this would be simple to implement with existing machinery, not
needing to implement immutable bindings, however this was a faulty
assumption because "what happens when the module that exported the
binding calls `set!' on it?"  In my opinion this should definitely
affect the imported bindings in users of the module, meaning that the
binding does need to be the same one, lest we implement yet another
machinery for "indirect bindings" that see changes in a pointed-to
binding and destroy the pointer/indirection when `set!' is called on
them.  I don't see merit in that complication, since overwriting a
module import by using `set!' on it and destroying the binding to the
module is not something useful anyway from what I can imagine.

** Truly private top-levels at the absence of immutable modules

One can be led to believe that truly private top-levels can be useful
on their own, at the absence of immutable modules, because "one could
at least optimize on references to the private top-levels when they
are never mutated with `set!'" ... the wrongness of this idea becomes
apparent as one realizes that a mutable module could at any time have
a procedure added to it which calls `set!' on a private top-level.

** Static linking at the absence of immutable modules

My first thought for static linking was that it would be independent
of immutable modules, and that the static linking would only happen
for bindings which don't appear as `set!' arguments, whereas other
bindings are usual dynamic links and would be affected by the
replacement of immutable modules and usage of the module API.  This is
obviously a bad idea because then those bindings will get out of sync
with the rest of whichever module was statically linked.  Everything
is simple and logical when static linking builds strictly upon
immutable modules, the linking happening to a specific immutable
version of a module.


* Further reading

ML archives:
http://lists.gnu.org/archive/html/guile-devel/2009-09/msg00023.html
http://lists.gnu.org/archive/html/guile-devel/2013-04/msg00188.html

My personal notes, more or less identic in content to this e-mail but
with different wording:
http://taylanub.github.io/doc/Guile%20Top%20Level%20and%20Module%20Optimizability.txt


Regards,
Taylan
[Prev in Thread]
Current Thread
[Next in Thread]
Top-levels, modules, optimizability, "Taylan Ulrich Bayırlı /Kammer" <=
Prev by Date: Re: [PATCH] Add procedures to convert alists into hash tables
Previous by thread: are double cells safe to use?
Index(es):
- Date
- Thread