emacs-devel
Re: Skipping unexec via a big .elc file


From: Daniel Colascione
Subject: Re: Skipping unexec via a big .elc file
Date: Mon, 24 Oct 2016 12:47:56 -0700
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/26.0.50 (gnu/linux)

Philipp Stephani <address@hidden> writes:

> Daniel Colascione <address@hidden> schrieb am Mo., 24. Okt. 2016 um 18:35 Uhr:
>
>  That is, we *could* get into a situation where "no people on board []
>  know enough about unexec to solve the next problem"
>
> I'd argue that we are already in this situation.  For example, nobody
> knows how to make unexec work with ASLR or PIE; when I tried fuzzing
> Emacs with AFL, the dumped binary would simply crash; the dumped
> binary is not reproducible (i.e. bit-by-bit identical after every
> build); and I think dumping also doesn't work with ASan. The fraction
> of situations where unexec doesn't work any more gets larger and
> larger. If we had people who could solve these problems, it should get
> smaller instead.

It's not a matter of "not knowing" how to make unexec work with PIE and
PIC code generally --- the problem is that the naive approach currently
used for serializing program state depends on the process address space
being reproducible: we don't specially mark pointers in the saved image,
so we can't relocate them. There have been numerous discussions on
emacs-devel about relocation schemes, with proposals ranging from just
making elc faster to translating elisp to C.

Everyone who's seriously thought about the unexec problem _understands_
the issue. unexec isn't black magic. Getting rid of the current scheme
is a matter of finding the right relocation scheme (which for all I know
might as well be "make elc better") and finding the time to
implement it.

My preferred approach is the portable dumper one: basically what we're
doing today, except that instead of just blindly copying the data
segment and heap to a new emacs binary, we'll write this information to
a separate file, stored in a portable format, a file that we'll keep
alongside the Emacs binary.  We'll store in this file metadata about
where the pointers are. (There are two kinds of pointers in this file:
pointers to other parts of the file and pointers to the Emacs binary.)
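
To make the dump-side bookkeeping concrete, here is a minimal sketch in
C of recording "where the pointers are" while writing an image. It is an
illustration under stated assumptions, not a proposed Emacs API: the
names dump_writer, ptr_record, emit_pointer and emacs_anchor are all
hypothetical, and the fixed-size buffers exist only to keep the example
short. Each pointer is stored as an offset (into the dump, or from a
known anchor in the binary) and its location is noted in a side table.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical names throughout; only the two pointer kinds come from
   the text above.  */
enum ptr_kind { PTR_INTO_DUMP, PTR_INTO_EMACS };

struct ptr_record
{
  uint32_t offset;   /* where in the image the pointer slot lives */
  uint32_t kind;     /* which base the stored offset is relative to */
};

struct dump_writer
{
  unsigned char image[256];       /* the bytes that would go to the file */
  struct ptr_record records[16];  /* metadata: locations of pointers */
  size_t n_records;
};

/* Store REL (an offset, never a raw address) in the slot at SLOT_OFFSET
   and remember that the slot needs fixing up at load time.
   Returns 1 on success, 0 if the toy buffers are full.  */
static int
emit_pointer (struct dump_writer *w, size_t slot_offset,
              uintptr_t rel, enum ptr_kind kind)
{
  if (w->n_records == 16
      || slot_offset + sizeof rel > sizeof w->image)
    return 0;
  memcpy (w->image + slot_offset, &rel, sizeof rel);
  w->records[w->n_records].offset = (uint32_t) slot_offset;
  w->records[w->n_records].kind = (uint32_t) kind;
  w->n_records++;
  return 1;
}

/* Stands in for static storage inside the Emacs binary.  */
static int emacs_anchor[4];

/* Write one dump-internal pointer and one pointer into the binary,
   then check what ended up in the image.  Returns 1 if all is well.  */
static int
dump_demo (void)
{
  struct dump_writer w;
  uintptr_t stored0, stored1;
  uintptr_t rel = ((uintptr_t) &emacs_anchor[2]
                   - (uintptr_t) &emacs_anchor[0]);

  memset (&w, 0, sizeof w);
  if (!emit_pointer (&w, 0, 64, PTR_INTO_DUMP))
    return 0;
  if (!emit_pointer (&w, sizeof (uintptr_t), rel, PTR_INTO_EMACS))
    return 0;

  memcpy (&stored0, w.image, sizeof stored0);
  memcpy (&stored1, w.image + sizeof (uintptr_t), sizeof stored1);
  return (w.n_records == 2
          && stored0 == 64
          && stored1 == 2 * sizeof (int)
          && w.records[0].kind == PTR_INTO_DUMP
          && w.records[1].kind == PTR_INTO_EMACS);
}
```

Because the image holds only offsets, it does not depend on where the
dumping process happened to be loaded.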

At startup, we'll load the dump file and walk the relocations, fixing up
all the embedded addresses to account for the new process's different
address space.  There's no binary other than the one that the compiler
generates; this data file is just data, so ASLR, ASan, and other clever
things should work fine. (Some people have proposed asking the system
dynamic linker to do the relocating, but I'd prefer to do it ourselves,
in a portable way.)
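
The load-time walk could look roughly like the following C sketch. It
assumes a hypothetical relocation table of (offset, kind) pairs and a
single base address for the binary; reloc, apply_relocs and emacs_global
are invented for illustration, and a real loader would read the image
and table from the dump file rather than from an array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical relocation kinds, one per pointer kind.  */
enum reloc_kind
{
  RELOC_DUMP_TO_DUMP,   /* slot holds an offset into the dump image */
  RELOC_DUMP_TO_EMACS   /* slot holds an offset from a base in the binary */
};

struct reloc
{
  uint32_t offset;      /* byte offset of the pointer slot in the dump */
  uint32_t kind;
};

/* Turn every recorded slot from a stored offset into a real address
   valid in this process.  */
static void
apply_relocs (unsigned char *dump_base, uintptr_t emacs_base,
              const struct reloc *relocs, size_t n)
{
  for (size_t i = 0; i < n; i++)
    {
      unsigned char *slot = dump_base + relocs[i].offset;
      uintptr_t value;
      memcpy (&value, slot, sizeof value);  /* no alignment assumptions */
      value += (relocs[i].kind == RELOC_DUMP_TO_DUMP
                ? (uintptr_t) dump_base
                : emacs_base);
      memcpy (slot, &value, sizeof value);
    }
}

/* Stands in for an object that lives in the Emacs binary.  */
static int emacs_global = 42;

/* Build a four-word "dump", relocate it, and follow both pointers.
   Returns 1 if they resolve to the expected objects.  */
static int
reloc_demo (void)
{
  uintptr_t image[4] = { 0 };
  struct reloc relocs[2] = {
    { 0, RELOC_DUMP_TO_DUMP },
    { (uint32_t) sizeof (uintptr_t), RELOC_DUMP_TO_EMACS }
  };

  image[0] = 2 * sizeof (uintptr_t);  /* offset of image[2] in the dump */
  image[1] = 0;                       /* offset 0 from emacs_base */
  image[2] = 1234;                    /* payload reached via image[0] */

  apply_relocs ((unsigned char *) image, (uintptr_t) &emacs_global,
                relocs, 2);

  return (*(uintptr_t *) image[0] == 1234
          && *(int *) image[1] == 42);
}
```

The cost of this eager version is proportional to the number of
relocation entries, and it touches every page that contains a pointer.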

We can't save all of the Emacs data segment this way, but we can
relocate and restore anything that's marked with staticpro. The overall
experience should be very similar to what we have today.

Additionally, the purespace concept remains useful: if we take pure
storage and put it in its own region of the dump file, we don't need to
take copy-on-write faults for data that cannot contain pointers.

Speaking of COW faults: a refinement of this scheme is to do the
relocations lazily, in a SIGSEGV handler.  (Map the dump file PROT_NONE
so any access traps.)  In the SIGSEGV handler, we can relocate just the
page we faulted, then continue. This way, we don't need to slurp in the
entire dump file from disk just to start emacs -Q -batch: we can
demand-page!
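
A rough C sketch of the lazy variant follows, assuming Linux. Everything
here is hypothetical scaffolding: an anonymous mapping stands in for the
dump file, and relocate_page fixes up only the first word of each page
where a real loader would walk that page's relocation entries. Note that
POSIX does not list mprotect as async-signal-safe, so calling it from
the handler is a platform assumption, not a guarantee.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static unsigned char *dump_base;   /* start of the mapped "dump" */
static size_t dump_size;
static size_t page_size;
static volatile sig_atomic_t pages_fixed;

/* Toy per-page relocation: the first word of each page is a
   dump-relative offset; turn it into a real address.  */
static void
relocate_page (unsigned char *page)
{
  uintptr_t value;
  memcpy (&value, page, sizeof value);
  value += (uintptr_t) dump_base;
  memcpy (page, &value, sizeof value);
}

static void
on_segv (int sig, siginfo_t *info, void *ctx)
{
  uintptr_t addr = (uintptr_t) info->si_addr;
  uintptr_t base = (uintptr_t) dump_base;
  (void) sig;
  (void) ctx;

  if (addr < base || addr >= base + dump_size)
    {
      /* Not a dump page: let the retried access crash normally.  */
      signal (SIGSEGV, SIG_DFL);
      return;
    }

  unsigned char *page
    = (unsigned char *) (addr & ~(uintptr_t) (page_size - 1));
  mprotect (page, page_size, PROT_READ | PROT_WRITE);
  relocate_page (page);
  pages_fixed++;
  /* Returning retries the faulting access, which now succeeds.  */
}

/* Returns 1 if each page is relocated exactly once, on first touch;
   -1 if setup fails.  */
static int
lazy_demo (void)
{
  struct sigaction sa;
  uintptr_t off0, off1 = 0, p, q;
  int after_first, ok;

  page_size = (size_t) sysconf (_SC_PAGESIZE);
  dump_size = 2 * page_size;
  dump_base = mmap (NULL, dump_size, PROT_READ | PROT_WRITE,
                    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (dump_base == MAP_FAILED)
    return -1;

  /* Page 0 points at page 1; page 1 points back at page 0.  */
  off0 = page_size;
  memcpy (dump_base, &off0, sizeof off0);
  memcpy (dump_base + page_size, &off1, sizeof off1);
  if (mprotect (dump_base, dump_size, PROT_NONE) != 0)
    return -1;

  memset (&sa, 0, sizeof sa);
  sa.sa_sigaction = on_segv;
  sa.sa_flags = SA_SIGINFO;
  sigemptyset (&sa.sa_mask);
  if (sigaction (SIGSEGV, &sa, NULL) != 0)
    return -1;

  p = *(volatile uintptr_t *) dump_base;   /* faults; page 0 fixed up */
  after_first = pages_fixed;
  q = *(volatile uintptr_t *) p;           /* faults; page 1 fixed up */

  ok = (after_first == 1 && pages_fixed == 2
        && p == (uintptr_t) dump_base + page_size
        && q == (uintptr_t) dump_base);
  munmap (dump_base, dump_size);
  return ok;
}
```

With a real dump file the mapping would be a private file mapping, so
untouched pages are never read from disk and relocated pages become
copy-on-write copies.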

Whether this refinement is worth the trouble is something only
experimentation can tell, but it's an option if we need it.  With this
refinement, the portable dumping approach should be safe, semantically
similar to unexec, ASLR-compatible, _and_ very nearly as fast as what
we have today.


