From: Ken Raeburn
Subject: Re: compiled lisp file format (Re: Skipping unexec via a big .elc file)
Date: Mon, 29 May 2017 05:33:50 -0400
On May 28, 2017, at 08:43, Philipp Stephani <address@hidden> wrote:
That’s an interesting idea. If one of the popular serialization libraries is compatibly licensed, easy to use, and performs well, it may be better than rolling our own. It’ll need to handle data structures with circular or cross-linked references. And we have the doc string delayed-loading optimization (that currently uses #$ and #@ syntaxes); presumably we’d like to keep that optimization in some form. It would be good not to have to build all our data structures on ones generated by the tool with its own bookkeeping fields; having anything in a cons cell besides the “car” and “cdr” slots would mean a significant increase in memory use.

I initially said, “follow the model of flat object file formats”, not “use ELF”; ELF is just one way of organizing the data of an object file, with years of experience behind it, which we could use wholesale or borrow some lessons from.

One of the typical advantages of object file formats is that the data is grouped for efficient memory usage; some sections of a file will be mapped into the address space read-only (shared between processes), other sections read-write (possibly shared until copied on write), and others not mapped at all. For example, we might put symbol names (normally never modified but it can be done), doc strings (to be loaded later, only if needed), byte code, and other strings into their own sections, and create Lisp_String objects and such pointing to those bytes as needed. We don’t keep much in the way of source location information for Lisp code around, but if we ever change that, arguably it could go in a file section that’s not mapped or read until the debugger wants the information.

The Guile project’s documentation says their use of ELF is intended to build on existing work to invent a good object file format with several desired characteristics (https://www.gnu.org/software/guile/manual/html_node/Object-File-Format.html):

• Above all else, it should be very cheap to load a compiled file.
• It should be possible to statically allocate constants in the file. For example, a bytevector literal in source code can be emitted directly into the object file.
• The compiled file should enable maximum code and data sharing between different processes.
• The compiled file should contain debugging information, such as line numbers, but that information should be separated from the code itself. It should be possible to strip debugging information if space is tight.

They’re generating byte code currently, but are looking forward towards generating native code as well (instead?).

Their write-up implicitly assumes that, as with “normal” object files, the idea is to mmap the data into the address space, some of it read-only and some of it automatically getting some patching up, and then using those in-memory objects directly. There’s no explicit discussion of the tradeoffs of loading a file all at once versus reading one object tree (S-expression) at a time from an input stream, but especially when mapping and using much of the data unmodified is feasible, I suspect the all-at-once approach is likely to be more efficient. Whether that would be true in a case like Emacs, I don’t know.

They use DWARF for carrying some debug information, but so far I’m unsure what information is actually stored there.

Ken
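
To make the section-mapping idea above a little more concrete, here is a minimal sketch in Python. It is not how Emacs or Guile lay out their files: the "TOY0" magic, the section table layout, and every function name are hypothetical, invented only for illustration. It assumes a file made of page-aligned sections, maps one section read-only and another copy-on-write, and leaves everything else unmapped.

```python
import mmap
import os
import struct
import tempfile

PAGE = mmap.ALLOCATIONGRANULARITY  # mapping offsets must be multiples of this

# Hypothetical toy layout: one header page holding a section table,
# followed by page-aligned sections.
HEADER = struct.Struct("<4sI")   # magic, section count
ENTRY = struct.Struct("<8sQQ")   # section name, file offset, length


def pad(data):
    """Pad data with zero bytes up to the next multiple of PAGE."""
    return data + b"\0" * (-len(data) % PAGE)


def write_toy_file(path, sections):
    """Write {name: bytes} as a header page followed by aligned sections."""
    table = HEADER.pack(b"TOY0", len(sections))
    offset = PAGE
    bodies = []
    for name, data in sections.items():
        table += ENTRY.pack(name.encode(), offset, len(data))
        body = pad(data)
        bodies.append(body)
        offset += len(body)
    with open(path, "wb") as f:
        f.write(pad(table))
        for body in bodies:
            f.write(body)


def read_table(f):
    """Read the section table from the header page of an open file."""
    head = f.read(PAGE)
    magic, count = HEADER.unpack_from(head, 0)
    assert magic == b"TOY0"
    table = {}
    for i in range(count):
        name, off, length = ENTRY.unpack_from(head, HEADER.size + i * ENTRY.size)
        table[name.rstrip(b"\0").decode()] = (off, length)
    return table


def map_section(f, table, name, access):
    """Map only the named section, with the requested access mode."""
    off, length = table[name]
    return mmap.mmap(f.fileno(), length, access=access, offset=off)


def demo():
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        write_toy_file(path, {"strings": b"car\0cdr\0",
                              "data": b"\x01\x02\x03\x04"})
        with open(path, "rb") as f:
            table = read_table(f)
            # Read-only mapping: could be shared between processes.
            ro = map_section(f, table, "strings", mmap.ACCESS_READ)
            # Copy-on-write mapping: patches stay private to this process.
            cow = map_section(f, table, "data", mmap.ACCESS_COPY)
            try:
                ro[0:1] = b"X"
                ro_writable = True
            except TypeError:
                ro_writable = False
            cow[0] = 0xFF            # "patch up" the private copy
            patched = bytes(cow[:4])
            ro_bytes = bytes(ro[:])
            ro.close()
            cow.close()
        with open(path, "rb") as f:
            f.seek(table["data"][0])
            on_disk = f.read(4)      # the file itself is unchanged
        return ro_bytes, ro_writable, patched, on_disk
    finally:
        os.remove(path)
```

Calling demo() writes a temporary file, maps the two sections, and returns what each mapping saw along with the bytes still on disk. The point of the sketch is only that the read-only section is used in place while the patched section never writes back to the file; a real format would also need relocation records, alignment rules for Lisp objects, and so on, none of which is modeled here.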