emacs-devel

Re: Update 1 on Bytecode Offset tracking


From: Zach Shaftel
Subject: Re: Update 1 on Bytecode Offset tracking
Date: Sat, 18 Jul 2020 17:41:20 -0400
User-agent: mu4e 1.4.10; emacs 28.0.50

Stefan Monnier <monnier@iro.umontreal.ca> writes:

>>> While waiting for the paperwork to go through, you can prepare the patch
>>> and we can start discussing it.
>> Sure, does that just mean the 'git format-patch -1' emailed to
>> bug-gnu-emacs@gnu.org, as mentioned in CONTRIBUTE? If that's the gist of
>> it then I can do that shortly.
>
> Pretty much, yes.  You can add some text to give extra background on the
> design, the motivation for some of the choices, or ask questions about
> particular details, but that's not indispensable.
>
> You can also send an email that just refers to a branch in emacs.git.
> But for the discussion to work well, it's usually better to make sure
> this branch is "small" so people aren't discouraged from reading the
> large diff ;-)

Sounds good, I'll proceed with that once I make sure there are no issues
with that branch.

>> I was able to speed that function up to the point that it's about the
>> same as one using `read`. Those functions are doing a whole lot of IO
>> (reading and writing hundreds of files) so it's not really a fair
>> comparison. I've done more tests with functions that just read a whole
>> buffer, collecting what they read into a list. In a 9600-line file with
>> just over 500 sexps, the `read` version took about 0.02-0.04 seconds
>> (according to `benchmark-run-compiled`), and the `source-map-read`
>> version took about 0.08 seconds when it didn't GC, but unlike with
>> `read` it did cause a GC 10-20% of the time.
>
> IME when the time is in the sub-second range the measurements are very
> imprecise, so better measure the time to repeat the same `read` N times
> so the total time is a few seconds (and since it's the same `read`,
> it won't suffer from extra IO overhead).

Sure, I'll do some more exhaustive testing. So far though, the results
aren't great; the biggest issue is memory usage. `source-map-read` can
trigger a GC over 5 times more often than `read`. Obviously edebug isn't
the answer. I could start trying to simplify and adapt the useful bits
of edebug for the source-map reader directly, but I think it's more
sensible to accept that a real implementation will have to be
in C and this reader will just remain a prototype.
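
The measurement approach suggested above (repeat the same read N times
so the total runs for a few seconds, and count the GCs along the way)
could be sketched roughly as follows. The `my/` helpers are hypothetical
names made up for illustration, not part of Emacs or of the source-map
branch, and the sketch assumes the reader under test accepts a buffer as
its stream the way `read` does.

```elisp
;; Hypothetical helpers, for illustration only.

(defun my/read-all-forms (reader)
  "Read every form in the current buffer with READER; return them as a list.
READER is called with the current buffer as its stream, like `read'."
  (goto-char (point-min))
  (let (forms)
    (condition-case nil
        (while t
          (push (funcall reader (current-buffer)) forms))
      ;; `read' signals `end-of-file' once no complete form is left.
      (end-of-file nil))
    (nreverse forms)))

(defun my/bench-reader (reader repetitions)
  "Read the current buffer REPETITIONS times with READER.
Return a list (ELAPSED-SECONDS GC-COUNT)."
  (garbage-collect)                 ; start from a comparable heap state
  (let ((gcs-before gcs-done)       ; `gcs-done' counts GCs so far
        (start (float-time)))
    (dotimes (_ repetitions)
      (my/read-all-forms reader))
    (list (- (float-time) start)
          (- gcs-done gcs-before))))

;; Possible usage, with the file name and N chosen by the caller so
;; that the total time reaches a few seconds:
;;
;;   (with-temp-buffer
;;     (insert-file-contents "some-file.el")
;;     (my/bench-reader #'read 200))
```

Because the buffer is filled once and then re-read in memory, file IO
stays out of the timed region.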

>>> For macros, OTOH, it's really fundamentally hard (or impossible, in
>>> general).
>> Helmut Eller mentioned before that most macros do use at least some of
>> the original code in their expansion.
>
> We can definitely hope to use some heuristics that will preserve "most"
> source info for "most" existing macros, yes.
> But it's still a fundamentally impossible problem in general ;-)
>
>>> We could/should introduce some new way to define macros which
>>> knows about "source code annotated with locations".
>> I've wondered about this too but don't know what the right approach
>> would be.
>
> The first step is to define a `defmacro2` which works like `defmacro`
> but is defined to take as arguments (and to return) annotated-sexps
> instead of "bare sexps".  It'll be less convenient to use, but
>
> In Scheme "annotated sexps" are called "syntax objects".
>
>> I doubt anyone would want to use something like macro-cons/list/append
>> etc. functions,
>
> Scheme avoids the problem by defining additional higher-level layers,
> where macros are defined in a more restrictive way using templates, so
> for most macros the programmer doesn't need to care very much about
> the difference between bare sexps and syntax objects.
>
> The main motivation for it was hygiene (the framework takes care of
> adding the needed `gensym`s where applicable) rather than tracking
> source-location, but fundamentally the issue is the same: an AST node is
> not just some random sexp.
>
> IOW "code and data aren't quite the same, after all" ;-)
>
> See for example `syntax-case` 
> https://www.gnu.org/software/guile/manual/html_node/Syntax-Case.html
> Note that Scheme uses the #' notation for syntax objects.  Adapting the
> example for `when` to an Elisp syntax could look like:
>
>     (defmacro2 when (form)
>       (elisp-case form
>         ((_ test e e* ...) (elisp (if test (progn e e* ...))))))
>
> [ Where I used `elisp` instead of Scheme's `syntax` since we already use
>   the prefix "syntax-" for things related to syntax-tables.  ]
>
> Notice how it's `elisp-case` which extracts `test`, `e`, and `e*` and
> then it's `elisp` which builds the new chunk of code, so all the
> replacement of `car` with `elisp-car` can be hidden within the definition
> of `elisp-case` and `elisp`.

Aha, I had never even considered hygienic macros in Elisp (nor had I
recognized how trivial it is to track their source-code). That would be
an amazing development for Emacs Lisp, but is certainly a huge
undertaking, not something I could fit into the GSoC timeline. I know
that it has been done in Common Lisp (by Pascal Costanza), but I believe
that implementation serves the sole purpose of capture avoidance and
doesn't abstract syntax. For Emacs I assume this would have to be done
in C, but I do wonder if an Elisp implementation would be possible.
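
To make the "bare sexp" versus "annotated sexp" distinction discussed
above concrete, here is a minimal sketch written in plain Elisp. All of
the `my-stx` names are hypothetical and exist only for this illustration;
this is not the proposed `defmacro2`/`elisp-case` design, just one
possible way to represent annotated sexps, assuming proper lists only
and a single buffer position per node.

```elisp
(require 'cl-lib)

;; An annotated sexp: DATUM is an atom or a list of further `my-stx'
;; nodes; POS is the (hypothetical) buffer position it was read from.
(cl-defstruct (my-stx (:constructor my-stx-make (datum pos)))
  datum pos)

(defun my-stx-strip (stx)
  "Return the bare sexp represented by STX, dropping all positions."
  (let ((datum (my-stx-datum stx)))
    (if (consp datum)
        (mapcar #'my-stx-strip datum)
      datum)))

(defun my-stx-expand-when (form)
  "Expand annotated FORM `(when TEST BODY...)' to `(if TEST (progn BODY...))'.
TEST and BODY nodes are reused unchanged, so they keep their positions;
nodes created by the expansion inherit the position of FORM."
  (let* ((pos (my-stx-pos form))
         (parts (my-stx-datum form))
         (test (nth 1 parts))
         (body (nthcdr 2 parts)))
    (my-stx-make
     (list (my-stx-make 'if pos)
           test
           (my-stx-make (cons (my-stx-make 'progn pos) body) pos))
     pos)))
```

The expander never calls `car` or `cdr` on a bare sexp taken from user
code, which is the property that template-based layers such as
`elisp-case` would provide automatically.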

>>> There's a lot of work on Scheme macros we could leverage for that.
>> Interesting, so far I've had some difficulty finding documentation about
>> how other Lisps track source locations.
>
> It's not really discussed, but the distinction between "sexp" and
> "syntax object" is the key.  It's largely not discussed because Scheme
> macros have never officially included the equivalent of `defmacro`
> operating on raw sexps, so they've never really had to deal with the
> issue (tho Gambit does provide a `define-macro` which operates like our
> `defmacro` but it's rarely used so Gambit just punts on the
> source-location issue in that case).

Doing a similar thing in Elisp -- relegating source location tracking
to code using only a specialized kind of macro, hygienic or otherwise --
would of course be a major loss, since it would take years for that new
paradigm to become commonplace.

-Zach



