Re: How to add pseudo vector types

From: Eli Zaretskii
Subject: Re: How to add pseudo vector types
Date: Sat, 17 Jul 2021 10:16:07 +0300

> From: Clément Pit-Claudel <cpitclaudel@gmail.com>
> Date: Fri, 16 Jul 2021 22:23:26 -0400
> On 7/16/21 10:05 PM, Yuan Fu wrote:
> > My conclusion is that after-change-hook is pretty insignificant, and the 
> > initial parse is a bit slow (on large files).
> I have no idea if it makes sense, but: does the initial parse need to be 
> synchronous, or could you instead run the parsing in one thread, and the rest 
> of Emacs in another? (I'm talking about concurrent execution, not cooperative 
> threading).

You cannot have a thread freely accessing buffer text when the Lisp
machine is allowed to run concurrently with this, because the Lisp
machine can change the buffer text.

> In most cases there should be very limited contention, if any at all: in large 
> buffers most of Emacs' activity will be focused on the (relatively few) 
> characters around the gap, and most of the parser's activity will be reading 
> from the buffer at other positions.

When Emacs moves or enlarges/shrinks the gap, that affects the entire
buffer text after the gap, regardless of where the gap is.  So it will
affect the TS reader if it reads stuff after the gap.
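To make the point about the gap concrete, here is a toy gap buffer. It is a hypothetical illustration (the class and method names are invented, and this is not Emacs's real buffer layout or code): it shows that enlarging the gap changes the physical byte offset of every character after it, so a reader that remembered the old offset now lands inside the gap. In C the same step would be a realloc plus a memmove, which can also invalidate any raw pointer the reader was holding.

```python
class GapBuffer:
    """Toy gap buffer (hypothetical; not Emacs's real buffer layout).

    Text occupies buf[:gap_start] and buf[gap_end:]; the bytes in
    buf[gap_start:gap_end] are the gap and carry no meaning.
    """

    def __init__(self, text: bytes, gap_at: int, gap_len: int) -> None:
        self.buf = bytearray(text[:gap_at] + b"?" * gap_len + text[gap_at:])
        self.gap_start = gap_at
        self.gap_end = gap_at + gap_len

    def physical_index(self, pos: int) -> int:
        """Map a logical position (gap excluded) to an index into buf."""
        if pos < self.gap_start:
            return pos
        return pos + (self.gap_end - self.gap_start)

    def char_at(self, pos: int) -> int:
        return self.buf[self.physical_index(pos)]

    def grow_gap(self, extra: int) -> None:
        """Enlarge the gap; every byte after it slides EXTRA places right."""
        self.buf[self.gap_end:self.gap_end] = b"?" * extra
        self.gap_end += extra


gb = GapBuffer(b"hello world", gap_at=5, gap_len=4)
old = gb.physical_index(6)        # 'w' sits after the gap, at byte 10
gb.grow_gap(8)
new = gb.physical_index(6)        # same character, now at byte 18
print(old, new, chr(gb.char_at(6)), chr(gb.buf[old]))  # 10 18 w ?
```

The logical text never changed, yet a concurrent reader still using byte 10 would now be reading gap filler instead of "w".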

> You do need to be careful to not read the garbage data from the gap, but 
> otherwise seeing stale or even inconsistent data from the parser thread 
> shouldn't be an issue, since tree-sitter is supposed to be robust to bad 
> parses.

What would be the purpose of calling the parser if we know in advance
it will fail when it gets to the "garbage" caused by async access to
the buffer text?

And besides, current Emacs primitives that access buffer text don't
necessarily do that atomically, since the assumption built into their
design is that no one should access that text at the same time.  So
you could have windows where the buffer text is in inconsistent state,
like if the gap was moved, but the variables which tell where the gap
is were not yet updated, or windows where a multibyte character was
not yet completely written or deleted to/from the buffer, resulting in
invalid multibyte sequences and inconsistent values of EOB.

So I don't see how this could be done without some inter-locking.
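A minimal sketch of what such inter-locking could look like, assuming a single lock (the class LockedText and its methods are hypothetical, not an Emacs API) that the writer holds for every mutation and the parser thread holds while copying the text out. The byte-by-byte loop stands in for the non-atomic writes described above, where a multibyte character is briefly half-written and the gap variables are not yet updated.

```python
import threading


class LockedText:
    """Hypothetical buffer text shared between a writer and a parser thread.

    Every mutation and every read takes the same lock, so no reader can see
    bytes written before the gap bookkeeping (gap_start) was updated.
    The gap is kept at the end of buf to keep the sketch short.
    """

    def __init__(self, text: str, gap_len: int = 16) -> None:
        data = text.encode("utf-8")
        self._lock = threading.Lock()
        self.buf = bytearray(data + b"\0" * gap_len)
        self.gap_start = len(data)
        self.gap_end = len(self.buf)

    def append(self, s: str) -> None:
        """Writer side: store S byte by byte, then publish the new gap_start."""
        encoded = s.encode("utf-8")
        with self._lock:
            if len(encoded) > self.gap_end - self.gap_start:
                raise ValueError("gap too small for this toy example")
            for i, byte in enumerate(encoded):
                # Between these steps a multibyte character is half-written;
                # only the lock keeps readers from observing that state.
                self.buf[self.gap_start + i] = byte
            self.gap_start += len(encoded)

    def snapshot(self) -> str:
        """Reader side: copy the text under the lock, then parse the copy unlocked."""
        with self._lock:
            text = bytes(self.buf[:self.gap_start]) + bytes(self.buf[self.gap_end:])
        return text.decode("utf-8")


shared = LockedText("caf")
writer = threading.Thread(target=lambda: [shared.append("é") for _ in range(5)])
writer.start()
seen = [shared.snapshot() for _ in range(1000)]   # every snapshot decodes cleanly
writer.join()
print(shared.snapshot())  # cafééééé
```

The copy itself happens under the lock, so the writer still waits for an O(buffer size) copy each time a snapshot is taken.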

And what do you want the code which requested parsing to do while the
parse thread runs?  The requesting code is in the main thread, so if
it just waits, you don't gain anything.

> In fact, depending on how robust tree-sitter is, you might even be able to do 
> the concurrency-control optimistically (parse everything up to close to the 
> gap, check that the gap hasn't moved into the region that you read, and then 
> resume reading or rollback).

I don't understand what you suggest here.  For starters, the gap could
move (assuming you are still talking about a separate thread that does
the parsing), and what do we do then?

> Alternatively, maybe you could even do a full parse with minimal concurrency 
> control: you'd make sure that the Emacs thread records not just changes to 
> the buffer text but also movements of the gap, and then you could use that 
> list of changes for the next parse?

I don't understand what recording the gap could solve.  The stuff in
the gap is generally garbage, and can easily include invalid multibyte
sequences.  I don't think it's a good idea to pass that to TS.  Also,
recording the gap changes in the main thread and accessing that
information from a concurrent thread again opens a window for races,
and requires synchronization.

Bottom line, I think what you are suggesting is premature
optimization: we don't yet know that we will need this.  If the TS
performance information is reliable, it should be fast enough for our
purposes; we just need to come up with an optimal way of calling it so
that we don't impose unnecessary delays.
