emacs-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: How to add pseudo vector types


From: Eli Zaretskii
Subject: Re: How to add pseudo vector types
Date: Sat, 17 Jul 2021 09:56:13 +0300

> From: Yuan Fu <casouri@gmail.com>
> Date: Fri, 16 Jul 2021 22:05:01 -0400
> Cc: Stefan Monnier <monnier@iro.umontreal.ca>,
>  emacs-devel@gnu.org
> 
> Please have a look at the second patch that applies on top of the first one. 
> This time I added after-change hooks, so if you create a parser for a buffer 
> and edit that buffer, the parser is kept updated lazily.

Instead of using the hook machinery, it is better to augment the
insdel.c functions to directly update the information you need to
keep.  If you do it directly in the insdel.c functions that modify the
buffer text, you can be much more accurate in updating the information
about the changes, because each insdel.c function performs a
well-defined operation of the buffer text.  By contrast, buffer-change
hooks are higher-level functionality, meant for Lisp programs, so they
don't necessarily make it easy to reverse-engineer the specific
changes.  All you have there is some higher-level information about
which part of the buffer changed.  Moreover, the hooks are sometimes
called more times than they should be, to be on the safe side.

As a trivial (but not insignificant!) optimization, primitive insdel.c
functions always know both the character and the byte positions in the
buffer they change, so this code you needed in your hooks:

> +  ptrdiff_t beg_byte = CHAR_TO_BYTE (start_int);
> +  ptrdiff_t old_end_byte = CHAR_TO_BYTE (end_int);

could be avoided.  Converting character to byte positions can
sometimes significantly slow down the code, for example if there are a
lot of markers in the buffer.

So I urge you to record the change information directly in the
primitive functions of insdel.c, not in the hooks.

> In summary, the parser parses the whole buffer on the first time when the 
> user asks for the parse tree. In after-change-hook, no parsing is done, but 
> we do update the trees with position changes. On the next time when the user 
> asks for the parse tree, the whole buffer is re-parsed incrementally. (I 
> didn’t read the paper, but I assume it knows where are the bits to re-parse 
> because we updated the tree with position changes.)

Why do you update the entire parser list for every modification?  This
comment:

> +void
> +ts_before_change (ptrdiff_t start_int, ptrdiff_t end_int)
> +{
> +  /* Iterate through each parser in 'tree-sitter-parser-list' and
> +     record the byte position.  There could be better ways to record
> +     it than storing the same position in every parser, but this is
> +     the most fool-proof way, and I expect a buffer to have only one
> +     parser most of the time anyway. */

already says that there are better ways: record the change info just
once in one place.  Then propagate that to the entire list, if you
need, when you actually need to call TS.  It is possible that at the
call time you will know which parser needs to be called, and will be
able to update only that parser, not the entire list.

> +  // FIXME: Add some boundary checks?
> +  /* I believe we can get away with only setting current-buffer
> +     and not actually switching to it, like what we did in
> +     'make_gap_1'.  */
> +  struct buffer *old_buffer = current_buffer;
> +  current_buffer = (struct buffer *) buffer;

This looks unnecessary: we have BUF_BYTE_ADDRESS, which accepts the
buffer as its argument, and the corresponding buf_next_char_len.  IOW,
why did you need to switch to the buffer?

> +  /* Read one character.  */
> +  char *beg;
> +  int len;
> +  if (byte_pos >= Z_BYTE)
> +    {
> +      beg = "";
> +      len = 0;
> +    }

Is getting an empty string what TS wants when it attempts to read
beyond EOB?

Also, why do you test Z_BYTE and not ZV_BYTE (actually, BUF_ZV_BYTE)?
Emacs in general behaves as if text beyond point-max didn't exist, why
should code supported by the TS parser behave differently?

> +      beg = (char *) BYTE_POS_ADDR (byte_pos);
> +      len = next_char_len(byte_pos);

This is sub-optimal: next_char_len also calls BYTE_POS_ADDR.  Why not
use BYTES_BY_CHAR_HEAD instead?

Thanks for working on this.



reply via email to

[Prev in Thread] Current Thread [Next in Thread]