Re: How to add pseudo vector types

From: Eli Zaretskii
Subject: Re: How to add pseudo vector types
Date: Thu, 22 Jul 2021 20:00:43 +0300

> From: Yuan Fu <casouri@gmail.com>
> Date: Thu, 22 Jul 2021 09:47:45 -0400
> Cc: Stefan Monnier <monnier@iro.umontreal.ca>,
>  Clément Pit-Claudel <cpitclaudel@gmail.com>,
>  emacs-devel@gnu.org
> Yes, I meant to discuss this. The problem with respecting narrowing is that a
> user can narrow and widen arbitrarily, and Emacs needs to translate each
> change into insertion & deletion of the buffer text for tree-sitter, every time
> a user narrows or widens the buffer. Plus, if tree-sitter respects narrowing,
> it could happen that a user narrows the buffer and the font-locking changes and
> is not correct anymore. Maybe that’s not what the user wants. Also, if someone
> narrows and widens often, maybe narrowing to a function for better focus,
> tree-sitter needs to constantly re-parse most of the buffer. These are not
> significant disadvantages, but what do we get from respecting narrowing that
> justifies the code complexity and these small annoyances?

But that's how the current font-lock and indentation work: they never
look beyond the narrowing limits.  So why should the TS-based features
behave differently?

As for temporary narrowing: if we record the changes, but don't send
them to TS until we actually need re-parsing, then we could eliminate
the temporary narrowing when we report the changes to TS, leaving only
the narrowing that exists at the time of the re-parse.  At least for
fontifications, that time is redisplay time, and users do expect to
see the text fontified according to the current narrowing.
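
To make the deferred-reporting idea concrete, here is a minimal sketch.
It assumes we keep one small record per parser; every name in it
(ts_sync, note_restriction, flush_restriction) is hypothetical and
exists neither in Emacs nor in tree-sitter.  Narrowing and widening
only update the record, and the parser is told about the restriction
once, just before a re-parse.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical bookkeeping for one parser; none of these names exist
   in Emacs or tree-sitter.  */
struct visible_range
{
  size_t beg;   /* first visible byte */
  size_t end;   /* one past the last visible byte */
};

struct ts_sync
{
  struct visible_range reported;  /* what the parser last saw */
  struct visible_range current;   /* the narrowing in effect now */
  int reports;                    /* how often the parser was notified */
};

/* Narrowing and widening only update the record; nothing is sent.  */
static void
note_restriction (struct ts_sync *s, size_t beg, size_t end)
{
  s->current.beg = beg;
  s->current.end = end;
}

/* Called right before a re-parse (e.g. from redisplay).  Return true
   if the parser had to be told about a new visible range.  */
static bool
flush_restriction (struct ts_sync *s)
{
  if (s->current.beg == s->reported.beg
      && s->current.end == s->reported.end)
    return false;   /* temporary narrowing has cancelled out */
  /* A real implementation would report the edit to the parser here.  */
  s->reported = s->current;
  s->reports++;
  return true;
}
```

With this shape, a narrow followed by a widen between two redisplays
costs nothing on the parser side, and only the restriction that is in
effect at re-parse time is ever reported.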

> >>   *bytes_read = (uint32_t) len;
> > 
> > Is using uint32_t a restriction of tree-sitter?  Doesn't it support
> > reading more than 4 gigabytes?
> I’m not sure why it asks for uint32 specifically, but that’s what its API
> asks for. I don’t think you are supposed to use tree-sitter on files that are
> gigabytes in size, because the author mentioned that tree-sitter uses over 10x as
> much memory as the size of the source file [1]. On files larger than a couple
> of megabytes, I think we had better turn off tree-sitter. Normally those files
> are not regular source files, anyway, and we don’t need a parse tree for a
> log.

I don't necessarily agree with the "not regular source files" part.
For example, JSON files can be quite large.  And there are also log
files, which are even larger -- did no one adapt TS to fontifying
those yet?
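
On the uint32_t cast quoted above: whatever we decide about large
files, the length should be clamped rather than silently truncated by
the cast.  The following is a minimal sketch, assuming the text lives
in one flat array; text_source, chunk_length and read_chunk are
hypothetical names, and the function deliberately does not reproduce
tree-sitter's actual callback signature.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flat text source; a real buffer has a gap to handle.  */
struct text_source
{
  const char *data;
  size_t size;
};

/* Number of bytes to report for one chunk: never more than fits in a
   uint32_t, so the conversion cannot wrap around.  */
static uint32_t
chunk_length (size_t remaining)
{
  return remaining > UINT32_MAX ? UINT32_MAX : (uint32_t) remaining;
}

/* Return a pointer to the text at OFFSET and store the chunk length
   in *BYTES_READ.  A length of zero signals the end of the text.  */
static const char *
read_chunk (const struct text_source *src, size_t offset,
            uint32_t *bytes_read)
{
  if (offset >= src->size)
    {
      *bytes_read = 0;
      return "";
    }
  *bytes_read = chunk_length (src->size - offset);
  return src->data + offset;
}
```

The caller simply asks again at the next offset, so a long text is
delivered in several chunks instead of one truncated one.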

More generally: is the problem real?  If you make a file that is 1000
copies of xdisp.c, and then submit it to TS, do you really get 10GB of
memory consumption?  This is something that is good to know up front,
so we'd know what to expect down the road.

> That leads to another point. I suspect the memory limit will come before the
> speed limit, i.e., as the file size increases, the memory consumption will
> become unacceptable before the speed does. So it is possible that we want to
> outright disable tree-sitter for larger files, in which case we don’t need to do
> much to improve the responsiveness of tree-sitter on large files. And we might
> want to delete the parse tree if a buffer has been idle for a while. Of
> course, that’s just my speculation, we’ll see once we can measure the
> performance.

See above: IMO, we should benchmark both the CPU and memory
performance of TS for such large files, before we decide on the course
of action.
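
A minimal sketch of the kind of measurement meant here, assuming a
POSIX system: it times an arbitrary workload with clock and reads the
peak resident set size from getrusage.  The names bench_result,
run_benchmark and touch_memory are hypothetical, and touch_memory is
only a stand-in for parsing a large buffer.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
#include <sys/resource.h>

struct bench_result
{
  double cpu_seconds;   /* CPU time spent inside the workload */
  long peak_rss;        /* peak resident set size of the process;
                           kilobytes on GNU/Linux, bytes on macOS,
                           -1 if getrusage failed */
};

typedef void (*workload_fn) (void *arg);

/* Run FN (ARG) once and report its CPU time and the peak RSS.  */
static struct bench_result
run_benchmark (workload_fn fn, void *arg)
{
  struct bench_result r;
  struct rusage ru;
  clock_t start = clock ();
  fn (arg);
  clock_t stop = clock ();
  r.cpu_seconds = (double) (stop - start) / CLOCKS_PER_SEC;
  r.peak_rss = getrusage (RUSAGE_SELF, &ru) == 0 ? ru.ru_maxrss : -1;
  return r;
}

/* Stand-in workload: allocate and touch *(size_t *) ARG bytes.  A real
   run would parse a large buffer here instead.  */
static void
touch_memory (void *arg)
{
  size_t n = *(size_t *) arg;
  char *p = malloc (n);
  if (p)
    {
      memset (p, 1, n);
      free (p);
    }
}
```

Since the peak RSS covers the whole process, the useful number is the
difference between a run with the parser and a run without it, taken
in separate processes.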

> >> +DEFUN ("tree-sitter-node-type",
> >> +       Ftree_sitter_node_type, Stree_sitter_node_type, 1, 1, 0,
> >> +       doc: /* Return the NODE's type as a symbol.  */)
> >> +  (Lisp_Object node)
> >> +{
> >> +  CHECK_TS_NODE (node);
> >> +  TSNode ts_node = XTS_NODE (node)->node;
> >> +  const char *type = ts_node_type (ts_node);
> >> +  return intern_c_string (type);
> > 
> > Why do we need to intern the string each time?  Can't we store the
> > interned symbol there, instead of a C string, in the first place?
> I’m not sure what you mean by “store the interned symbol there”. Where would
> I store the interned symbol?

In the struct that ts_node_type accesses, instead of the 'char *'
string you store there now.
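
If that struct cannot be changed, a side table gives the same effect.
Here is a minimal, self-contained sketch of the idea, assuming node
types can be identified by a small integer; type_names, fake_intern,
symbol_cache and node_type_symbol are all hypothetical stand-ins, with
fake_intern playing the role of intern_c_string.  The lookup is paid
once per type, not once per call.

```c
#include <stddef.h>
#include <string.h>

#define MAX_TYPES 8

/* Stand-in for a grammar's table of node type names.  */
static const char *const type_names[MAX_TYPES]
  = { "identifier", "string", "comment" };

/* Stand-in for a Lisp symbol: here just the canonical name pointer.  */
typedef const char *symbol;

/* Counts how often the expensive lookup runs.  */
static int intern_calls;

/* Stand-in for intern_c_string: a lookup that compares strings.  */
static symbol
fake_intern (const char *name)
{
  intern_calls++;
  for (size_t i = 0; i < MAX_TYPES && type_names[i]; i++)
    if (strcmp (type_names[i], name) == 0)
      return type_names[i];
  return NULL;
}

/* One cached symbol per node type, filled in on first use.  */
static symbol symbol_cache[MAX_TYPES];

/* Return the symbol for TYPE_ID, interning its name at most once.  */
static symbol
node_type_symbol (unsigned type_id)
{
  if (type_id >= MAX_TYPES || !type_names[type_id])
    return NULL;
  if (!symbol_cache[type_id])
    symbol_cache[type_id] = fake_intern (type_names[type_id]);
  return symbol_cache[type_id];
}
```

Asking for the same type twice returns the same symbol and bumps
intern_calls only the first time.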

> (BTW, if you see something wrong, that’s probably because I don’t know the
> right way to do it, and grepping only got me so far.)

Do what?  Feel free to ask questions when you aren't sure how to
accomplish something on the C level.
