[Top][All Lists]

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: How to add pseudo vector types

From: Yuan Fu
Subject: Re: How to add pseudo vector types
Date: Thu, 22 Jul 2021 13:47:20 -0400

> On Jul 22, 2021, at 1:00 PM, Eli Zaretskii <eliz@gnu.org> wrote:
>> From: Yuan Fu <casouri@gmail.com>
>> Date: Thu, 22 Jul 2021 09:47:45 -0400
>> Cc: Stefan Monnier <monnier@iro.umontreal.ca>,
>> Clément Pit-Claudel <cpitclaudel@gmail.com>,
>> emacs-devel@gnu.org
>> Yes, I meant to discuss this. The problem with respecting narrowing is that, 
>> a user can freely narrow and widen arbitrarily, and Emacs needs to translate 
>> them into insertion & deletion of the buffer text for tree-sitter, every 
>> time a user narrows or widens the buffer. Plus, if tree-sitter respects 
>> narrowing, it could happen where a user narrows the buffer, the font-locking 
>> changes and is not correct anymore. Maybe that’s not the user want. Also, if 
>> someone narrows and widens often, maybe narrow to a function for better 
>> focus, tree-sitter needs to constantly re-parse most of the buffer. These 
>> are not significant disadvantages, but what do we get from respecting 
>> narrowing that justifies code complexity and these small annoyances?
> But that's how the current font-lock and indentation work: they never
> look beyond the narrowing limits.  So why should the TS-based features
> behave differently?
> As for temporary narrowing: if we record the changes, but don't send
> them to TS until we actually need re-parsing, then we could eliminate
> the temporary narrowing when we report the changes to TS, leaving only
> the narrowing that exists at the time of the re-parse.  At least for
> fontifications, that time is redisplay time, and users do expect to
> see the text fontified according to the current narrowing.

>>>>  *bytes_read = (uint32_t) len;
>>> Is using uint32_t the restriction of tree-sitter?  Doesn't it support
>>> reading more than 2 gigabytes?
>> I’m not sure why it asks for uint32 specifically, but that’s what it asks 
>> for its api. I don’t think you are supposed to use tree-sitter on files of 
>> size of gigabytes, because the author mentioned that tree-sitter uses over 
>> 10x as much memory as the size of the source file [1]. On files larger than 
>> a couple of megabytes, I think we better turn off tree-sitter. Normally 
>> those files are not regular source files, anyway, and we don’t need a parse 
>> tree for a log.
> I don't necessarily agree with the "not regular source files" part.
> For example, JSON files can be quite large.  And there are also log
> files, which are even larger -- did no one adapt TS to fontifying
> those yet?

There is a JSON parser, but I don’t think there is one for log files.

> More generally: is the problem real?  If you make a file that is 1000
> copies of xdisp.c, and then submit it to TS, do you really get 10GB of
> memory consumption?  This is something that is good to know up front,
> so we'd know what to expect down the road.

Yes. I concatenated 100 xdisp.c together, and parsed them with my simple C 
program. It used 1.8 G. I didn’t test for 1000 together, but I think the trend 
is linear.

time -l ./main-large-c
       16.48 real        15.32 user         0.81 sys
          1883959296  maximum resident set size
                   0  average shared memory size
                   0  average unshared data size
                   0  average unshared stack size
              459951  page reclaims
                  22  page faults
                   0  swaps
                   0  block input operations
                   0  block output operations
                   0  messages sent
                   0  messages received
                   0  signals received
                   6  voluntary context switches
                1653  involuntary context switches
        107310143182  instructions retired
         58561420060  cycles elapsed
          1883095040  peak memory footprint

>> That leads to another point. I suspect the memory limit will come before the 
>> speed limit, i.e., as the file size increases, the memory consumption will 
>> become unacceptable before the speed does. So it is possible that we want to 
>> outright disable tree-sitter for larger files, then we don’t need to do much 
>> to improve the responsiveness of tree-sitter on large files. And we might 
>> want to delete the parse tree if a buffer has been idle for a while. Of 
>> course, that’s just my superstition, we’ll see once we can measure the 
>> performance.
> See above: IMO, we should benchmark both the CPU and memory
> performance of TS for such large files, before we decide on the course
> of action.

That’s my thought, too. I should have reserved my suspicion until I have 
benchmark measurements.

>>>> +DEFUN ("tree-sitter-node-type",
>>>> +       Ftree_sitter_node_type, Stree_sitter_node_type, 1, 1, 0,
>>>> +       doc: /* Return the NODE's type as a symbol.  */)
>>>> +  (Lisp_Object node)
>>>> +{
>>>> +  CHECK_TS_NODE (node);
>>>> +  TSNode ts_node = XTS_NODE (node)->node;
>>>> +  const char *type = ts_node_type(ts_node);
>>>> +  return intern_c_string (type);
>>> Why do we need to intern the string each time? can't we store the
>>> interned symbol there, instead of a C string, in the first place?
>> I’m not sure what do you mean by “store the interned symbol there”, where do 
>> I store the interned symbol?
> In the struct that ts_node_type accesses, instead of the 'char *'
> string you store there now.

The struct that ts_node_type accesses is a TSNode, which is defined by 
tree-sitter. ts_node_type is an API provided by tree-sitter, I’m just exposing 
it to lisp. I could return strings instead of symbols, but I thought symbols 
might be more appropriate and more convenient for users of this function. 

>> (BTW, If you see something wrong, that’s probably because I don’t know the 
>> right way to do it, and grepping only got me that far.)
> Do what? feel free to ask questions when you aren't sure how to
> accomplish something on the C level.

Thanks. Is below the correct way to set a buffer-local variable? (I’m setting 

struct buffer *old_buffer = current_buffer;
  set_buffer_internal (XBUFFER (buffer));

  Fset (Qtree_sitter_parser_list,
        Fcons (lisp_parser, Fsymbol_value (Qtree_sitter_parser_list)));

  set_buffer_internal (old_buffer);

Also, we don’t call change hooks in replace_range_2, why? Should I update 
tree-sitter trees in that function, or should I not?


reply via email to

[Prev in Thread] Current Thread [Next in Thread]