emacs-devel


From: Ergus
Subject: Re: [SPAM UNSURE] Maybe we're taking a wrong approach towards tree-sitter
Date: Fri, 30 Jul 2021 15:32:54 +0200

On Fri, Jul 30, 2021 at 02:06:00PM +0200, Arthur Miller wrote:
Andrei Kuznetsov <r12451428287@163.com> writes:

Leake <stephen_leake@stephe-leake.org> writes:

That's true for the common TS runtime, which implements the parser and
error recovery, but the code for each language, which builds the LR
parse table and some other data structures, is generated in C from a
grammar file written in JavaScript, and must be linked into Emacs
somehow. In addition, some languages require an "external scanner",
which is more C code that is specific to the language.

Interesting.  I assume it would be possible to reuse the source grammar
files?

It probably is, and looking at neovim's GitHub repo, there are some
instructions on how to create a grammar for a new language:

https://github.com/nvim-treesitter/nvim-treesitter

The process could probably be automated from Lisp somehow.

I do have a sincere question, though, about this entire tree-sitter
venture. Is it really worth the trouble in Emacs's case? As I understand
it, TS is a specialized regex matcher, and looking at some language
specs leaves me with that feeling (for example the grammar for bash):

https://github.com/tree-sitter/tree-sitter-bash/blob/master/src/grammar.json

I understand that a specialized regex matcher is more efficient than the
generalized regular-expression matching that current font-locking in
Emacs relies upon, but is it *that* much more efficient, enough to be
worth the extra trouble? TS seems to keep state (a node) for each
character typed; that will consume a lot of memory in big files. If the
syntax tree it keeps to implement what it does can be reused for
something else, then it could be very useful, but just for syntax
highlighting and indentation? Some years ago, when opening some of the
10k-line files found in the Emacs src dir, I noticed some slowdown in
font lock. But nowadays I don't experience any hiccups with syntax
highlighting or indentation.

Anyway, it is very educational to see TS get merged into Emacs and to
read Eli's tips and guidance about Emacs internals.

The TS thing came up because of some issues with c-mode highlighting
reported in that thread: correctness and speed (slowing down things like
scrolling). c-mode does its best, but C++ is evolving, and more complex
analysis comes with a performance penalty and ever more code complexity
in the parser. The same happens with new, widely used languages.

It will be very difficult to implement a complete/competitive mode like
c-mode for all the new languages that are very popular today (rust,
typescript; even python). So we end up with some "weak" modes with
inconsistencies and different bindings and color themes. Those become
unmaintained after a time because the developers migrate to more
complete editors/IDEs, and new developers just don't come to Emacs
because it does not satisfy their needs to start with.

I am probably wrong, but it seems that 99% of web developers (React,
Node.js, Angular) use VSCode and the rest use neovim, so we don't even
have people with enough knowledge and motivation to implement those
modes in Emacs one by one.

These languages are more complex to analyze, and we don't have people to
maintain a mode for each of them. Trying to do so would spend too much
developer time reinventing what TS already does (and does correctly,
efficiently, and with a supporting community).

So maintaining a mode for every language we currently don't support is
not scalable over time. And reimplementing a replacement for TS in Elisp
won't be worth it: it will end up being very slow and repeating all the
errors that the TS developers have already solved. TS may be useful not
only for syntax highlighting and indentation but also for code
navigation and some basic syntax checking.

Basically, TS is one "infrastructure" to rule them all.


