Re: Using incremental parsing in Emacs

emacs-devel

From:	Dmitry Gutov
Subject:	Re: Using incremental parsing in Emacs
Date:	Fri, 10 Jan 2020 00:56:38 +0300
User-agent:	Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.9.0

On 04.01.2020 16:46, arthur miller wrote:

There is a very good presentation of tree-sitter on YT by its author:

https://www.youtube.com/watch?v=Jes3bD6P0To

Looks much better than the picture I got by just reading the
website:


It was a good watch.

Some takeaways from me:

It implements a GLR parser, one that can quickly update the existing AST after an arbitrary edit in the middle of a file. (*)


But it parses a new file quickly as well: a 20,000-line JS file in 54 ms.

To reach that speed, they went the traditional compiler-writer route of having a separate (grammar-to-C-code) compilation step that turns a grammar into a parser program (which relies on a shared runtime). (**)

Some of it seems to be by necessity. Every run returns a full AST, not just an "AST up to this position". I suppose the author didn't want the problems that come with unfinished parse trees when code relies on that returned value. (***)

The generated parser, in addition to being incremental, is error-tolerant, which is a necessity for use in editors.

As a result, they have features like fast semantic syntax highlighting, as well as code folding that accurately detects where a function body begins and ends (previously, Atom and other editors used guessing based on indentation levels, apparently). And an "extend selection" command based on the AST as well. (****)
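To make the "extend selection" idea concrete, here is a minimal sketch of how such a command can fall out of a tree with byte ranges. The `Node` class and `expand_selection` function are hypothetical names invented for this illustration; this is not Tree-Sitter's API, and the tree is built by hand rather than produced by a parser.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical syntax-tree node; `end` is exclusive."""
    kind: str
    start: int
    end: int
    children: list = field(default_factory=list)


def expand_selection(root, sel_start, sel_end):
    """Return the range of the smallest node strictly larger than the selection.

    Walks down from the root, always following the child that still covers
    the selection. Returns the selection unchanged if nothing larger exists.
    """
    best = None
    node = root
    while node is not None:
        if not (node.start <= sel_start and sel_end <= node.end):
            break
        if (node.start, node.end) != (sel_start, sel_end):
            best = node  # deeper nodes are smaller, so keep overwriting
        node = next(
            (c for c in node.children
             if c.start <= sel_start and sel_end <= c.end),
            None,
        )
    return (best.start, best.end) if best else (sel_start, sel_end)


# Hand-built tree for the source text below (an assumption for the demo).
source = "f(a, b + c)"
binary = Node("binary", 5, 10, [Node("identifier", 5, 6),
                                Node("identifier", 9, 10)])
args = Node("arguments", 1, 11, [Node("identifier", 2, 3), binary])
call = Node("call", 0, 11, [Node("identifier", 0, 1), args])

print(expand_selection(call, 5, 6))   # (5, 10), i.e. "b + c"
print(expand_selection(call, 5, 10))  # (1, 11), i.e. "(a, b + c)"
```

Repeated invocations grow the region one syntactic level at a time, which is the behaviour the talk describes; the quality of the result depends entirely on how accurate the tree is.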

Tree-Sitter is also used inside GitHub for various features, including their Semantic library (which implements code navigation on the web).

In the meantime, our current answer to all of the above is syntax-ppss plus local regexp-based parsing around the visible part of the buffer.


To compare:

(*) syntax-ppss is also fully incremental, although the returned value is a very simplistic substitute for an AST. But we've been using it for a while and have done solid things with it.

(**) Which means that if we try to use Tree-Sitter as-is, our current practice of defining the language grammar in Lisp would go out of the window. https://github.com/ubolonton/emacs-tree-sitter demonstrates this as well: language grammars have to be compiled into a shared library (or libraries). We would have lots of grammars supplied by third parties, which is kind of good, but we would lose the ease of experimenting with them that we have now, or being able to write support for a new up-and-coming language very quickly. Which a certain fraction of our users enjoys, AFAIK.

(***) Whereas syntax-ppss stops at the requested position, thus saving CPU cycles. Similarly, if a new system we transition to someday also does this, its absolute performance/throughput would be less important, since it would usually only have to parse a screenful of the file at a time.

(****) We've been managing surprisingly well with syntax-ppss, forward-sexp, etc. So code folding works quite well in Emacs already, and the easy-kill package in GNU ELPA does the "expand selection" thing very successfully as well. But we could use some improvement in having some more complex syntax supported or handled more easily, in certain languages. Having a "proper AST" available is nothing to sneeze at either, and would likely help a lot in indentation code.
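As an illustration of the trade-off in (*) and (***), here is a toy sketch of a scanner that stops at the requested position and caches its state so that later queries resume from the nearest checkpoint. It is only an analogue of the idea: it is not how syntax-ppss is implemented, `ScanCache` and `State` are hypothetical names, and the "parse state" is reduced to paren depth plus an in-string flag (no escapes, no comments).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class State:
    depth: int = 0           # paren nesting depth at this position
    in_string: bool = False  # inside a double-quoted string?


class ScanCache:
    """Toy position-bounded scanner with cached checkpoints."""

    def __init__(self, text):
        self.text = text
        self.checkpoints = {0: State()}  # position -> state before that position
        self.chars_scanned = 0           # instrumentation for the demo only

    def state_at(self, pos):
        pos = max(0, min(pos, len(self.text)))
        # Resume from the closest checkpoint at or before pos.
        start = max(p for p in self.checkpoints if p <= pos)
        depth = self.checkpoints[start].depth
        in_string = self.checkpoints[start].in_string
        for ch in self.text[start:pos]:
            if ch == '"':
                in_string = not in_string
            elif not in_string and ch == "(":
                depth += 1
            elif not in_string and ch == ")":
                depth -= 1
            self.chars_scanned += 1
        self.checkpoints[pos] = State(depth, in_string)
        return self.checkpoints[pos]

    def replace(self, start, end, new_text):
        """Apply an edit and drop checkpoints after the start of the edit."""
        self.text = self.text[:start] + new_text + self.text[end:]
        self.checkpoints = {p: s for p, s in self.checkpoints.items()
                            if p <= start}


cache = ScanCache('(defun f (x) "a (b" (g x))')
print(cache.state_at(10))   # State(depth=2, in_string=False)
print(cache.state_at(17))   # State(depth=1, in_string=True)
print(cache.chars_scanned)  # 17: the second query resumed at position 10
```

Nothing past the requested position is ever scanned, and an edit only invalidates checkpoints after it, which is why raw throughput matters less for this style than for a parser that must return a complete tree on every run.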

My personal takeaway is that we could really benefit from a lispier version of this technology, and Someone(tm) should start working on that.
