--- Begin Message ---
Subject: The :match predicate with large regexp in tree-sitter font-lock seems inefficient
Date: Fri, 20 Jan 2023 05:53:12 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.4.2

In my benchmarking -- using this form in
test/lisp/progmodes/ruby-mode-resources/ruby.rb after enabling ruby-ts-mode:

  (benchmark-run 1000
    (progn (font-lock-mode -1) (font-lock-mode 1)
           (let (treesit--font-lock-fast-mode) (font-lock-ensure))))

the rule added to its font-lock in commit d66ac5285f7

  :language language
  :feature 'builtin-functions
  `((((identifier) @font-lock-builtin-face)
     (:match ,ruby-ts--builtin-methods
      @font-lock-builtin-face)))

...seems to have made it 50% slower.
The profile looked like this:

  9454  84% - font-lock-fontify-region
  9454  84% - font-lock-default-fontify-region
  8862  79% - font-lock-fontify-syntactically-region
  8702  78% - treesit-font-lock-fontify-region
   128   1%   treesit-fontify-with-override
   123   1%   facep
    84   0%   treesit--children-covering-range-recurse
    60   0% + ruby-ts--comment-font-lock
     4   0% + font-lock-unfontify-region
   568   5% + font-lock-fontify-keywords-region
    16   0% + font-lock-unfontify-region

So there's nothing on the Lisp level to look at.
Looking at the code, apparently we get a cursor and basically iterate
through all (identifier) nodes, running our predicate manually.
Without trying something more advanced like perf, I took a stab in the
dark and tried to reduce string allocation in treesit_predicate_match
(it currently ends up delegating to buffer-substring for every node),
which seemed inefficient. But while my patch (attached) compiles and
doesn't crash, it doesn't actually work (the rule's highlighting is
missing), and the performance was unchanged.
This message was originally longer, but see commit d94dc606a09: I
switched to using :pred -- thus avoiding embedding the 720-char-long
regexp in the query -- and the performance drop shrank to about 20%.
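
For illustration, a minimal sketch of what a :pred-based variant could look like. This is not the code from the commit; every my-... name below is hypothetical, and the three-word list stands in for the real, much longer regexp. The point is only that the query refers to a function by name, so the regexp never becomes part of the query.

```elisp
;; Hypothetical stand-in for the long builtin-methods regexp.
(defvar my-builtin-methods-regexp
  (concat "\\`" (regexp-opt '("gsub" "puts" "require") t) "\\'")
  "Regexp matching exactly one of a few builtin method names.")

(defun my-builtin-name-p (name)
  "Return t if the string NAME is one of the builtin method names."
  (and (string-match-p my-builtin-methods-regexp name) t))

(defun my-builtin-method-p (node)
  "Predicate for :pred: return t if NODE's text names a builtin method."
  (my-builtin-name-p (treesit-node-text node t)))

;; Same query shape as the :match rule, but the predicate is referenced
;; by name instead of having the regexp spliced into the query.
(defvar my-builtin-query
  '((((identifier) @font-lock-builtin-face)
     (:pred my-builtin-method-p @font-lock-builtin-face)))
  "Hypothetical query using :pred in place of :match.")
```

The string-level helper is split out only so the matching logic can be checked without a parser or grammar loaded.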
As a baseline, this simplified query, which has no predicate and colors
every identifier in the buffer with the specified face, is still faster
(just 10% over the original):

  :language language
  :feature 'builtin-function
  `(((identifier) @font-lock-builtin-face))

The regexp matching itself doesn't seem to be the problem:
(benchmark 354100 '(string-match-p ruby-ts--builtin-methods "gsub"))
=> Elapsed time: 0.141681s
-- whereas the difference between the benchmarks is on the order of seconds.
I think the marshaling of the long regexp string back and forth could be
the culprit. Would be nice to fix that somehow.
I also think that reducing the string allocation overhead has
potential, but none of my experiments so far have moved the needle
noticeably.
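
To make the allocation idea concrete, here is a rough Lisp-level sketch (not the attached C patch, and the function name is hypothetical): instead of copying each node's text out with buffer-substring and matching against the copy, narrow to the node's range and search in the buffer directly.

```elisp
(defun my-region-matches-p (regexp start end)
  "Return t if REGEXP matches somewhere in the buffer text START..END.
Roughly equivalent to
  (string-match-p REGEXP (buffer-substring START END))
but without creating the intermediate string."
  (save-match-data
    (save-excursion
      (save-restriction
        ;; With narrowing in effect, \\` and \\' anchor at START and END.
        (narrow-to-region start end)
        (goto-char (point-min))
        (and (re-search-forward regexp nil t) t)))))
```

Whether this would actually be faster at the C level is exactly the open question above; the sketch only shows the shape of the approach.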
treesit_predicate_match.diff
Description: Text Data
--- End Message ---
--- Begin Message ---
Subject: Re: bug#60953: The :match predicate with large regexp in tree-sitter font-lock seems inefficient
Date: Thu, 2 Feb 2023 21:44:09 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.4.2

On 02/02/2023 20:03, Eli Zaretskii wrote:
>> Date: Thu, 2 Feb 2023 19:53:07 +0200
>> Cc: casouri@gmail.com, 60953@debbugs.gnu.org
>> From: Dmitry Gutov <dgutov@yandex.ru>
>>
>>> If the search fails, search_buffer returns a non-positive integer, not
>>> zero.
>>
>> That should work too. As long as it never returns 0 for success.
>>
>> Which seems to be confirmed by the check
>>
>>    if (np <= 0)
>>      ... signal error
>>
>> inside search_command.
>
> Yes, because buffer position can never be zero.

All right.

Now pushed; thanks, and closing.
--- End Message ---