emacs-bug-tracker

bug#60953: closed (The :match predicate with large regexp in tree-sitter font-lock seems inefficient)


From: GNU bug Tracking System
Subject: bug#60953: closed (The :match predicate with large regexp in tree-sitter font-lock seems inefficient)
Date: Thu, 02 Feb 2023 19:45:02 +0000

Your message dated Thu, 2 Feb 2023 21:44:09 +0200
with message-id <2fbb1175-9ab1-e2df-16a9-2d32f1cc226f@yandex.ru>
and subject line Re: bug#60953: The :match predicate with large regexp in tree-sitter font-lock seems inefficient
has caused the debbugs.gnu.org bug report #60953,
regarding The :match predicate with large regexp in tree-sitter font-lock seems inefficient
to be marked as done.

(If you believe you have received this mail in error, please contact
help-debbugs@gnu.org.)


-- 
60953: https://debbugs.gnu.org/cgi/bugreport.cgi?bug=60953
GNU Bug Tracking System
Contact help-debbugs@gnu.org with problems
--- Begin Message ---
Subject: The :match predicate with large regexp in tree-sitter font-lock seems inefficient
Date: Fri, 20 Jan 2023 05:53:12 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.4.2

In my benchmarking -- using this form in test/lisp/progmodes/ruby-mode-resources/ruby.rb after enabling ruby-ts-mode:

(benchmark-run 1000 (progn (font-lock-mode -1) (font-lock-mode 1) (let (treesit--font-lock-fast-mode) (font-lock-ensure))))

the rule added to its font-lock in commit d66ac5285f7

   :language language
   :feature 'builtin-functions
   `((((identifier) @font-lock-builtin-face)
      (:match ,ruby-ts--builtin-methods
       @font-lock-builtin-face)))

...seems to have made it 50% slower.

The profile looked like this:

  9454  84%                   - font-lock-fontify-region
  9454  84%                    - font-lock-default-fontify-region
  8862  79%                     - font-lock-fontify-syntactically-region
  8702  78%                      - treesit-font-lock-fontify-region
   128   1%                         treesit-fontify-with-override
   123   1%                         facep
    84   0%                         treesit--children-covering-range-recurse
    60   0%                       + ruby-ts--comment-font-lock
     4   0%                       + font-lock-unfontify-region
   568   5%                     + font-lock-fontify-keywords-region
    16   0%                     + font-lock-unfontify-region

So there's nothing on the Lisp level to look at.

Looking at the code, apparently we get a cursor and basically iterate through all (identifier) nodes, running our predicate manually.

Without trying something more advanced like perf, I took a stab in the dark and tried to reduce string allocation in treesit_predicate_match (it currently ends up delegating to buffer-substring for every node), which seemed inefficient. But while my patch (attached) compiles and doesn't crash, it doesn't actually work (the rule's highlighting is missing), and the performance was unchanged.
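The per-node loop and the allocation concern above can be illustrated with a generic C sketch. This is not Emacs's treesit_predicate_match and not the attached patch: it assumes node text is a contiguous byte range (real Emacs buffers have a gap and may be multibyte), swaps in POSIX regcomp/regexec for Emacs's own regexp engine, and uses hypothetical names (node_span, span_matches, count_matching). Each node's text goes into a reusable stack buffer, so most nodes need no heap allocation:

```c
#include <regex.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical: a node's text as a [start, end) byte range into a
   contiguous buffer.  Real Emacs buffer text has a gap and may be
   multibyte; this sketch ignores both. */
struct node_span {
  size_t start;
  size_t end;
};

/* Match one node's text against a precompiled regex without allocating
   a fresh string per node: short texts are copied into a stack scratch
   buffer, and only unusually long ones fall back to malloc. */
static bool
span_matches (const regex_t *re, const char *buf, struct node_span span)
{
  char scratch[256];
  size_t len = span.end - span.start;
  char *text = len < sizeof scratch ? scratch : malloc (len + 1);
  if (text == NULL)
    return false;

  memcpy (text, buf + span.start, len);
  text[len] = '\0';               /* regexec needs a NUL-terminated string */
  bool matched = regexec (re, text, 0, NULL, 0) == 0;

  if (text != scratch)
    free (text);
  return matched;
}

/* Apply the predicate to every captured node, as the query loop does
   for each (identifier) capture, and count how many pass. */
static size_t
count_matching (const regex_t *re, const char *buf,
                const struct node_span *spans, size_t n)
{
  size_t count = 0;
  for (size_t i = 0; i < n; i++)
    if (span_matches (re, buf, spans[i]))
      count++;
  return count;
}
```

Given that the allocation experiment described above did not change the benchmark, this only shows the shape of such a change; it makes no claim about the speedup.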

This message was originally longer, but see commit d94dc606a09: I switched to using :pred -- thus avoiding embedding the 720-character regexp in the query -- and the slowdown dropped to about 20%.

As a baseline, this simplified query, which has no predicates and colors every identifier in the buffer with the specified face, is still faster (only 10% slower than the original):

   :language language
   :feature 'builtin-function
   `(((identifier) @font-lock-builtin-face))

The regexp matching itself doesn't seem to be the problem:

(benchmark 354100 '(string-match-p ruby-ts--builtin-methods "gsub"))

=> Elapsed time: 0.141681s

-- whereas the difference between the benchmarks is on the order of seconds.

I think the marshaling of the long regexp string back and forth could be the culprit. Would be nice to fix that somehow.
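One generic way to stop paying for a long, constant pattern on every node is to convert and compile it once and reuse the result. The sketch below is a hypothetical one-entry cache built on POSIX regcomp, not Emacs code; struct regex_cache, cache_lookup and cache_clear are illustrative names:

```c
#include <regex.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical one-entry cache: remembers the last pattern string and
   its compiled form, so evaluating the same predicate on many nodes
   copies and compiles the long pattern only once. */
struct regex_cache {
  char *pattern;     /* owned copy of the last pattern, or NULL */
  regex_t compiled;  /* valid only while pattern != NULL */
  unsigned compiles; /* how many times regcomp actually ran */
};

/* Release the cached pattern, if any. */
static void
cache_clear (struct regex_cache *c)
{
  if (c->pattern != NULL)
    {
      regfree (&c->compiled);
      free (c->pattern);
      c->pattern = NULL;
    }
}

/* Return a compiled regex for PATTERN, reusing the cached one when the
   pattern text is unchanged.  Returns NULL if compilation fails. */
static const regex_t *
cache_lookup (struct regex_cache *c, const char *pattern)
{
  if (c->pattern != NULL && strcmp (c->pattern, pattern) == 0)
    return &c->compiled;          /* hit: no copy, no recompile */

  cache_clear (c);
  if (regcomp (&c->compiled, pattern, REG_EXTENDED | REG_NOSUB) != 0)
    return NULL;

  size_t len = strlen (pattern) + 1;
  c->pattern = malloc (len);
  if (c->pattern == NULL)
    {
      regfree (&c->compiled);
      return NULL;
    }
  memcpy (c->pattern, pattern, len);
  c->compiles++;
  return &c->compiled;
}
```

A hit still costs one strcmp over the pattern; keying the cache on the query object instead would avoid even that, at the price of explicit invalidation.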

I also think that trying to reduce the string allocation overhead has potential, but so far none of my experiments has made a noticeable difference.

Attachment: treesit_predicate_match.diff
Description: Text Data


--- End Message ---
--- Begin Message ---
Subject: Re: bug#60953: The :match predicate with large regexp in tree-sitter font-lock seems inefficient
Date: Thu, 2 Feb 2023 21:44:09 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.4.2

On 02/02/2023 20:03, Eli Zaretskii wrote:
>> Date: Thu, 2 Feb 2023 19:53:07 +0200
>> Cc: casouri@gmail.com, 60953@debbugs.gnu.org
>> From: Dmitry Gutov <dgutov@yandex.ru>
>>
>>> If the search fails, search_buffer returns a non-positive integer, not
>>> zero.
>> That should work too. As long as it never returns 0 for success.
>>
>> Which seems to be confirmed by the check
>>
>>     if (np <= 0)
>>       ... signal error
>>
>> inside search_command.
> Yes, because buffer position can never be zero.
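The return-value convention discussed here can be illustrated with a small, hypothetical C sketch. find_position and search_succeeded are made-up names, not Emacs functions, and strstr stands in for the real search. Because positions are 1-based, even a match at the very start yields 1, so any value <= 0 is free to mean failure and the caller can test it the way search_command does:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a search that returns a 1-based position
   on success and a non-positive value on failure.  Since positions
   start at 1, 0 can never mean "found". */
static ptrdiff_t
find_position (const char *text, const char *needle)
{
  const char *hit = strstr (text, needle);
  if (hit == NULL)
    return 0;                     /* any value <= 0 signals failure */
  return (hit - text) + 1;        /* 1-based position of the match start */
}

/* Caller-side check mirroring "if (np <= 0) ... signal error". */
static int
search_succeeded (ptrdiff_t np)
{
  return np > 0;
}
```

Which position the real search_buffer reports on success (start or end of the match) doesn't matter for this check; only the sign does.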

All right.

Now pushed; thanks, and closing.


--- End Message ---
