bug#38104: 27.0.50; elixir-mode fontification is very slow
From: Mattias Engdegård
Subject: bug#38104: 27.0.50; elixir-mode fontification is very slow
Date: Tue, 26 Nov 2019 20:32:29 +0100
On 26 Nov 2019, at 17:26, Dmitry Gutov <dgutov@yandex.ru> wrote:
> elixir-mode does use rx, heavily. Albeit with a thin wrapper.
As it turned out, rx is fine (now); elixir-mode, not quite. In elixir-mode.el,
we have
  (identifiers . ,(rx (one-or-more (any "A-Z" "a-z" "_"))
                      (zero-or-more (any "A-Z" "a-z" "0-9" "_"))
                      (optional (or "?" "!"))))
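First, this regex is suboptimal: the first character of an identifier should
occur exactly once, or you get bad backtracking behaviour. The one-or-more and
the zero-or-more that follows it can both consume the same letters, so the
matcher has many ways to split an identifier between them. Just remove the
one-or-more construct: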
  (identifiers . ,(rx (any "A-Z" "a-z" "_")
                      (zero-or-more (any "A-Z" "a-z" "0-9" "_"))
                      (optional (or "?" "!"))))
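To check that dropping the one-or-more does not change which strings are
accepted, the two definitions can be compared on a few samples. This is only a
sketch: the names my-ident-old, my-ident-new and my-ident-agree-p and the
sample strings are made up for illustration, and both regexps are anchored with
bos/eos so that the whole string has to match.

```elisp
;; Hypothetical names; only the rx forms are taken from elixir-mode.el.
(defconst my-ident-old
  (rx bos
      (one-or-more (any "A-Z" "a-z" "_"))
      (zero-or-more (any "A-Z" "a-z" "0-9" "_"))
      (optional (or "?" "!"))
      eos)
  "Anchored copy of the original `identifiers' definition.")

(defconst my-ident-new
  (rx bos
      (any "A-Z" "a-z" "_")
      (zero-or-more (any "A-Z" "a-z" "0-9" "_"))
      (optional (or "?" "!"))
      eos)
  "Anchored copy of the simplified `identifiers' definition.")

(defun my-ident-agree-p (string)
  "Return non-nil if both definitions give the same verdict on STRING."
  (let ((case-fold-search nil))
    (eq (and (string-match-p my-ident-old string) t)
        (and (string-match-p my-ident-new string) t))))

;; Made-up samples: three valid identifiers and two invalid strings.
(mapcar #'my-ident-agree-p '("foo" "_bar1" "valid?" "9abc" "a-b"))
;; => (t t t t t)
```

Both forms accept the same strings; the simplified one just leaves the matcher
a single way to do it.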
This definition is then used in several places, but two in particular are of
interest to us:
  ;; Module attributes
  (,(elixir-rx (and "@" (1+ identifiers)))
The construct (1+ identifiers) was perhaps meant to match multiple identifiers,
but it doesn't (no separator); it just matches an identifier in several ways,
which again leads to bad backtracking behaviour.
The same problem here:
  ;; Map keys
  (,(elixir-rx (group (and (one-or-more identifiers) ":")) space)
Remove the 1+ and one-or-more and it's fast again.
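The cost of the redundant repetition shows up when a match fails. The sketch
below times both forms of the map-key pattern on a string that looks like an
identifier but has no trailing colon. It makes several assumptions: the plain
rx macro stands in for elixir-rx, the identifiers definition is inlined by
hand, the group and trailing space are left out, and the names my-key-nested,
my-key-flat and my-time-failing-match as well as the input length are
arbitrary. Actual timings depend on the machine and the Emacs version.

```elisp
(require 'benchmark)

;; Hypothetical names; the patterns are hand-inlined approximations.
(defconst my-key-nested
  (rx (one-or-more                      ; the extra repetition
       (one-or-more (any "A-Z" "a-z" "_"))
       (zero-or-more (any "A-Z" "a-z" "0-9" "_"))
       (optional (or "?" "!")))
      ":")
  "Map-key pattern with the original `identifiers' repeated.")

(defconst my-key-flat
  (rx (any "A-Z" "a-z" "_")
      (zero-or-more (any "A-Z" "a-z" "0-9" "_"))
      (optional (or "?" "!"))
      ":")
  "Map-key pattern with the simplified `identifiers', not repeated.")

(defun my-time-failing-match (regexp length)
  "Return the seconds spent trying to match REGEXP on LENGTH letters."
  (let ((input (make-string length ?a)))
    (car (benchmark-run 1 (string-match-p regexp input)))))

;; Keep LENGTH small: the nested form can slow down quickly as it grows.
(list (my-time-failing-match my-key-nested 14)
      (my-time-failing-match my-key-flat 14))
```

Both patterns still match an ordinary key such as "key: "; the difference is
only in how much work a failed match can cost.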
Why did this "work" with the old rx implementation? Because that code had a
nasty bug: it did not bracket definitions in rx-constituents properly. Example:
  (let ((rx-constituents (cons '(hello . "HELLO") rx-constituents)))
    (rx-to-string '(1+ hello) t))
  => "HELLO+"
The new rx implementation does not suffer from this bug.
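The practical difference between an unbracketed and a bracketed translation can
be seen by writing both regexps out by hand. This sketch does not call rx at
all: the helper name my-match-length is hypothetical, and "\\(?:HELLO\\)+" is
simply what a properly bracketed (1+ hello) should look like.

```elisp
(defun my-match-length (regexp string)
  "Return the length of the match of REGEXP at the start of STRING, or nil."
  (let ((case-fold-search nil))
    (when (eq (string-match regexp string) 0)
      (match-end 0))))

;; Unbracketed: the "+" binds only to the final "O".
(my-match-length "HELLO+" "HELLOHELLO")          ; => 5
(my-match-length "HELLO+" "HELLOOO")             ; => 7

;; Bracketed: the "+" repeats the whole word.
(my-match-length "\\(?:HELLO\\)+" "HELLOHELLO")  ; => 10
```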
The result in your case is that the old rx, when translating (1+ identifiers),
only tacked the "+" onto whatever regexp 'identifiers' produced, resulting in
  "[A-Z_a-z]+[0-9A-Z_a-z]*[!?]?+"
which is a lot faster, since the extra "+" applies only to the final [!?]?
(and that probably doesn't match very often).