bug-gnu-emacs
bug#56682: Fix the long lines font locking related slowdowns


From: Dmitry Gutov
Subject: bug#56682: Fix the long lines font locking related slowdowns
Date: Sun, 14 Aug 2022 23:46:13 +0300
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Thunderbird/91.9.1

On 14.08.2022 20:59, Eli Zaretskii wrote:
Date: Sun, 14 Aug 2022 20:47:40 +0300
Cc: 56682@debbugs.gnu.org, gregory@heytings.org, monnier@iro.umontreal.ca
From: Dmitry Gutov <dgutov@yandex.ru>

The better way is to acknowledge that some inaccuracies are acceptable
in those cases.  With that in mind, one can design a syntax analyzer
that looks back only a short ways, until it finds some place that
could reasonably serve as an anchor point for heuristic decisions
about whether we are inside or outside a string or comment, and then
verifying that guess with some telltale syntactic elements that follow
(like semi-colons or comment-end delimiters in C).  While this kind of
heuristics can sometimes fail, if they only fail rarely, the result is
a huge win.

You cannot design a language-agnostic syntax analyzer like that.

_I_ cannot, but hopefully someone else will.

That seems unlikely. Nothing's impossible, of course, but I wouldn't want to wait for such an invention to come up before we make the decision on how to proceed now.

What _can_ be done is to make syntax-ppss's cache invalidations more local by introducing a "repair" step. At most, that would only speed up certain operations; the initial wait near EOB can't be avoided this way.

In any case, the way to speed up these cases is to look at the profile
and identify the code that is slowing us down; then attempt to make it
faster.  (20 sec is actually long enough for us to interrupt Emacs
under a debugger and look at the backtrace to find the culprit.)

I've profiled and benchmarked this scenario already: all of the delay
(17 seconds, to be precise) comes from parse-partial-sexp. 1 GB is a lot.
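
For anyone who wants to reproduce this kind of measurement on a smaller scale, here is a minimal sketch. It assumes a synthetic single-line buffer rather than the actual 1 GB file, and `my/time-full-parse' is a hypothetical helper name, not something in Emacs or in the branch:

```elisp
;; Sketch: time one full parse-partial-sexp pass over a synthetic buffer.
;; The buffer contents and size are made up for illustration.
(require 'benchmark)

(defun my/time-full-parse (size)
  "Return seconds spent parsing a temp buffer of about SIZE characters.
`my/time-full-parse' is a hypothetical helper, not part of Emacs."
  (with-temp-buffer
    ;; One long line of JSON-like text, roughly SIZE characters.
    (let ((chunk "{\"key\": [1, 2, 3]}, "))
      (dotimes (_ (/ size (length chunk)))
        (insert chunk)))
    ;; `benchmark-run' returns (ELAPSED GC-COUNT GC-ELAPSED).
    (car (benchmark-run 1
           (parse-partial-sexp (point-min) (point-max))))))

;; Example call: elapsed seconds (a float) for a ~20 MB buffer.
(my/time-full-parse (* 20 1024 1024))
```

The time should grow roughly linearly with the buffer size, since the parser has to walk every character from point-min to the target position.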

Before we get to 1GB files, there are 20MB files and 250MB files.  I
found quite a few low-hanging fruit in those that are worth plucking,
while we wait for parse-partial-sexp to get its act together.

Definitely.

But when the profiler output in a 1 GB file comes down to syntax-ppss only, that means the low-hanging fruit has been picked.

If that solves the problems in a reasonable way for very long lines,
maybe we will eventually have such an option.

Can I merge the branch, then?

Please wait until I have time to review it.

I was hoping for a stylistic review, perhaps. Like, whether you like the
name of the variable, and should it be split in two.

A change of the default value(s) is on the table too.

Will definitely do, I'm just busy with "other things" right now, most
of them related to other aspects of long lines.

Roger that.

One such major mode and one such file were presented long ago: a
single-line XML file.

XML is indeed slower. It takes almost 3 seconds for me to scroll to the
end of a 20 MB XML file.

Most of it comes from sgml--syntax-propertize-ppss, which is probably
justified: XML is a more complex language.

Did you wait till nxml-mode did its initial scan and displayed "Valid"
in the mode line?  The performance is quite different before and after
that.

It takes a while to switch from "Validated: 0" to "Valid", but the performance seems about the same in both states.

Maybe some other example file would show different behavior, IDK.

But other than the initial delay, scrolling, and isearch, and local
editing, all work fast, unlike the original situation with JSON.

With which branch?

scratch/font_lock_large_files, with 'emacs -Q'

I've also run this test on master now, and M-> is not instant there either. Apparently, a fair amount of time is also spent in nxml-extend-region (which calls sgml-syntax-propertize and syntax-ppss).

Not sure why it would spend any significant time in either, though, if they're called inside a narrowing.
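
To illustrate the expectation: syntax-ppss is documented to parse from point-min, so inside a narrowing it should only see (and only scan) the accessible portion. A small sketch, assuming Emacs 26 or later, the default syntax table of a temp buffer, and a hypothetical helper name `my/string-state-wide-vs-narrow':

```elisp
(defun my/string-state-wide-vs-narrow ()
  "Return (WIDE NARROW): the in-string state at EOB, widened vs narrowed.
Hypothetical helper for illustration only."
  (with-temp-buffer
    ;; An unterminated string opener at the start, then filler text.
    (insert "\"open string " (make-string 1000 ?x))
    ;; Widened: the parse starts at position 1 and sees the opener.
    (let ((wide (nth 3 (syntax-ppss (point-max)))))
      ;; Narrowed: the parse starts at the narrowing's point-min.
      (narrow-to-region (- (point-max) 100) (point-max))
      (let ((narrow (nth 3 (syntax-ppss (point-max)))))
        (list wide narrow)))))

;; WIDE is non-nil (inside a string); NARROW is nil (opener not visible).
(my/string-state-wide-vs-narrow)
```

If that holds, a call made under a small narrowing has at most the narrowed region to scan, which is why significant time there is surprising.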




