From: Dmitry Gutov
Subject: bug#56682: Fix the long lines font locking related slowdowns
Date: Sun, 14 Aug 2022 23:46:13 +0300
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Thunderbird/91.9.1

On 14.08.2022 20:59, Eli Zaretskii wrote:
>> Date: Sun, 14 Aug 2022 20:47:40 +0300
>> Cc: 56682@debbugs.gnu.org, gregory@heytings.org, monnier@iro.umontreal.ca
>> From: Dmitry Gutov <dgutov@yandex.ru>
>>
>>> The better way is to acknowledge that some inaccuracies are acceptable in those cases. With that in mind, one can design a syntax analyzer that looks back only a short ways, until it finds some place that could reasonably serve as an anchor point for heuristic decisions about whether we are inside or outside a string or comment, and then verifying that guess with some telltale syntactic elements that follow (like semi-colons or comment-end delimiters in C). While this kind of heuristics can sometimes fail, if they only fail rarely, the result is a huge win.
>>
>> You cannot design a language-agnostic syntax analyzer like that.
>
> _I_ cannot, but hopefully someone else will.
That seems unlikely. Nothing's impossible, of course, but I wouldn't want to wait for such an invention to come up before we make the decision on how to proceed now.
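The anchor-point heuristic from the quoted text can be illustrated for a C-like language. Below is a minimal toy sketch in Python. It is not anything that exists in Emacs. The anchor rule, the window sizes and the names `find_anchor`, `scan_state` and `guess_state` are all hypothetical. The anchor rule treats a `{` or `}` in column 0 as lying outside any string or comment, similar in spirit to Emacs's `open-paren-in-column-0-is-defun-start` convention. The sketch guesses whether a position is inside a string or comment, then checks the guess against a telltale comment terminator ahead.

```python
from typing import Literal, Tuple

State = Literal["code", "string", "line_comment", "block_comment"]


def find_anchor(text: str, pos: int, max_lookback: int) -> int:
    """Look back at most max_lookback chars for an assumed anchor point.

    Hypothetical rule: a '{' or '}' in column 0 is taken to be outside any
    string or comment. If none is found, fall back to the window start,
    which may be wrong; guess_state's telltale check is meant to flag that.
    """
    start = max(0, pos - max_lookback)
    i = pos
    while i > start:
        i -= 1
        if text[i] in "{}" and (i == 0 or text[i - 1] == "\n"):
            return i
    return start


def scan_state(text: str, begin: int, end: int) -> State:
    """Scan forward from begin (assumed to be plain code) up to end."""
    state: State = "code"
    i = begin
    while i < end:
        c = text[i]
        two = text[i:min(i + 2, end)]  # never look past end
        if state == "code":
            if two == "/*":
                state = "block_comment"
                i += 2
                continue
            if two == "//":
                nl = text.find("\n", i, end)
                if nl == -1:
                    return "line_comment"  # end lies inside this // comment
                i = nl + 1
                continue
            if c == '"':
                state = "string"
        elif state == "block_comment":
            if two == "*/":
                state = "code"
                i += 2
                continue
        elif state == "string":
            if c == "\\":
                i += 2  # skip the escaped character
                continue
            if c == '"':
                state = "code"
        i += 1
    return state


def guess_state(text: str, pos: int, max_lookback: int = 2000,
                max_lookahead: int = 2000) -> Tuple[State, bool]:
    """Return (guessed state at pos, whether the telltale check agrees)."""
    state = scan_state(text, find_anchor(text, pos, max_lookback), pos)
    ahead = text[pos:pos + max_lookahead]
    close_at, open_at = ahead.find("*/"), ahead.find("/*")
    # A "*/" ahead with no "/*" before it looks like a stray terminator.
    stray_close = close_at != -1 and (open_at == -1 or close_at < open_at)
    if state == "block_comment":
        return state, close_at != -1   # the comment should end somewhere ahead
    if state == "code":
        return state, not stray_close  # code should not meet a stray "*/"
    return state, True                 # no telltale check for these states
```

The scan is tied to C's comment and string syntax. That is why a heuristic like this has to be written per language rather than once for all modes. With a small `max_lookback`, the window can start inside a comment. The scan then wrongly reports `"code"`, and the stray `*/` ahead is what flags the guess as unreliable.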
What _can_ be done is make syntax-ppss's cache invalidations more local by introducing a "repair" step. That would only speed up certain operations, at most, and the initial wait near EOB can't be avoided this way.
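The cost model behind this point can be made concrete with a toy Python model of a position-indexed parse-state cache. It is only loosely based on syntax-ppss. syntax-ppss caches parse states, and its `syntax-ppss-flush-cache`, run from `before-change-functions`, drops cached data after a change. The class `ToyPpssCache`, the fixed checkpoint interval, and reducing the parse state to paren depth are all assumptions made for illustration.

```python
import bisect
from typing import List


class ToyPpssCache:
    """Toy model of a syntax-ppss-style cache of parse states.

    Assumptions: the "parse state" is reduced to paren depth, and a plain
    forward scan stands in for parse-partial-sexp. Checkpoints are kept
    every `interval` chars as a contiguous prefix 0, interval, 2*interval...
    """

    def __init__(self, text: str, interval: int = 1000) -> None:
        self.text = text
        self.interval = interval
        self.positions: List[int] = [0]  # checkpoint positions
        self.depths: List[int] = [0]     # paren depth at each checkpoint
        self.chars_scanned = 0           # work counter, to expose the cost

    def _scan(self, start: int, depth: int, end: int) -> int:
        # Linear forward scan from a known state.
        for ch in self.text[start:end]:
            if ch == "(":
                depth += 1
            elif ch == ")":
                depth -= 1
        self.chars_scanned += end - start
        return depth

    def state_at(self, pos: int) -> int:
        # Resume from the nearest checkpoint at or before pos.
        i = bisect.bisect_right(self.positions, pos) - 1
        cur, depth = self.positions[i], self.depths[i]
        # New checkpoints can only follow the last one (prefix invariant).
        nxt = cur + self.interval
        while nxt <= pos:
            depth = self._scan(cur, depth, nxt)
            cur = nxt
            self.positions.append(cur)
            self.depths.append(depth)
            nxt = cur + self.interval
        return self._scan(cur, depth, pos)

    def edit(self, pos: int, new_text: str, delete: int = 0) -> None:
        self.text = self.text[:pos] + new_text + self.text[pos + delete:]
        # A checkpoint's state depends only on the text before it, so those
        # at or before pos survive; every checkpoint after pos is dropped.
        keep = bisect.bisect_right(self.positions, pos)
        del self.positions[keep:]
        del self.depths[keep:]
```

In this model, the first query near the end scans the whole text, and no cache-invalidation scheme avoids that. A one-character edit near the beginning discards every later checkpoint, so the next query near the end rescans everything again. The "repair" step mentioned above would aim to avoid that discarding. Its mechanics are not described here, so the sketch leaves it out.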
>>> In any case, the way to speed up these cases is to look at the profile and identify the code that is slowing us down; then attempt to make it faster. (20 sec is actually long enough for us to interrupt Emacs under a debugger and look at the backtrace to find the culprit.)
>>
>> I've profiled and benchmarked this scenario already: all of the delay (17 seconds, to be precise) comes from parse-partial-sexp. 1 GB is a lot.
>
> Before we get to 1GB files, there are 20MB files and 250MB files. I found quite a few low-hanging fruit in those that are worth plucking, while we wait for parse-partial-sexp to get its act together.
Definitely. But when the profiler output for a 1 GB file comes down to syntax-ppss alone, that means the low-hanging fruit has already been picked.
>>>>> If that solves the problems in a reasonable way for very long lines, maybe we will eventually have such an option.
>>>>
>>>> Can I merge the branch, then?
>>>
>>> Please wait until I have time to review it.
>>
>> I was hoping for a stylistic review, perhaps. Like, whether you like the name of the variable, and should it be split in two. A change of the default value(s) is on the table too.
>
> Will definitely do, I'm just busy with "other things" right now, most of them related to other aspects of long lines.
Roger that.
>>> One such major mode and one such file were presented long ago: a single-line XML file.
>>
>> XML is indeed slower. It takes almost 3 seconds for me to scroll to the end of a 20 MB XML file. Most of it comes from sgml--syntax-propertize-ppss, which is probably justified: XML is a more complex language.
>
> Did you wait till nxml-mode did its initial scan and displayed "Valid" in the mode line? The performance is quite different before and after that.
It takes a while to switch from "Validated: 0" to "Valid", but the performance seems about the same in both states.
Maybe some other example file would show different behavior, IDK.
>> But other than the initial delay, scrolling, isearch, and local editing all work fast, unlike the original situation with JSON.
>
> With which branch?
scratch/font_lock_large_files, with 'emacs -Q'.
I've also run this test on master now, and M-> is not instant there either. Apparently, a fair amount of time is also spent in nxml-extend-region (which calls sgml-syntax-propertize and syntax-ppss).
Not sure why it would spend any significant time in either, though, if they're called inside a narrowing.