bug-gnu-emacs

bug#25706: 26.0.50; Slow C file fontification


From: Mattias Engdegård
Subject: bug#25706: 26.0.50; Slow C file fontification
Date: Wed, 9 Dec 2020 18:00:30 +0100

First, some Emacs regexp basics:

1. If A and B match single characters, then A\|B should be written [AB] 
whenever possible. The reason is that A\|B adds a backtrack record which uses 
stack space and wastes time if matching fails later on. The cost can be quite 
noticeable, which we have seen.
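
As a small illustration (the strings and variable names are made up for this sketch, not taken from cc-langs.el), the two spellings below accept exactly the same input; only the alternation form has to record a backtrack point on each iteration:

```elisp
;; Hypothetical example: both regexps match one or more of `a'/`b'
;; followed by `c'.  Only the first uses \| between single characters.
(let ((alternation "\\(?:a\\|b\\)+c")
      (char-set    "[ab]+c"))
  (list (string-match-p alternation "abbac")   ; => 0
        (string-match-p char-set    "abbac")   ; => 0
        (string-match-p alternation "abba")    ; => nil
        (string-match-p char-set    "abba")))  ; => nil
```

The failing case ("abba") is where the difference in cost shows up, since the alternation form has more saved states to unwind before giving up.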

2. Syntax-class constructs are usually better written as character alternatives 
when possible.

The \sX construct, for some X, is typically somewhat slower to match than 
explicitly listing the characters to match. For example, if all you care about 
are space and tab, then "\\s *" should be written "[ \t]*".
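
A sketch of the difference, assuming the standard syntax table (where newline also has whitespace syntax); "\\s-" is the more common spelling of "\\s ". Besides the speed, note that the explicit set says precisely which characters are meant:

```elisp
;; Assumption: the standard syntax table is in effect, so newline
;; counts as whitespace for \s- but is not in [ \t].
(with-syntax-table (standard-syntax-table)
  (let ((s " \t\nx"))
    (list (progn (string-match "\\s-*" s) (match-end 0))     ; => 3
          (progn (string-match "[ \t]*" s) (match-end 0))))) ; => 2
```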

3. Unicode character classes are slower to match than ASCII-only ones. For 
example, [[:alpha:]] is slower than [A-Za-z], assuming only those characters 
are of interest.
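
For instance (an illustrative sketch, not code from cc-mode), the two forms only differ on non-ASCII letters, so if those cannot occur in what you are matching, the range is the cheaper choice:

```elisp
;; "\u00e9" is the letter e with acute accent.
(list (string-match-p "[[:alpha:]]" "\u00e9")    ; => 0
      (string-match-p "[A-Za-z]"    "\u00e9")    ; => nil
      (string-match-p "[A-Za-z]+"   "foo"))      ; => 0
```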

4. [^...] will match \n unless \n is included in the set. For example, 
"[^a]\\|$" will almost never match the $ (end-of-line) branch, because a 
newline will be matched by the first branch. The only exception is at the very 
end of the buffer if it is not newline-terminated, but that is rarely worth 
considering for source code.
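
To make this concrete, here is a minimal sketch on a made-up two-line string, starting the match at the newline (index 1). Adding \n to the negated set is what lets the $ branch do its job:

```elisp
;; Returns (START END) of the match found from index 1, the newline.
(let ((s "a\nb"))
  (list
   ;; [^a] consumes the newline itself, so $ is never tried.
   (progn (string-match "[^a]\\|$" s 1)
          (list (match-beginning 0) (match-end 0)))     ; => (1 2)
   ;; With \n excluded, the first branch fails and $ matches
   ;; the empty string just before the newline.
   (progn (string-match "[^a\n]\\|$" s 1)
          (list (match-beginning 0) (match-end 0)))))   ; => (1 1)
```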

5. \r (carriage return) normally doesn't appear in buffers even if the file 
uses DOS line endings. Line endings are converted into a single \n (newline) 
when the file is read into the buffer. In particular, $ does NOT match before 
\r, only before \n (or at the end of the buffer).

When \r appears it is usually because the file contains a mixture of 
line-ending styles, typically from being edited using broken tools. Whether you 
want to take such files into account is a matter of judgement; most modes don't 
bother.
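
A quick sketch of what that means in practice, using made-up strings that stand in for buffer text. The last form shows one way to tolerate a stray \r if you decide it is worth it:

```elisp
(list (string-match-p "foo$"     "foo\nbar")     ; => 0
      (string-match-p "foo$"     "foo\r\nbar")   ; => nil, \r is in the way
      (string-match-p "foo\r?$"  "foo\r\nbar"))  ; => 0
```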

6. Capturing groups cost more than non-capturing groups, but you already know 
that.
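
For completeness, a minimal sketch (strings invented for the example): when a group is only there for grouping, the shy form \(?: ... \) avoids the bookkeeping, at the price of renumbering the groups that follow it.

```elisp
(let ((s "barbaz"))
  (list
   ;; Capturing alternation: "baz" ends up in group 2.
   (progn (string-match "\\(foo\\|bar\\)\\(baz\\)" s)
          (match-string 2 s))                      ; => "baz"
   ;; Shy alternation: nothing recorded for it, "baz" is group 1.
   (progn (string-match "\\(?:foo\\|bar\\)\\(baz\\)" s)
          (match-string 1 s))))                    ; => "baz"
```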

On to specifics: here are annotations for possible improvements in cc-langs.el. 
(I didn't bother about capturing groups here.)

Attachment: cc-regexp-annot.diff
Description: Binary data

