--- Begin Message ---
Subject: replace-regexp missing some matches
Date: Mon, 18 Feb 2019 08:28:35 +0000
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.4.0
Reproduce:
- Start "emacs -Q" and open the file BitmapFontFace.h
- Evaluate the expression (replace-regexp "\\<Bitmap\\>" "SharedBitmap")
- The text "Replaced 8 occurrences" appears in the echo area.
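To compare the expected and actual counts without going through replace-regexp, a sketch along these lines could be evaluated in the buffer visiting the file. The helper name my-count-vs-replace is hypothetical (it is not part of Emacs), and whether a plain loop like this triggers the same miscount is an assumption, not something confirmed here.

```elisp
;; Sketch only: `my-count-vs-replace' is a hypothetical helper, not part of Emacs.
(defun my-count-vs-replace (regexp replacement)
  "Return (EXPECTED . REPLACED) for REGEXP in the current buffer.
EXPECTED is counted with `how-many' before any edit is made;
REPLACED is the number of replacements a plain
`re-search-forward' / `replace-match' loop actually performs.
REGEXP is assumed never to match the empty string."
  (let ((case-fold-search nil)          ; match "Bitmap" case-sensitively
        (replaced 0)
        expected)
    (save-excursion
      ;; Count matches over the whole buffer first.
      (setq expected (how-many regexp (point-min) (point-max)))
      (goto-char (point-min))
      ;; Replace literally, keeping the replacement's case as given.
      (while (re-search-forward regexp nil t)
        (replace-match replacement t t)
        (setq replaced (1+ replaced))))
    (cons expected replaced)))
```

For example, (my-count-vs-replace "\\<Bitmap\\>" "SharedBitmap") returns a pair of numbers; if the two differ, the search is skipping matches independently of the replace-regexp command itself.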
Problem:
There were actually 12 occurrences (i.e. of the word "Bitmap" surrounded
by word boundaries) in the file that should have been replaced. If I now
move point back to the start of the buffer and evaluate the expression
again, it says "Replaced 4 occurrences".
The exact number of replacements made seems to vary over time.
That is, I can test it five times in a row and get 8 initial replacements
each time, but after trying some other search terms, messing with the
file, restarting Emacs etc., I try my initial test again and then maybe
it consistently replaces 10 the first time, for a while. So your exact
numbers may vary.
I debugged the Lisp as far as I could and it appears to be wrong answers
coming out of the re-search-forward C call that is in
isearch-search-fun-default.
The bug filters up to a number of string replacement user actions - I
first noticed it when trying to do this replacement interactively with
query-replace on word boundaries (C-u M-%), entering "Bitmap" as search
string, then "SharedBitmap" as replacement string. Trying now, as I
press space repeatedly about once a second to confirm each one, I see
the pink highlight skip valid matches to ask me about one that is
further down even while I see the skipped one highlighted in blue a few
lines above, and in the end it may have replaced only 6-8 of the
occurrences. Though, if I press 'n' instead of space to skip without
making any replacements, it does visit all of the occurrences.
I see from the Lisp that plain (non-regexp) query-replace on word
boundaries gets preprocessed into the equivalent regexp search as in my
initial example. I don't think there are any problems with plain string
search and replacement.
Some more experimental observations:
- The replacement text can be any string instead of "SharedBitmap", e.g.
"qwertyasdfgh", "qwer", etc., and the bug still happens. The number of
matches seems to be related to the length of the replacement string.
Currently 12-character replacement strings are causing replace-regexp to
make 8 replacements on the first call for me, while 4-character strings
cause 7 replacements. 6-character replacement strings, i.e. the same
length as "Bitmap", always work, replacing all 12 occurrences.
- The bug doesn't happen in fundamental-mode, nor c-mode, js-mode,
text-mode or any other major modes I tried.
- I've seen this happen in others of my C++ files where I was making the
same replacement, so the problem's not precisely unique to this one.
I've been trying to simplify this one but haven't found anything much
more revealing so far. For example if I delete all the comments and
blank lines, then the first replacement finds 9 occurrences out of 10.
If I cut the file in half by deleting line 140 onwards, the first
replacement finds 3 occurrences out of 6. But if I do something very
simple like just pasting "Bitmap<PixelType>" on 100 consecutive lines,
it's not fooled and it replaces them all.
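The dependence on replacement length could be tabulated with a sketch like the one below. The helper name my-replacements-by-length is hypothetical, and passing c++-mode as the major mode is an assumption, since the mode in which the bug appears is not named above. Whether a temporary buffer reproduces the miscount is also untested.

```elisp
;; Sketch only: `my-replacements-by-length' is a hypothetical helper.
(defun my-replacements-by-length (file mode regexp lengths)
  "Return an alist of (LENGTH . REPLACED) for REGEXP in FILE.
For each LENGTH in LENGTHS, load a fresh copy of FILE into a
temporary buffer, enable major MODE, and replace every match of
REGEXP with a string of LENGTH \"x\" characters.
REGEXP is assumed never to match the empty string."
  (mapcar
   (lambda (len)
     (with-temp-buffer
       (insert-file-contents file)
       (funcall mode)                   ; e.g. #'c++-mode (an assumption)
       (goto-char (point-min))
       (let ((case-fold-search nil)
             (replaced 0))
         (while (re-search-forward regexp nil t)
           (replace-match (make-string len ?x) t t)
           (setq replaced (1+ replaced)))
         (cons len replaced))))
   lengths))
```

A call such as (my-replacements-by-length "BitmapFontFace.h" #'c++-mode "\\<Bitmap\\>" '(4 6 12)) gives one count per length, each starting from an unmodified copy of the file, so the runs cannot influence one another.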
I've tried this in GNU Emacs 26.1 on Arch Linux and 25.2.1 on Windows 7
and am seeing the same behaviour in both.
Thanks,
Daniel
Attachment: BitmapFontFace.h (text data)
--- End Message ---