emacs-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Questionable code in handling of wordend in the regexp engine in reg


From: Stefan Monnier
Subject: Re: Questionable code in handling of wordend in the regexp engine in regex-emacs.c
Date: Mon, 25 Feb 2019 14:18:10 -0500
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/27.0.50 (gnu/linux)

>> > Surely the argument to the second occurrence should be (charpos + 1)?
>> I believe it's instead the other one that needs to use "charpos - 1"
>> because the UPDATE_SYNTAX_TABLE is called just before reading the char
>> *before* charpos (see patch below).
> I don't think this is right.  offset is calculated from d, and then
> decremented, before calculating charpos.

Hmm... I think you're right (and the symend code does like you suggest).
This said, I find it odd that the code does:

              ptrdiff_t offset = PTR_TO_OFFSET (d) - 1;
              ptrdiff_t charpos = SYNTAX_TABLE_BYTE_TO_CHAR (offset);
              UPDATE_SYNTAX_TABLE (charpos);

Supposedly `d` is a char* pointing to the beginning of a potentially
multibyte char, In that case `d - 1` will point "somewhere before the
end of the previous multibyte char" but not necessarily at
its beginning.  Maybe the patch below would be preferable to avoid
this situation?

Worse, in notwordbound we do:

                ptrdiff_t offset = PTR_TO_OFFSET (d - 1);
                ptrdiff_t charpos = SYNTAX_TABLE_BYTE_TO_CHAR (offset);
                UPDATE_SYNTAX_TABLE (charpos);

which seems even more broken because `d` might point to the first byte
after the gap, so `d - 1` will point in the middle of the gap, so it's
simply an invalid argument to PTR_TO_OFFSET.  According to the
definition of PTR_TO_OFFSET and POINTER_TO_OFFSET, the result may
be the same as if we did the decrement after the fact, but it still
looks fishy.  WDYT?


        Stefan


diff --git a/src/regex-emacs.c b/src/regex-emacs.c
index b667a43a37..b21cba0e46 100644
--- a/src/regex-emacs.c
+++ b/src/regex-emacs.c
@@ -4811,9 +4811,9 @@ re_match_2_internal (struct re_pattern_buffer *bufp, 
re_char *string1,
              int c1, c2;
              int s1, s2;
              int dummy;
-             ptrdiff_t offset = PTR_TO_OFFSET (d) - 1;
+             ptrdiff_t offset = PTR_TO_OFFSET (d);
              ptrdiff_t charpos = SYNTAX_TABLE_BYTE_TO_CHAR (offset);
-             UPDATE_SYNTAX_TABLE (charpos);
+             UPDATE_SYNTAX_TABLE (charpos - 1);
              GET_CHAR_BEFORE_2 (c1, d, string1, end1, string2, end2);
              s1 = SYNTAX (c1);
 




reply via email to

[Prev in Thread] Current Thread [Next in Thread]