bug#29871: 25.3; ZWJ word-boundaries in regexps
From: Stefan Kangas
Subject: bug#29871: 25.3; ZWJ word-boundaries in regexps
Date: Sun, 29 Sep 2019 01:28:02 +0200
tags 29871 + notabug
close 29871
quit
Eli Zaretskii <eliz@gnu.org> writes:
>> From: "Mark Shoulson" <mark@nagas.meson.org>
>> Date: Wed, 27 Dec 2017 14:07:40 -0500
>>
>> According to http://unicode.org/reports/tr29/#Word_Boundaries rule WB4,
>> it would seem that a ZWJ character (U+200D ZERO WIDTH JOINER) between
>> two "word" characters should not constitute a word boundary. And yet:
>>
>> (string-match "\\<" "foo\u200Dfbar" 1)
>>
>> evaluates to 4 (the 1 is to skip the word-beginning at the start of the
>> string). Or you can search for "\\b" or "\\>" and get 3. Either way,
>> indicative of a word-break at the ZWJ character. Is this correct?
>
> Emacs considers a change of script as a word break, and U+200D's
> script is 'symbol', which is different from 'latin', the script of the
> ASCII characters.
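According to the explanation above, this behaviour is expected: the
ZWJ belongs to a different script than the surrounding ASCII letters,
so Emacs sees a word boundary on either side of it. I'm therefore
closing this as notabug.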
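
For anyone who wants to check the script assignment themselves, here is
a minimal sketch. It assumes the built-in `char-script-table' char-table
is what decides the script in this case; the helper name
`my/char-script' is made up for illustration and is not part of Emacs.

```elisp
;; Hypothetical helper: look up the script Emacs assigns to CHAR
;; in the built-in `char-script-table' char-table.
(defun my/char-script (char)
  "Return the script symbol that `char-script-table' holds for CHAR."
  (aref char-script-table char))

;; Script of an ASCII letter (Eli's reply calls it `latin').
(my/char-script ?o)

;; Script of U+200D ZERO WIDTH JOINER (Eli's reply calls it `symbol').
(my/char-script #x200D)

;; The search from the original report.  The report says this
;; evaluates to 4 in Emacs 25.3, i.e. a word start right after the ZWJ.
(string-match "\\<" "foo\u200Dfbar" 1)
```

If the two lookups return different symbols, the match position in the
last form is consistent with the script-change rule described above.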
Best regards,
Stefan Kangas