bug#62983: workaround PCRE2 bug affecting at least \D and \W

bug-grep

From:	Paul Eggert
Subject:	bug#62983: workaround PCRE2 bug affecting at least \D and \W
Date:	Fri, 21 Apr 2023 11:42:50 -0700
User-agent:	Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.10.0

On 2023-04-20 19:04, Carlo Marcelo Arenas Belón wrote:

> All versions of PCRE2 that include PCRE2_MATCH_INVALID_UTF had a bug on
> its JIT implementation that results in failure to match for the negative
> perl classes, and seems to be easier to replicate when the matching
> character is a multibyte one.

Unfortunately that is a little vague. I expect the issue is not limited to \D and \W, as there are other ways to specify negative Perl classes. And if the bug merely seems to be easier to replicate with multibyte characters, it sounds like we may have issues even when matching ASCII characters in a UTF-8 locale.

Furthermore, I'm leery of optimizing for PCRE2 10.42 and earlier. We should focus our optimization efforts on future PCRE2 versions, and not worry about optimizing earlier versions where optimizations complicate maintenance for a declining benefit, and are likely to provoke bugs in older versions that as time passes will be harder to debug.

> Alternatively JIT could be disabled instead, but the option selected has
> less of an impact on performance.

Disabling JIT sounds better, as correctness trumps performance. Until the bug is fixed (or at least better-understood so that we have a workaround we can trust), how about the attached patch instead?

0001-grep-use-PCRE2-JIT-only-in-unibyte-locales.patch
Description: Text Data
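
The attachment itself is not reproduced here. As a rough illustration of the idea named in its title (use the PCRE2 JIT only in unibyte locales), the decision could be sketched as below. This is only a sketch under stated assumptions, not the contents of the patch: the helper names locale_is_unibyte and jit_allowed are hypothetical, and the real change may test the locale differently.

```c
#include <locale.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical helper (not taken from the attached patch): report
   whether the current locale uses a single-byte encoding.  MB_CUR_MAX
   is the maximum number of bytes in one multibyte character for the
   current LC_CTYPE locale.  */
static bool
locale_is_unibyte (void)
{
  return MB_CUR_MAX == 1;
}

/* Hypothetical policy function: permit JIT compilation only when the
   locale is unibyte, so that the suspect JIT code path is never used
   with multibyte (e.g. UTF-8) input.  A caller would consult this
   before asking PCRE2 to JIT-compile a pattern, and otherwise fall
   back to the interpreter.  */
static bool
jit_allowed (void)
{
  return locale_is_unibyte ();
}
```

The point of a check like this is that it gives up speed only where the bug is suspected: in a unibyte locale such as "C" the JIT would still be used, while in a UTF-8 locale matching would go through the slower but trusted interpreter.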
