bug-grep

bug#62983: workaround PCRE2 bug affecting at least \D and \W


From: Jim Meyering
Subject: bug#62983: workaround PCRE2 bug affecting at least \D and \W
Date: Sat, 29 Apr 2023 08:54:44 +0200

On Fri, Apr 21, 2023 at 10:22 PM Carlo Marcelo Arenas Belón
<carenas@gmail.com> wrote:
> On Fri, Apr 21, 2023 at 11:42:50AM -0700, Paul Eggert wrote:
> > On 2023-04-20 19:04, Carlo Marcelo Arenas Belón wrote:
> > > All versions of PCRE2 that include PCRE2_MATCH_INVALID_UTF have a bug in
> > > their JIT implementation that results in failure to match for the negative
> > > Perl classes, and it seems to be easier to replicate when the matching
> > > character is a multibyte one.
> >
> > Unfortunately that is a little vague. I expect the issue is not limited to
> > \D and \W, as there are other ways to specify negative Perl classes.
>
> Correct, it should also affect at least \S, but I haven't been able to
> trigger it there.
>
> The bug was that an uninitialized value was being used in the JIT code that
> supports the PCRE2_MATCH_INVALID_UTF mode, which is why I said "randomly" in
> the commit message.
>
> If you want to be strict, how about the attached patch instead?
>
> > And if
> > the bug merely seems to be easier to replicate with multibyte characters, it
> > sounds like we may have issues even when matching ASCII characters in a
> > UTF-8 locale.
>
> Which the current workaround addresses, since you need both PCRE2_JIT and
> PCRE2_MATCH_INVALID_UTF to trigger it, and the subject encoding is irrelevant
> for the logic to decide if PCRE2_MATCH_INVALID_UTF gets enabled or not.
>
> > Furthermore, I'm leery of optimizing for PCRE2 10.42 and earlier. We should
> > focus our optimization efforts on future PCRE2 versions, and not worry about
> > optimizing earlier versions where optimizations complicate maintenance for a
> > declining benefit, and are likely to provoke bugs in older versions that as
> > time passes will be harder to debug.
>
> Not sure I understand your concern here, but if it is about disabling JIT
> instead, then the possibility of introducing bugs is even bigger since it
> affects all versions of PCRE2 (not only 10.34 or newer).
>
> > > Alternatively JIT could be disabled instead, but the option selected has
> > > less of an impact on performance.
> >
> > Disabling JIT sounds better, as correctness trumps performance. Until the
> > bug is fixed (or at least better-understood so that we have a workaround we
> > can trust), how about the attached patch instead?
>
> The bug has been fixed already, and will be included in the next release.
> There might be additional changes, as spelled out in that discussion, and
> indeed the change to the proposed solution proactively helps with one of those.
>
> It is very unlikely, but some systems might include nonzero values in the
> tables for characters over 127, and that might trigger a similar problem that
> is yet to be fixed.
>
> Carlo
>
> [1] https://github.com/PCRE2Project/pcre2/commit/2c08b619dc973beacc474dcb67cda8cd366200ce

Thanks, Carlo.
I've made some small adjustments and tidied up the ChangeLog in the attached.
Hope to push it by Sunday.

There's enough going on via gnulib that I'll likely make yet another
snapshot with the very latest.

Also, there remain solaris sparc and i386 gnulib test failures:

    https://buildfarm.opencsw.org/buildbot/builders/ggrep-solaris10-sparc/builds/336
      FAIL: test-c-stack.sh
      FAIL: test-year2038

    https://buildfarm.opencsw.org/buildbot/builders/ggrep-solaris10-i386/builds/334
      FAIL: test-year2038

Attachment: grep-pcre2.diff
Description: Binary data
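
For readers without the attachment, the kind of version gate discussed in
this thread might look roughly like the sketch below. It is only an
illustration, not the contents of grep-pcre2.diff: both helper names are
hypothetical, and the affected range (10.34 through 10.42) is an assumption
taken from the messages above, which say the flag exists in 10.34 or newer
and that 10.42 and earlier carry the JIT bug.

```c
#include <stdbool.h>

/* Hypothetical helper: true if this PCRE2 version falls in the range
   the thread describes as affected (assumed: 10.34 through 10.42).  */
static bool
invalid_utf_jit_is_buggy (int major, int minor)
{
  return major == 10 && 34 <= minor && minor <= 42;
}

/* Hypothetical helper: whether it looks safe to request
   PCRE2_MATCH_INVALID_UTF while leaving JIT enabled.  Versions before
   10.34 lack the flag; the assumed buggy range skips it.  */
static bool
want_match_invalid_utf (int major, int minor)
{
  bool has_flag = major > 10 || (major == 10 && minor >= 34);
  return has_flag && !invalid_utf_jit_is_buggy (major, minor);
}
```

In real code the two arguments would presumably come from the PCRE2_MAJOR
and PCRE2_MINOR macros in pcre2.h, so the decision could also be made at
compile time with the preprocessor.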

