bug-grep

bug#47264: [PATCH v2] pcre: migrate to pcre2


From: Carlo Arenas
Subject: bug#47264: [PATCH v2] pcre: migrate to pcre2
Date: Sun, 14 Nov 2021 14:25:42 -0800

On Sun, Nov 14, 2021 at 12:45 PM Paul Eggert <eggert@cs.ucla.edu> wrote:
>
> On 11/9/21 02:58, Carlo Marcelo Arenas Belón wrote:
> > Sadly, hadn't been able to generate a release,
>
> Does this mean you're having trouble running 'make dist'? If so, what's
> the trouble?

I seem to be unlucky: I get certificate errors on Debian sid and FTBFS
errors when building the info documentation on macOS. The latest master
did run `make dist` successfully on Debian 10, though, so it is most
likely just a PEBKAC problem.

> Also, I followed up with several related patches (also attached as
> 0002-0012). Please take a look at them and let us know of any problems.
> In the attached patch "grep: prefer signed integers" I followed the
> usual grep approach of preferring signed to unsigned integers (e.g.,
> idx_t to size_t) when either will do; this lets us debug better with
> -fsanitize=undefined to catch integer overflow.

The change in patch 6, where a uint32_t option is doubled, triggers
warnings about comparing an unsigned variable with 0, AFAIK. There are
several of those in upstream gnulib, though, so presumably this is not a
concern?
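
As a generic illustration (this is not the code from patch 6, which is
not reproduced here): a test such as "n < 0" is always false for a
uint32_t, and that is the kind of comparison compilers flag. A minimal
sketch of doubling an unsigned value with a check that never compares
against 0; the name double_u32 is hypothetical:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper, for illustration only.
   Doubles *n in place and returns true, or leaves *n unchanged and
   returns false if the result would not fit in a uint32_t.  */
static bool
double_u32 (uint32_t *n)
{
  /* A check such as "*n < 0" would always be false for an unsigned
     type and is what triggers warnings like GCC's -Wtype-limits.
     Comparing against the upper bound avoids that.  */
  if (*n > UINT32_MAX / 2)
    return false;
  *n *= 2;
  return true;
}
```

The caller decides what to do when the function returns false, for
example reporting that the limit cannot be raised any further.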

Using idx_t instead of size_t should be fine (it only halves the maximum
size of the objects managed), but I am concerned that assuming
PCRE2_SIZE_MAX is always equal to SIZE_MAX (as done in patch 4) might be
risky, at least without a comment. Since PCRE2_SIZE_MAX is part of the
API anyway, it might be better to keep using it, IMHO.
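
To make the concern concrete, here is a minimal sketch of stating the
assumption at compile time instead of relying on it silently. It assumes
idx_t is a typedef for ptrdiff_t, as in Gnulib's idx.h. The fallback
definition of PCRE2_SIZE_MAX is only a stand-in so that the sketch
compiles without <pcre2.h>, and fits_in_idx is a hypothetical name:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-ins so that the sketch compiles on its own; real code would
   get these from <pcre2.h> and from Gnulib's idx.h.  */
#ifndef PCRE2_SIZE_MAX
# define PCRE2_SIZE_MAX SIZE_MAX
#endif
typedef ptrdiff_t idx_t;
#define IDX_MAX PTRDIFF_MAX

/* State the assumption explicitly, so that a PCRE2 build where it
   does not hold breaks compilation instead of misbehaving later.  */
static_assert (PCRE2_SIZE_MAX == SIZE_MAX,
               "code assumes PCRE2_SIZE_MAX == SIZE_MAX");

/* Hypothetical helper: true if a size reported by PCRE2 can be
   stored in an idx_t without loss.  */
static bool
fits_in_idx (size_t n)
{
  return n <= (size_t) IDX_MAX;
}
```

Because idx_t is signed, IDX_MAX is about half of SIZE_MAX on common
platforms, which is the "halves the maximum size" effect mentioned
above.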

> One issue I discovered: PCRE2_EXTRA_MATCH_WORD (which is used by
> pcre2grep -w) is incompatible with 'grep -w'. For example, 'echo a%%a |
> grep -Pw %%' outputs nothing, whereas 'echo a%%a | pcre2grep -w %%'
> outputs 'a%%a'. I think the GNU grep behavior (which is the same as with
> 'grep -w', either on Linux or OpenBSD) is more intuitive here: do you
> happen to know why PCRE behaves the way it does? Is that worth a PCRE2
> bug report? Anyway, the attached patches avoid using
> PCRE2_EXTRA_MATCH_WORD for that reason.

As I mentioned before, in an early draft that also had this change
reversed, PCRE follows the Perl definition here.
I would suggest instead that -P also follow the Perl convention when
used together with -w, but maybe that is something a -P feature flag
could enable or disable as needed?
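
The difference can be sketched for a literal pattern as follows. This
is a simplified illustration, not code from grep or PCRE2. It assumes
that PCRE2_EXTRA_MATCH_WORD behaves like wrapping the pattern in
\b(?:...)\b, that 'grep -w' requires the match to have no word
character immediately before or after it, and that "word" characters
are ASCII letters, digits and underscore. All function names are
hypothetical:

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* ASCII-only notion of a word character (an assumption).  */
static bool
is_word (char c)
{
  return c == '_' || isalnum ((unsigned char) c);
}

/* Search LINE for the literal NEEDLE as a "word".  A missing
   neighbor (start or end of line) is treated as a non-word
   character.  */
static bool
match_word (const char *line, const char *needle, bool perl_style)
{
  size_t len = strlen (needle);
  if (len == 0)
    return false;
  for (const char *p = strstr (line, needle); p;
       p = strstr (p + 1, needle))
    {
      char before = p == line ? '\0' : p[-1];
      char after = p[len];
      bool ok;
      if (perl_style)
        /* \b on both sides: word-ness must change at each edge.  */
        ok = (is_word (before) != is_word (p[0])
              && is_word (p[len - 1]) != is_word (after));
      else
        /* grep -w style: no word character adjacent to the match.  */
        ok = !is_word (before) && !is_word (after);
      if (ok)
        return true;
    }
  return false;
}

static bool
match_word_perl (const char *line, const char *needle)
{
  return match_word (line, needle, true);
}

static bool
match_word_grep (const char *line, const char *needle)
{
  return match_word (line, needle, false);
}
```

With the line "a%%a" and the pattern "%%", match_word_perl returns true
because there is a word/non-word transition on each side of "%%", while
match_word_grep returns false because the match is adjacent to "a".
That is consistent with the pcre2grep and grep outputs quoted above.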

Note that the definition of a "word" also means something different in a
post-Unicode world, so I expect that this will have to change eventually
as well.

Carlo




