bug#47264: [PATCH v2] pcre: migrate to pcre2
From: Carlo Arenas
Subject: bug#47264: [PATCH v2] pcre: migrate to pcre2
Date: Sun, 14 Nov 2021 14:25:42 -0800
On Sun, Nov 14, 2021 at 12:45 PM Paul Eggert <eggert@cs.ucla.edu> wrote:
>
> On 11/9/21 02:58, Carlo Marcelo Arenas Belón wrote:
> > Sadly, hadn't been able to generate a release,
>
> Does this mean you're having trouble running 'make dist'? If so, what's
> the trouble?
I seem to be unlucky: I get certificate errors in Debian sid and FTBFS
errors when building the info documentation on macOS. The latest master
was able to run `make dist` successfully on Debian 10, though, so it is
most likely just a PEBKAC problem.
> Also, I followed up with several related patches (also attached as
> 0002-0012). Please take a look at them and let us know of any problems.
> In the attached patch "grep: prefer signed integers" I followed the
> usual grep approach of preferring signed to unsigned integers (e.g.,
> idx_t to size_t) when either will do; this lets us debug better with
> -fsanitize=undefined to catch integer overflow.
The change in patch 6 where a uint32_t option is doubled triggers
warnings about comparing an unsigned variable with 0, AFAIK, but
there are several of those in upstream gnulib, so presumably that is
not a concern?
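For readers unfamiliar with that class of warning, here is a minimal
sketch. It is not the code from the patch, and the function names are
made up for illustration. With GCC, a comparison like the first one is
typically reported by -Wtype-limits (enabled by -Wextra), because it can
never be true for an unsigned operand.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper, not taken from the patch.  For an unsigned
   type this comparison is always false, which is what the compiler
   warning points out.  */
static bool
unsigned_is_negative (uint32_t options)
{
  return options < 0;
}

/* Signed counterpart: here the same comparison is meaningful.  */
static bool
signed_is_negative (int32_t options)
{
  return options < 0;
}

/* Small self-check showing the difference.  */
static void
type_limits_demo (void)
{
  assert (!unsigned_is_negative (0));
  assert (!unsigned_is_negative (UINT32_MAX));
  assert (signed_is_negative (-1));
}
```

The warning is harmless when the comparison is written deliberately so
that the same code works for both signed and unsigned types.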
Using idx_t instead of size_t should be fine (it only halves the
maximum size of the objects managed), but I am concerned that assuming
PCRE2_SIZE_MAX is always equivalent to SIZE_MAX (as done in patch 4)
might be risky, at least without a comment. Since PCRE2_SIZE_MAX is
part of the API anyway, it might be better to keep using it, IMHO.
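One way to record such an assumption, sketched below, is a compile-time
check instead of a silent substitution. The names sketch_idx_t,
SKETCH_IDX_MAX and size_fits_idx are hypothetical stand-ins (Gnulib's
idx_t is a typedef for ptrdiff_t), and the check on PCRE2_SIZE_MAX only
becomes active if <pcre2.h> has been included before this code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for Gnulib's idx_t, which is a typedef for ptrdiff_t.  */
typedef ptrdiff_t sketch_idx_t;
#define SKETCH_IDX_MAX PTRDIFF_MAX

/* If <pcre2.h> was included earlier, make the assumption explicit so
   the build fails loudly should it ever stop holding.  */
#ifdef PCRE2_SIZE_MAX
static_assert (PCRE2_SIZE_MAX == SIZE_MAX,
               "code assumes PCRE2_SIZE_MAX == SIZE_MAX");
#endif

/* Return true if a size_t value, such as an offset reported by the
   library, can be stored in the signed index type without overflow.  */
static bool
size_fits_idx (size_t n)
{
  return n <= (size_t) SKETCH_IDX_MAX;
}
```

On mainstream platforms PTRDIFF_MAX is SIZE_MAX / 2, which is the
"halving" mentioned above.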
> One issue I discovered: PCRE2_EXTRA_MATCH_WORD (which is used by
> pcre2grep -w) is incompatible with 'grep -w'. For example, 'echo a%%a |
> grep -Pw %%' outputs nothing, whereas 'echo a%%a | pcre2grep -w %%'
> outputs 'a%%a'. I think the GNU grep behavior (which is the same as with
> 'grep -w', either on Linux or OpenBSD) is more intuitive here: do you
> happen to know why PCRE behaves the way it does? Is that worth a PCRE2
> bug report? Anyway, the attached patches avoid using
> PCRE2_EXTRA_MATCH_WORD for that reason.
As I mentioned before (in an early draft that also had this change
reversed), PCRE matches the Perl definition.
I would suggest instead that -P also follow the Perl convention when
used together with -w, but maybe that is something that a -P feature
flag could enable or disable as needed?
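To make the two conventions concrete, here is a rough, ASCII-only
sketch for literal patterns. The function names are made up for
illustration, and this is not how either grep or PCRE2 implements the
check. The Perl-style test mirrors what PCRE2_EXTRA_MATCH_WORD is
documented to do (wrap the pattern as \b(?:...)\b); the grep-style test
follows the grep manual's wording that the match must be at a line edge
or next to a non-word constituent on each side.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* ASCII-only word constituent: letter, digit or underscore.  */
static bool
is_word_char (unsigned char c)
{
  return isalnum (c) || c == '_';
}

/* True if offset POS of S (length LEN) is a \b-style boundary:
   exactly one side of the position is a word character.  */
static bool
is_boundary (const char *s, size_t len, size_t pos)
{
  bool before = pos > 0 && is_word_char ((unsigned char) s[pos - 1]);
  bool after = pos < len && is_word_char ((unsigned char) s[pos]);
  return before != after;
}

/* Perl-style: some occurrence of NEEDLE sits between two boundaries.  */
static bool
perl_style_word_match (const char *line, const char *needle)
{
  size_t len = strlen (line), n = strlen (needle);
  if (n == 0)
    return false;
  for (const char *p = strstr (line, needle); p; p = strstr (p + 1, needle))
    {
      size_t start = (size_t) (p - line);
      if (is_boundary (line, len, start)
          && is_boundary (line, len, start + n))
        return true;
    }
  return false;
}

/* grep -w style: each neighbour is a line edge or a non-word char.  */
static bool
grep_style_word_match (const char *line, const char *needle)
{
  size_t len = strlen (line), n = strlen (needle);
  if (n == 0)
    return false;
  for (const char *p = strstr (line, needle); p; p = strstr (p + 1, needle))
    {
      size_t start = (size_t) (p - line), end = start + n;
      bool left_ok = start == 0
                     || !is_word_char ((unsigned char) line[start - 1]);
      bool right_ok = end == len
                      || !is_word_char ((unsigned char) line[end]);
      if (left_ok && right_ok)
        return true;
    }
  return false;
}

/* The example from this thread: the two conventions disagree.  */
static void
word_match_demo (void)
{
  assert (perl_style_word_match ("a%%a", "%%"));
  assert (!grep_style_word_match ("a%%a", "%%"));
}
```

The conventions agree for ordinary words such as "bar" in "foo bar" and
diverge only when the pattern itself starts or ends with a non-word
character.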
Note that the definition of "word" also has a different meaning in a
post-Unicode world, so I expect that will have to change eventually as
well.
Carlo