bug#62267: grep-3.9 bug: \d matches multibyte digits

From: Paul Eggert
Subject: bug#62267: grep-3.9 bug: \d matches multibyte digits
Date: Sun, 19 Mar 2023 01:28:38 -0700

On 2023-03-18 23:33, Jim Meyering wrote:
> By the way, have you ever used \D? I think I have not.

No, I'm not much of a Perl user these days (last seriously used it in the 1990s...).

> -  char *new_keys = xnmalloc (len / 2 + 1, 5);
> +  char *new_keys = xnmalloc (len / 2 + 1, 6);

This could be xnmalloc (len + 1, 3).

Or if you want to show the work, you can replace it with something like:

   int origlen = sizeof "\\D" - 1;
   int repllen = sizeof "[^0-9]" - 1;
   int expansion = repllen / origlen + (repllen % origlen != 0);
   char *new_keys = xnmalloc (len + 1, expansion);

(Isn't memory allocation fun? :-)
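
To make the sizing argument concrete, here is a minimal sketch of the worst-case calculation in use. It is not grep's code: expand_backslash_D is a hypothetical helper, plain malloc with an explicit overflow check stands in for gnulib's xnmalloc, and the real translation also has to cope with things like bracket expressions, which this ignores. The point is only that (len + 1) * expansion bytes always suffice, since every 2-byte "\D" grows to at most 6 bytes and every other byte is copied unchanged.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not taken from grep: return a newly allocated,
   NUL-terminated copy of KEYS (LEN bytes) in which each "\D" has been
   replaced by "[^0-9]".  Return NULL on allocation failure.  */
static char *
expand_backslash_D (char const *keys, size_t len)
{
  size_t origlen = sizeof "\\D" - 1;     /* 2 bytes */
  size_t repllen = sizeof "[^0-9]" - 1;  /* 6 bytes */

  /* Ceiling of repllen / origlen: worst-case growth per input byte.  */
  size_t expansion = repllen / origlen + (repllen % origlen != 0);

  /* Assumption: malloc plus a manual overflow check, where grep
     would use xnmalloc (len + 1, expansion).  */
  if (len + 1 > SIZE_MAX / expansion)
    return NULL;
  char *new_keys = malloc ((len + 1) * expansion);
  if (!new_keys)
    return NULL;

  char *p = new_keys;
  for (size_t i = 0; i < len; i++)
    {
      if (keys[i] == '\\' && i + 1 < len)
        {
          if (keys[i + 1] == 'D')
            {
              /* Expand the two-byte escape to the six-byte class.  */
              memcpy (p, "[^0-9]", repllen);
              p += repllen;
            }
          else
            {
              /* Copy any other escape pair unchanged, so that an
                 escaped backslash followed by D is left alone.  */
              *p++ = keys[i];
              *p++ = keys[i + 1];
            }
          i++;  /* Skip the byte after the backslash.  */
        }
      else
        *p++ = keys[i];
    }
  *p = '\0';
  return new_keys;
}
```

Calling expand_backslash_D ("a\\Db", 4) yields the string a[^0-9]b. The extra "+ 1" in the allocation covers the trailing NUL with room to spare.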

> Doesn't Perl have the same issue?

Oh, you're right. Not being a Perl expert, all I did was run this:

  echo '٠١٢٣٤٥٦٧٨٩' | perl -ne 'print if /\d/'

and I observed no output. However, I now see that I need to use perl's -C option too, to get the kind of regular-expression behavior that plain grep has.

Looking at the source code again, how about if we move the PCRE-specific changes from src/grep.c to src/pcresearch.c, which is where they really belong, and more importantly use the bleeding-edge PCRE2_EXTRA_ASCII_BSD macro if available?

Something like the attached patch, say. This patch doesn't take your \D fixes (or the above suggestions) into account.

Attachment: 0001-grep-forward-port-to-PCRE2-10.43.patch
Description: Text Data
