bug-grep

bug#60690: -P '\d' in GNU and git grep


From: Carlo Arenas
Subject: bug#60690: -P '\d' in GNU and git grep
Date: Fri, 7 Apr 2023 22:01:14 -0700

On Fri, Apr 7, 2023 at 12:00 PM Paul Eggert <eggert@cs.ucla.edu> wrote:
>
> On 2023-04-06 06:39, demerphq wrote:
>
> > Unicode specifies that \d match any digit
> > in any script that it supports.
>
> "Specifies" is too strong. The Unicode Regular Expressions technical
> standard (UTS#18) mentions \d only in Annex C[1], next to the word
> "digit" in a column labeled "Property" (even though \d is really syntax
> not a property). This is at best an informal recommendation, not a
> requirement, as UTS#18 0.2[2] says that UTS#18's syntax is only for
> illustration and that although it's similar to Perl's, the two syntax
> forms may not be exactly the same. So we can't look to UTS#18 for a
> definitive way out of the \d mess, as the Unicode folks specifically
> delegated matters to us.
>
> Even ignoring the \d issue the digit situation is messy. UTS#18 Annex C
> says "\p{gc=Decimal_Number}" is the standard recommended syntax
> assignment for digits. However, PCRE2 does not support this syntax; it
> supports another variant \p{Nd} that UTS#18 also recommends. So it
> appears that PCRE2 already does not implement every recommended aspect
> of UTS#18 syntax. PCRE2 also doesn't match Perl, which does support
> "\p{gc=Decimal_Number}".

I'm not sure I follow the whole argument here, but PCRE2[3] only
supports the abbreviated forms of the Unicode general category names
(search for "general category", which is what the "gc" above stands
for), and `Nd` is indeed the one that corresponds to `Decimal_Number`.
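
As a rough illustration of the `Nd` / `Decimal_Number` correspondence,
here is a small sketch in Python rather than PCRE2. It assumes only the
standard `re` and `unicodedata` modules; Python's `re` does not accept
`\p{...}`, so the general category is looked up with
`unicodedata.category()` and compared against Python's own `\d`, with
and without `re.ASCII`. The helper name `classify` is made up for this
example, and none of this says anything about what grep -P should do.

```python
import re
import unicodedata

# For str patterns, Python's \d is Unicode-aware unless re.ASCII is given.
UNICODE_DIGIT = re.compile(r"\d")
ASCII_DIGIT = re.compile(r"\d", re.ASCII)  # restricted to [0-9]


def classify(ch):
    """Return (general category, matches Unicode \\d, matches ASCII \\d)."""
    return (
        unicodedata.category(ch),  # abbreviated form, e.g. "Nd"
        UNICODE_DIGIT.fullmatch(ch) is not None,
        ASCII_DIGIT.fullmatch(ch) is not None,
    )


# ASCII digit, ARABIC-INDIC DIGIT THREE, DEVANAGARI DIGIT THREE, a letter
for ch in ["5", "\u0663", "\u0969", "x"]:
    print(f"U+{ord(ch):04X}", classify(ch))
```

Characters in category `Nd` outside ASCII match only the Unicode-aware
`\d`, which is the behavioral difference this thread is about.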

Carlo

[1]: https://unicode.org/reports/tr18/#Compatibility_Properties
[2]: https://unicode.org/reports/tr18/#Conformance
[3]: https://pcre2project.github.io/pcre2/doc/html/pcre2pattern.html




