bug-grep
bug#41004: Documentation:enhancement - search for hexvalue


From: Jim Meyering
Subject: bug#41004: Documentation:enhancement - search for hexvalue
Date: Tue, 12 May 2020 20:19:05 -0700

On Sun, May 10, 2020 at 10:00 AM Stephane Chazelas
<address@hidden> wrote:
>
> 2020-05-01 19:05:28 +0200, address@hidden:
> [...]
> > problem: grep for a character where only the hex code is known.
> >
> > solution:        use $'\xNN'
> >                      then shell expands this to the required code
> >
> > example:       printf "A\nB\nC\n" | grep $'\x41'
> [...]
>
> The $'\x41' ksh93 quoting operator expands to *byte* values.
>
> To get a character based on the Unicode codepoint value, you'd
> need the $'\u41' zsh operator (or $'\U10000' for code points
> above 0xffff).
>
> But in any case, that is done by the shell and has nothing to
> do with grep, and the syntax of those shell operators varies
> between shells.
>
> In the fish shell you'd use:
>
> grep \u41
>
> or
>
> grep \x41
>
> instead.
>
> Also, since it's done by the shell, things like:
>
> grep $'\u2e'
>
> where U+002E is "FULL STOP", would not only match on "."
> characters but on any character. All grep sees is a "."
> character. That would be different from grep -P '\x2e' which
> matches "." (U+002E) only.
>
> Note that:
>
> grep -P '\xE9'
>
> matches on the byte 0xE9 in singlebyte locales (regardless of
> what character that byte represents in the locale's charset) and
> on character U+00E9 in UTF-8 locales (so the 0xc3 0xa9 sequence
> of bytes, not byte 0xe9).

Thank you for the thorough reply, Stephane!
Bearing that in mind, Radisson, please consider submitting a revised patch.
I suggest recommending something like this:

$ printf '%s\n' A B C | LC_ALL=C grep -P '\x41'
A

so that the example is independent of both the current locale and the shell.
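
To illustrate Stephane's point that grep only ever sees the character the
shell hands it, here is a rough sketch. It is an illustration, not part of
the proposed patch. It assumes POSIX printf octal escapes (\056 is 0x2E)
in place of the shell-specific $'\x2e', and the variable names are made up
for the example:

```shell
# Build the "." character (0x2E, octal 056) without relying on $'...' quoting.
dot=$(printf '\056')

# As a regex, "." matches any character, so all three input lines are counted.
regex_count=$(printf '%s\n' a . b | LC_ALL=C grep -c "$dot")

# With -F the pattern is taken as a fixed string, so only the "." line is counted.
fixed_count=$(printf '%s\n' a . b | LC_ALL=C grep -c -F "$dot")

echo "regex: $regex_count, fixed: $fixed_count"
```

This should print "regex: 3, fixed: 1", which is why a hex-escape example
that expands to a regex metacharacter needs either -F or an escape that
grep itself interprets, such as -P '\x2e'.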
