bug#30525: Unexpected matches for input data from a patch file

From: Assaf Gordon
Subject: bug#30525: Unexpected matches for input data from a patch file
Date: Thu, 1 Mar 2018 02:06:48 -0700
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.6.0

Hello Markus,

I believe there are actually several different issues here,
perhaps it's worth stating them explicitly to ensure we're
on the same page.

On 2018-03-01 12:52 AM, SF Markus Elfring wrote:
>>> * Does the tool “grep” output any extra colour information also for
>>>   matched tab characters?
>>
>> grep's --color option was not used.
>
> The matched characters are marked in red (for other search patterns)
> on my test system even if this command parameter is not passed explicitly.
>
> There are further challenges to consider for special characters.
> Would it make sense to replace them by printable variants?


First,
grep will print color information in the following situations:
1. If you use "--color=always"
2. If you use "--color=auto" and the output is a terminal
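3. If you don't type "--color" yourself, but it is added for you,
e.g. by a shell alias such as alias grep='grep --color=auto' (many
distributions define one) or by the GREP_OPTIONS environment variable,
and the output is a terminal. Without any of these, grep itself
does not color its output.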
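
To see which of these cases applies on your system, something like
the following may help. It is only a sketch: the function name
"stdout_kind" is made up for this example, and "type" reports an alias
only in a shell where that alias is actually defined (usually an
interactive one).

```shell
# Report whether standard output is a terminal or something else
# (a pipe, a file, ...). "stdout_kind" is a made-up helper name.
stdout_kind() {
  if [ -t 1 ]; then
    echo "terminal"
  else
    echo "not a terminal"
  fi
}

stdout_kind          # "terminal" when run directly in a terminal
stdout_kind | cat    # "not a terminal": stdout is a pipe here

# Show whether "grep" is an alias, a function or a plain program,
# and whether GREP_OPTIONS adds options behind your back.
type grep
echo "GREP_OPTIONS=${GREP_OPTIONS-<unset>}"
```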

Observe the following:

If you type this on the terminal, the letter "A" should be colored:

  $ printf "AB\n" | grep A

With this command, grep's output is a pipe (not a terminal),
and by default there will be no color:

  $ printf "AB\n" | grep A | cat

You can force color output with:

  $ printf "AB\n" | grep --color=always A | cat

And you can examine the color escape sequences with:

  $ printf "AB\n" | grep --color=always A | od -An -c
   033   [   0   1   ;   3   1   m 033   [   K   A 033   [   m 033
     [   K   B  \n

The characters "\033[01;31m" are the sequence to change color
(bold red), "\033[m" is the sequence to reset the colors, and
"\033[K" erases to the end of the line.
These are technically called "ANSI terminal control escape sequences",
more here: https://en.wikipedia.org/wiki/ANSI_escape_code

Therefore, when discussing grep's coloring options
it's important to say if the output is a terminal or not,
and whether coloring is on or off (and when troubleshooting,
it is best to explicitly use --color=XXX).
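
When comparing the three settings, a small check like this one can
tell whether escape sequences actually reached the output. It is a
sketch that assumes GNU grep; "has_escape" is a made-up helper name.

```shell
# Report whether the data on stdin contains an ESC byte (octal 033).
# "has_escape" is a made-up helper name.
has_escape() {
  if LC_ALL=C grep -q "$(printf '\033')"; then
    echo "escape sequences present"
  else
    echo "no escape sequences"
  fi
}

printf 'AB\n' | grep --color=never  A | has_escape   # no escape sequences
printf 'AB\n' | grep --color=auto   A | has_escape   # no escape sequences (output is a pipe)
printf 'AB\n' | grep --color=always A | has_escape   # escape sequences present
```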

Second,
When coloring is enabled (e.g. with "--color=always"), grep will
surround the TAB characters with the color escape sequences.
Observe the following:

  $ printf "\t\t\n" | grep -E --color=always '\s+' | od -An -c
   033   [   0   1   ;   3   1   m 033   [   K  \t  \t 033   [   m
   033   [   K  \n

Notice there is an ANSI color escape sequence, followed by two tabs (\t),
followed by the "reset color" sequence, followed by "\n".
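
To convince yourself that the tabs are passed through unchanged and
only the escape sequences are added around them, you can strip the
sequences again. This is a rough sketch: "strip_color" is a made-up
name, and the pattern only covers the "m" and "K" sequences shown
in the od output above.

```shell
# Remove the color ("...m") and erase-line ("...K") escape sequences
# from stdin. "strip_color" is a made-up helper name.
strip_color() {
  esc=$(printf '\033')
  sed "s/${esc}\[[0-9;]*[mK]//g"
}

# Only the two tabs and the newline should remain.
printf '\t\t\n' | grep -E --color=always '[[:space:]]+' | strip_color | od -An -c
```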

Third,
Grep's default coloring is red text and default background.
But TAB (and space) are empty characters - they do not have text.
Because the default background color is not changed, you will not
see them highlighted with a color.
You can change the default color with the GREP_COLORS environment variable.

For example:

Here you should see "A" and "B" in color,
with empty (non-colored) spaces between them:

  $ printf "A \tB\n" | grep --color=always '.*'
  A       B

You can force the background color to be something else
like so:

  $ printf "A \tB\n" | GREP_COLORS="mt=41" grep --color=always '.*'
  A       B

The above command should print "A" and "B" with red background
and default text color (See the next item regarding the white-space colors).
To learn more about using GREP_COLORS, read "man grep".
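
Building on this, a small wrapper can make matched white-space stand
out by giving matches a red background. This is only a sketch that
assumes GNU grep; "highlight_ws" is a made-up name, and whether the
TAB itself shows up colored still depends on your terminal.

```shell
# Highlight runs of white-space with a red background (SGR code 41).
# "highlight_ws" is a made-up helper name.
highlight_ws() {
  GREP_COLORS='mt=41' grep --color=always -E '[[:space:]]+'
}

printf 'A \tB\n' | highlight_ws

# Same command, but showing the raw bytes that grep emits.
printf 'A \tB\n' | highlight_ws | od -An -c
```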

Fourth,
This is an unexpected "gotcha" - some terminal programs
do NOT color tab characters at all! They just move the cursor,
while others print multiple spaces which are colored.

(by terminal programs I mean "xterm" or "gnome-terminal" or "konsole",
and this is also affected by using tmux or screen, etc.).

To test this, try the following on several terminals:

  printf "\033[41m A B\tC\n\033[m"

This sequence means:
background red color, space, A, space, B, tab, C, new line, reset-color.

On gnome-terminal, I get the entire line in red background.
On xterm, I get black space between B and C - meaning TAB is just
moving the cursor. You might see different results on your terminal -
which means "grep" is not the problem at all here.
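
If your terminal does not color tabs, one possible workaround is to
convert them to spaces before they reach the terminal, e.g. with
"expand". The sketch below prints the test line twice for comparison;
"tab_test" is a made-up name, and because expand also counts the
escape bytes, the column alignment may be slightly off.

```shell
# Print a test line with a red background that contains a real TAB.
# "tab_test" is a made-up helper name.
tab_test() {
  printf '\033[41m A B\tC\033[m\n'
}

tab_test            # TAB as-is: coloring depends on the terminal
tab_test | expand   # TAB replaced by spaces, which can be colored
```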

Fifth and last,
You asked about replacing non-printable characters.
This is easy enough to do with existing programs,
so not likely to be added as a new option to grep.

If you just want to replace TAB characters:

  printf "\t\tif\n" | grep --color=always -E '^\s+[^i]' | cat -T
  printf "\t\tif\n" | grep --color=always -E '^\s+[^i]' | tr '\t' x

To replace other characters, add more characters to 'tr',
or for something more complicated, use sed:

  printf "\t\tif\n" | grep --color=always -E '^\s+[^i]' \
                          | sed 's/\t/<TAB>/g ; s/ /<SPACE>/g'
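
If you use this often, it can be wrapped in a small function. The
sketch below passes a literal TAB to sed instead of "\t", since not
every sed understands "\t"; "show_ws" is a made-up name.

```shell
# Replace TAB and space characters on stdin with visible markers.
# "show_ws" is a made-up helper name.
show_ws() {
  tab=$(printf '\t')
  sed "s/${tab}/<TAB>/g; s/ /<SPACE>/g"
}

printf '\t\tif x\n' | show_ws
# prints: <TAB><TAB>if<SPACE>x
```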

Hope this helps,
 - assaf
