bug#25750: [sed] Matching square brackets


From: Bob Proulx
Subject: bug#25750: [sed] Matching square brackets
Date: Thu, 16 Feb 2017 02:16:58 -0700
User-agent: NeoMutt/20170113 (1.7.2)

林自均 wrote:
> I want to remove the square brackets in a string:
> 
> $ echo '[1,2,3]' | sed 's/\[//g' | sed 's/\]//g'
> 1,2,3
> 
> And it works.

Yes.  But the above isn't strictly correct regular expression usage.
Let's discuss it piece by piece.

  echo '[1,2,3]' |

Okay.  Good test pattern.

  sed 's/\[//g' |

Okay.  Since the [ would start a character class and you want it to
match itself it needs to be escaped.

  sed 's/\]//g'

This is not strictly correct.  You have escaped the ] as \].  But
that is not needed.  The ] does not do anything special in that
context.  It ends a character class started by a [ but outside of that
it is simply a normal character.  Writing \] there happens to match
just a ] character.  But it is a bad habit to get into because
escaping other characters changes their meaning.  For example, in GNU
sed \+ is the "one or more" operator rather than a literal +.  Your
expression should be the following instead.

  sed 's/]//g'
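
To illustrate why the habit matters, here is a small sketch of how an
escape can change meaning.  It assumes GNU sed, where \+ in a basic
regular expression means "one or more".  Other sed implementations may
treat \+ differently.  The input string 'aaa+' is made up for the
example.

```shell
# Assumes GNU sed.  The sample input 'aaa+' is illustrative only.

# Unescaped + is a literal plus sign in a basic regular expression,
# so this replaces the final "a+".
plain=$(printf '%s\n' 'aaa+' | sed 's/a+/X/')
printf '%s\n' "$plain"      # aaX

# Escaped \+ is the GNU "one or more" operator,
# so this replaces the run "aaa" and leaves the + alone.
escaped=$(printf '%s\n' 'aaa+' | sed 's/a\+/X/')
printf '%s\n' "$escaped"    # X+
```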

Those two could be combined into one sed command.

  echo '[1,2,3]' | sed -e 's/\[//g' -e 's/]//g'
    1,2,3

Or by a combined string split by the ';' separator.

  echo '[1,2,3]' | sed 's/\[//g;s/]//g'
    1,2,3

I tend to prefer the latter.  But either is fine.

> However, when I want to do it in a single sed, it does not work:
> 
> $ echo '[1,2,3]' | sed 's/[\[\]]//g'
> [1,2,3]

That is incorrect usage.  The above is behaving correctly.  Do not
escape characters inside of [...] character classes.

You are starting a character class to match any of the enclosed
characters.  That is good.  But then it is broken by escaping the
characters inside the character class.  Do not escape them.  Inside of
a character class there is nothing special about those characters
because the class turns off most special characters.  That includes
the \ itself, which is a normal character there.  So the class ends at
the first ] and the second ] is left outside of it.  That is the
problem.
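
A short sketch may make that parse easier to see.  Since \ is a normal
character inside [...], sed reads [\[\]] as the class [\[\] (a \ or a
[) followed by a literal ].  So the expression only matches the pairs
"\]" and "[]".  The second input string is made up for the example.

```shell
# \ is a literal inside [...], so [\[\]] is the class [\[\] (a \ or a [)
# followed by a literal ].  The sample inputs are illustrative only.

# No "\]" or "[]" pair in the input, so nothing is removed.
unchanged=$(printf '%s\n' '[1,2,3]' | sed 's/[\[\]]//g')
printf '%s\n' "$unchanged"   # [1,2,3]

# Here the pairs "\]" and "[]" are present and both are removed.
pairs=$(printf '%s\n' 'a\]b[]c' | sed 's/[\[\]]//g')
printf '%s\n' "$pairs"       # abc
```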

Please review the documentation on regular expressions here:

  https://www.gnu.org/software/sed/manual/html_node/Character-Classes-and-Bracket-Expressions.html#Character-Classes-and-Bracket-Expressions

  Most meta-characters lose their special meaning inside bracket expressions:

  ']'  ends the bracket expression if it’s not the first list
       item. So, if you want to make the ‘]’ character a list item,
       you must put it first.

Therefore you must start the character class, then immediately put in
the ] to match itself literally.  It does not end the character class
since an empty class wouldn't make sense.

  [  -- start of the character class
  ]  -- match a literal ]
  [  -- match a literal [
  ]  -- end the class

Here is the working example:

  echo '[1,2,3]' | sed 's/[][]//g'
    1,2,3
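
If this is needed in more than one place, the expression can be wrapped
in a small shell function.  This is only a sketch.  The name
strip_brackets is hypothetical and the inputs are made up for the
example.

```shell
# strip_brackets is a hypothetical helper name, not part of sed.
strip_brackets() {
  # [][] is a class holding a literal ] (listed first) and a literal [
  sed 's/[][]//g'
}

# Nested brackets are all removed.
flat=$(printf '%s\n' '[1,[2,3]]' | strip_brackets)
printf '%s\n' "$flat"    # 1,2,3

# A backslash in the input is left alone, since \ is not in the class.
kept=$(printf '%s\n' '[a]\b' | strip_brackets)
printf '%s\n' "$kept"    # a\b
```

printf is used instead of echo here because some shells let echo
interpret backslashes in its arguments.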

> I can manage to make it work by a weird regexp:
> 
> $ echo '[1,2,3]' | sed 's/[]\[]//g'
> 1,2,3

That is also incorrect usage.  You have added an additional \ into the
class.  You thought you were escaping the [ but since it is already
inside of a bracket character class expression the \ was simply a
normal character and matched itself.

  echo '[1,2,3]\1\2\3'
  [1,2,3]\1\2\3
  echo '[1,2,3]\1\2\3' | sed 's/[]\[]//g'
  1,2,3123
  echo '[1,2,3]\1\2\3' | sed 's/[][]//g'
  1,2,3\1\2\3

As you can see, including the \ in the class also removed the \
characters, because \ was part of the character class.

> Is that a bug? If it is, I would like to spend some time to fix it.

It is not a bug.  It is incorrect usage.  I will close the ticket.
But please let us know if this makes sense to you.  Feel free to
continue the discussion.

Bob




