bug-sed

bug#33763: RE backtrack for last slash fails when backslashblank involved


From: Assaf Gordon
Subject: bug#33763: RE backtrack for last slash fails when backslashblank involved
Date: Sun, 16 Dec 2018 13:49:52 -0700
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.3.0

tags 33763 notabug
close 33763
stop

Hello,

On 2018-12-15 3:07 p.m., Peter Benjamin wrote:
> Backtrack last slash RE does not work when there are "\ " involved.
>
> RE:
> sed -e 's/^\(.*\)\/\([^\/]*\)$/\2\t\1\/\2/' findm
>
> $ cat findm
> /media/userid/data/movies/movie\ 1\ a.m4v
> /media/userid/data/movies/movie\ 1\ a.extra.m4v
> /media/userid/data/movies/movie\ 2.m4v
> /media/userid/data/movies/movie\ 3.m4v
> /media/userid/data/movies/movie4.m4v
> /media/userid/data2/movies/data.m4v
>
> STDOUT
>
> $ sed -e 's/^\(.*\)\/\([^\/]*\)$/\2\t\1\/\2/' findm
> /media/userid/data/movies/movie\ 1\ a.m4v
> /media/userid/data/movies/movie\ 1\ a.extra.m4v
> /media/userid/data/movies/movie\ 2.m4v
> /media/userid/data/movies/movie\ 3.m4v
> movie4.m4v      /media/userid/data/movies/movie4.m4v
> data.m4v        /media/userid/data2/movies/data.m4v
>
> ------------------------
>
> Same backtrack last slash RE in perl works:
>
> perl -n -e 'chomp;s/^(.*)\/([^\/]*)$/\2\t\1\/\2/;print"$_\n"' findm
>
> STDOUT
> movie\ 1\ a.m4v /media/userid/data/movies/movie\ 1\ a.m4v
> movie\ 1\ a.extra.m4v   /media/userid/data/movies/movie\ 1\ a.extra.m4v
> movie\ 2.m4v    /media/userid/data/movies/movie\ 2.m4v
> movie\ 3.m4v    /media/userid/data/movies/movie\ 3.m4v
> movie4.m4v      /media/userid/data/movies/movie4.m4v
> data.m4v        /media/userid/data2/movies/data.m4v


Thank you for providing such clear and reproducible examples -
it makes the troubleshooting much easier.

First,
let's enable sed's extended regular expression syntax (by adding "-E"),
to make the comparison simpler.
The following "sed -E" command is equivalent to the one you used above,
and produces the same (unsatisfying) results:

       sed -E -e 's/^(.*)\/([^\/]*)$/\2\t\1\/\2/'             findm
perl -n -e 'chomp;s/^(.*)\/([^\/]*)$/\2\t\1\/\2/;print"$_\n"' findm

Now, the culprit lies in the bracket expression:
   [^\/]

The POSIX definition of bracket expressions in regular expressions says:

  "The special characters '.', '*', '[', and '\' (period, asterisk,
  left-bracket, and backslash, respectively) shall lose their special
  meaning within a bracket expression."

(from section 9.3.5 subitem 1, last sentence in the paragraph:
http://pubs.opengroup.org/onlinepubs/009695399/basedefs/xbd_chap09.html#tag_09_03_05 )

Meaning, the bracket expression "[^\/]" is not "every character except
slash" (with the slash escaped by a backslash).
Instead, it means "every character except slash or backslash".
Since the last component of each of the first four file names contains
a backslash, "[^\/]*$" cannot match it, so the whole regex fails and
sed leaves those lines unchanged.
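To make the two readings concrete, here is a small Python sketch. It is only an emulation, not sed itself: Python's "re" module, like Perl, treats a backslash inside a character class as an escape, so the POSIX reading of "[^\/]" has to be spelled out explicitly as "[^\\/]" (exclude both backslash and slash). The names PATHS, PERL_READING, POSIX_READING and matching() are made up for this illustration.

```python
import re

# Sample paths from the report (raw strings keep the literal backslashes).
PATHS = [
    r"/media/userid/data/movies/movie\ 1\ a.m4v",
    r"/media/userid/data/movies/movie\ 1\ a.extra.m4v",
    r"/media/userid/data/movies/movie\ 2.m4v",
    r"/media/userid/data/movies/movie\ 3.m4v",
    r"/media/userid/data/movies/movie4.m4v",
    r"/media/userid/data2/movies/data.m4v",
]

# Perl/Python reading: inside a class, "\/" is an escaped slash,
# so [^\/] excludes only '/'.
PERL_READING = re.compile(r"^(.*)/([^\/]*)$")

# POSIX reading of the same bracket text: '\' is an ordinary character,
# so the set excludes both '\' and '/'. Written for Python as [^\\/].
POSIX_READING = re.compile(r"^(.*)/([^\\/]*)$")


def matching(pattern):
    """Return the paths whose final component the pattern can capture."""
    return [p for p in PATHS if pattern.match(p)]


if __name__ == "__main__":
    print("Perl reading matches :", len(matching(PERL_READING)))
    print("POSIX reading matches:", len(matching(POSIX_READING)))
```

Under the Perl reading all six paths match; under the POSIX reading only the last two (movie4.m4v and data.m4v) do, which is the same split seen in the sed output in the report.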

If the backslash is removed from the bracket expression, the results are
as you expected:

  $ sed -E -e 's/^(.*)\/([^/]*)$/\2\t\1\/\2/' findm
  movie\ 1\ a.m4v /media/userid/data/movies/movie\ 1\ a.m4v
  movie\ 1\ a.extra.m4v   /media/userid/data/movies/movie\ 1\ a.extra.m4v
  movie\ 2.m4v    /media/userid/data/movies/movie\ 2.m4v
  movie\ 3.m4v    /media/userid/data/movies/movie\ 3.m4v
  movie4.m4v      /media/userid/data/movies/movie4.m4v
  data.m4v        /media/userid/data2/movies/data.m4v
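For comparison, a minimal Python sketch of the corrected command, with re.sub() standing in for sed's s/// (an assumption of this illustration, not sed's own code). The helper name basename_first() is hypothetical; the "\t" in the replacement template becomes a tab, as it does in GNU sed.

```python
import re

# Corrected pattern: no backslash inside the bracket expression, so
# [^/] means "any character except slash" under POSIX, Perl and Python.
LAST_SLASH = re.compile(r"^(.*)/([^/]*)$")


def basename_first(line: str) -> str:
    # Hypothetical helper emulating:
    #   sed -E -e 's/^(.*)\/([^/]*)$/\2\t\1\/\2/'
    # The greedy (.*) grabs everything up to the LAST slash; group 2 is
    # the final path component. Lines without a slash come back
    # unchanged, just as sed leaves non-matching lines alone.
    return LAST_SLASH.sub(r"\2\t\1/\2", line)


if __name__ == "__main__":
    for path in [
        r"/media/userid/data/movies/movie\ 2.m4v",
        r"/media/userid/data/movies/movie4.m4v",
    ]:
        print(basename_first(path))
```

Both sample lines now come out as "basename<TAB>full path", including the one whose name contains "\ ".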

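As such, I conclude that it is not a sed bug.
In Perl, a backslash inside a character class is still an escape
character, so "[^\/]" there means "every character except slash"; the
escape is required only because "/" is also the s/// delimiter.
That explains the apparent difference.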

I'm closing this as "not a bug",
but discussion can continue by replying to this thread.


regards,
 - assaf