bug-sed

bug#40242: n as delimiter alias


From: Assaf Gordon
Subject: bug#40242: n as delimiter alias
Date: Mon, 30 Mar 2020 22:42:09 -0600
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.6.0

tags 40242 confirmed
stop

Hello,

On 2020-03-25 11:30 p.m., Oğuz wrote:
> While '\t' matches a literal 't' when 't' is the delimiter, '\n' does not
> match 'n' when 'n' is the delimiter. See:
>
> $ echo t | sed 'st\ttt' | xxd
> 00000000: 0a                                       .
> $
> $ echo n | sed 'sn\nnn' | xxd
> 00000000: 6e0a
>
> Is this a bug or is there a sound logic behind this?

Thank you for finding this interesting edge-case.

I think it is a (very old) bug. I'm not sure about its origin,
perhaps Jim or Paolo can comment.

First,
let's start with what's expected (slightly modifying your examples):

In the canonical usage, "\t" is interpreted as a TAB, so the literal "t" is not replaced:

   $ printf t | sed 's/\t//' | od -a -An
      t

Using a different delimiter character, "q" instead of "/", works the same:

   $ printf t | sed 'sq\tqq' | od -a -An
      t

The sed manual says (in section "3.3 The s command"):
      "
      The / characters may be uniformly replaced by any other single
      character within any given s command.

      The / character (or whatever other character is used in its
      stead) can appear in the regexp or replacement only if it is
      preceded by a \ character.
      "

This is the reason "\t" represents a regular "t" (not TAB)
*if* the substitute command's delimiter is "t" as well:

      $ printf t | sed 'st\ttt' | od -a -An
      [no output, as expected]

And similarly for other characters:

      $ printf x | sed 'sx\xxx' | od -a -An
      $ printf a | sed 'sa\aaa' | od -a -An
      $ printf z | sed 'sz\zzz' | od -a -An
      [no output, as expected]

---

Second,
the "\n" case behaves differently: regardless of which
delimiter is used, it is always treated as a newline,
never as a literal "n", even if the delimiter is "n":

These are correct, as expected:
    $ printf n | sed 's/\n//' | od -a -An
       n
    $ printf n | sed 'sx\nxx' | od -a -An
       n

Here, we'd expect "\n" to be treated as a literal "n" character,
not as a newline, but it is not (as you've found):

    $ printf n | sed 'sn\nnn' | od -a -An
       n

----

In the code, the "match_slash" function [1] is used to find
the delimiters of the "s" command (typically slashes).
Special handling happens if a backslash is found [2],
and in lines 557-8 there's this conditional:

              else if (ch == 'n' && regex)
                ch = '\n';

This forces any "\n" to be a newline, regardless of whether the
delimiter itself is an "n".

[1] https://git.savannah.gnu.org/cgit/sed.git/tree/sed/compile.c#n531
[2] https://git.savannah.gnu.org/cgit/sed.git/tree/sed/compile.c#n552

In older sed versions, these two lines were protected by
"#ifndef REG_PERL" [3], so perhaps it had something to do with
regex variants. But the origin of these lines predates the git history.
Jim/Paolo - any ideas what this relates to?

[3] https://git.savannah.gnu.org/cgit/sed.git/tree/sed/compile.c?id=41a169a9a14b5bdc736313eb411f02bcbe1c046d#n551

---

Interestingly, removing these two lines does not cause
any test failures, so this might be easy to fix without causing
any regressions.
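
To make the effect of that conditional easier to see, here is a minimal
sketch of the escape handling, not the actual sed source. It assumes
single-byte input and reads from a string instead of sed's input stream.
The name "model_match_slash" and the "special_n" switch are hypothetical:
the switch stands in for the two lines quoted above, so the same function
can show the behavior with and without them. Later processing of the
returned text (e.g. by the regex compiler) is not modeled.

```c
#include <stddef.h>

/* Simplified model of the escape handling described above.
   Assumptions: single-byte characters only; `src` is the text that
   follows the opening delimiter; `special_n` (hypothetical) enables
   or disables the "ch == 'n' && regex" conditional.

   Copies the first delimited field into `out` (NUL-terminated) and
   returns the number of bytes consumed, including the closing
   delimiter. Returns -1 if no closing delimiter is found or if
   `out` is too small. */
static int
model_match_slash (const char *src, int slash, int regex, int special_n,
                   char *out, size_t outsz)
{
  size_t i = 0, n = 0;

  while (src[i] != '\0' && src[i] != '\n')
    {
      int ch = (unsigned char) src[i++];

      if (ch == slash)
        {
          /* Unescaped delimiter: the field ends here. */
          if (n >= outsz)
            return -1;
          out[n] = '\0';
          return (int) i;
        }
      else if (ch == '\\')
        {
          if (src[i] == '\0')
            return -1;
          ch = (unsigned char) src[i++];
          if (special_n && ch == 'n' && regex)
            ch = '\n';          /* the conditional under discussion */
          else if (ch != '\n' && (ch != slash || (!regex && ch == '&')))
            {
              /* Not an escaped delimiter: keep the backslash. */
              if (n + 1 >= outsz)
                return -1;
              out[n++] = '\\';
            }
        }
      if (n + 1 >= outsz)
        return -1;
      out[n++] = (char) ch;
    }
  return -1;                    /* ran out of input before the delimiter */
}
```

In this model, the text "\nnn" (what follows the "s" in 'sn\nnn') with
delimiter "n" yields a one-character field: a newline when special_n is 1,
and a literal "n" when it is 0. With delimiter "/" and special_n set to 0,
the text "\n//" yields the two characters backslash and "n", which are
left for later stages to interpret.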


For now I'm leaving this item open until we decide how to deal with it.

regards,
 - assaf








