gnu sed's 'l' command behavior with -z (and without)


From: Assaf Gordon
Subject: gnu sed's 'l' command behavior with -z (and without)
Date: Sat, 6 Aug 2016 15:02:18 -0400

Hello,

(starting a new thread from previous discussion: 
http://lists.gnu.org/archive/html/sed-devel/2016-08/msg00000.html )

regarding this:

> On Aug 1, 2016, at 13:41, Jim Meyering <address@hidden> wrote:
> 
> On Sat, Jul 30, 2016 at 11:46 PM, Assaf Gordon <address@hidden> wrote:
>>  sed: adjust line-terminator of F/l/= commands when -z is used
> 
> In the second patch, this change
> 
>       if (width+olen >= line_len && line_len > 0) {
> -          ck_fwrite("\\\n", 1, 2, fp);
> +          ck_fwrite("\\", 1, 1, fp);
> +          ck_fwrite(&buffer_delimiter, 1, 1, fp);
> 
> appears to change from emitting backslash-NL-continued lines to
> backslash-NUL with -z. When using -z, do you still want to emit that
> backslash?
> Note that this is in code to honor sed's --line-length=N (-l) option,
> which one can argue is not relevant with -z.

I think we should output 'backslash-NUL' in such cases, unless we decide that 
the 'l' command in '-z' mode should ignore the line-length limit and never 
fold.

Without backslash-NUL for folded lines, the output will be inconsistent 
with the regular newline-delimited output.
For example, the following will not be equivalent:

    printf '%s\0' aaaaaaaa bbbbbbbb | ./sed/sed -nz 'N;l5' | tr '\000' '\n' |
        sed 's/\\000/\n/g'
    printf '%s\n' aaaaaaaa bbbbbbbb | ./sed/sed -n 'N;l5'

and vice versa:

    printf '%s\n' aaaaaaaa bbbbbbbb | ./sed/sed -n 'N;l5' | tr '\n' '\000' |
        sed 's/\\n/\\000/g'
    printf '%s\0' aaaaaaaa bbbbbbbb | ./sed/sed -nz 'N;l5'

---

As a side note, gnu sed's 'l' command output differs from FreeBSD/MacOS's sed 
with regard to embedded newlines.
Reading the POSIX standard, it's not clear to me which is correct (or perhaps 
both are correct). POSIX does not say that an embedded newline should be 
converted to '\n'.

http://pubs.opengroup.org/onlinepubs/9699919799/utilities/sed.html:

"(The letter ell.) Write the pattern space to standard output in a visually 
unambiguous form. The characters listed in XBD Escape Sequences and Associated 
Actions ( '\\', '\a', '\b', '\f', '\r', '\t', '\v' ) shall be written as the 
corresponding escape sequence; the '\n' in that table is not applicable. 
Non-printable characters not in that table shall be written as one three-digit 
octal number (with a preceding <backslash>) for each byte in the character 
(most significant byte first)."


In practical terms, gnu sed prints '$<NEWLINE>' once, at the end of the 
printed pattern space (showing embedded newlines as '\n'),
while freebsd sed prints '$<NEWLINE>' at the end of every line in the pattern 
space.

The following will demonstrate:

     $ printf "%s\n" aXa | freebsd-sed -n 'y/X/\n/;l'
     a$
     a$

     $ printf "%s\n" aXa | gnu-sed -n 'y/X/\n/;l'
     a\na$

     $ printf "%s\n" aaa bbb | freebsd-sed -n 'N;l'
     aaa$
     bbb$

     $ printf "%s\n" aaa bbb | gnu-sed -n 'N;l'
     aaa\nbbb$

Adding line-folding complicates matters:

    $ printf "%s\n" aXaaa | COLUMNS=3 freebsd-sed -n 'y/X/\n/;l'
    a$
    aa\
    a$

    $ printf "%s\n" aXaaa | gnu-sed -l3 -n 'y/X/\n/;l'
    a\
    \n\
    aa\
    a$

(gnu-sed ignores COLUMNS envvar, but provides '-l N' extension or 'lN' 
command-extension).

In freebsd-sed, there are only two options:
either 'backslash-<newline>' is printed, indicating line-folding,
or 'dollar-<newline>' is printed, indicating end-of-line.

gnu-sed adds a third option: 'backslash-<n>' indicates an embedded newline in 
the pattern.
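
A small sketch of why the two line endings stay unambiguous: in the listings
above, every output line ends in either a backslash (fold) or a dollar sign
(end of the printed pattern), so the last character alone is enough to tell
them apart. The helper name classify_l_output is hypothetical and not part of
any sed; the demo feeds in the gnu-sed '-l3' listing shown earlier as literal
text, so it does not assume any particular sed is installed.

```shell
#!/bin/sh
# Hypothetical helper (not part of any sed): read 'l' output on stdin and
# label each line by its final character.
classify_l_output() {
    while IFS= read -r line; do
        case $line in
            *'\') printf 'fold: %s\n' "$line" ;;   # backslash-newline: folded line
            *'$') printf 'end:  %s\n' "$line" ;;   # dollar-newline: end of listing
            *)    printf '?:    %s\n' "$line" ;;   # not expected in 'l' output
        esac
    done
}

# Demo on the gnu-sed '-l3' listing quoted above, passed in literally.
printf '%s\n' 'a\' '\n\' 'aa\' 'a$' | classify_l_output
# prints:
#   fold: a\
#   fold: \n\
#   fold: aa\
#   end:  a$
```

With a GNU sed on the PATH (an assumption), the same helper could be fed
directly, e.g. "printf '%s\n' aXaaa | sed -l3 -n 'y/X/\n/;l' | classify_l_output".
Note that the embedded newline shows up as '\n' inside a line and never as a
line ending of its own.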

That's another reason I'd like to keep printing 'backslash-NUL' with -z:
It makes the output consistent:
Either 'backslash-DELIMITER' or 'dollar-DELIMITER' or 
'backslash-ESCAPE-DELIMITER' (meaning '\n' or '\000') - regardless of what 
the delimiter is.

regards,
 - assaf

P.S.
This is obviously bike-shedding, as the '-z' option was added in feb-2012 
(commit a08590648) and it doesn't seem anyone has ever complained about -z 
with 'l'.
Still, an interesting exotic case...







