gnu sed's 'l' command behavior with -z (and without)
From: Assaf Gordon
Subject: gnu sed's 'l' command behavior with -z (and without)
Date: Sat, 6 Aug 2016 15:02:18 -0400
Hello,
(starting a new thread from previous discussion:
http://lists.gnu.org/archive/html/sed-devel/2016-08/msg00000.html )
regarding this:
> On Aug 1, 2016, at 13:41, Jim Meyering <address@hidden> wrote:
>
> On Sat, Jul 30, 2016 at 11:46 PM, Assaf Gordon <address@hidden> wrote:
>> sed: adjust line-terminator of F/l/= commands when -z is used
>
> In the second patch, this change
>
> if (width+olen >= line_len && line_len > 0) {
> - ck_fwrite("\\\n", 1, 2, fp);
> + ck_fwrite("\\", 1, 1, fp);
> + ck_fwrite(&buffer_delimiter, 1, 1, fp);
>
> appears to change from emitting backslash-NL-continued lines to
> backslash-NUL with -z. When using -z, do you still want to emit that
> backslash?
> Note that this is in code to honor sed's --line-length=N (-l) option,
> which one can argue is not relevant with -z.
I think we should output 'backslash-NUL' in such cases, unless we decide to
make the 'l' command ignore the line-length limit in '-z' mode and never
fold.
Without backslash-NUL for folded lines, the output will be inconsistent
with the regular newline-delimited output.
For example, the following will not be equivalent:
printf '%s\0' aaaaaaaa bbbbbbbb | ./sed/sed -nz 'N;l5' | tr '\000' '\n' |
sed 's/\\000/\n/g'
printf '%s\n' aaaaaaaa bbbbbbbb | ./sed/sed -n 'N;l5'
and vice versa:
printf '%s\n' aaaaaaaa bbbbbbbb | ./sed/sed -n 'N;l5' | tr '\n' '\000' |
sed 's/\\n/\\000/g'
printf '%s\0' aaaaaaaa bbbbbbbb | ./sed/sed -nz 'N;l5'
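To make the folding rule concrete, here is a minimal C sketch (not GNU sed's actual code) of an 'l' writer whose line terminator is a parameter: '\n' models the default mode and '\0' models -z. It reuses the folding condition from the quoted patch; the names `escape_byte` and `list_pattern` are hypothetical, and the escape set (backslash and C escapes as letters, other non-printables as 3-digit octal) is an assumption modeled on gnu sed's visible output.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical: escape one byte roughly as gnu sed's 'l' output looks.
   Backslash and the C escapes become backslash-letter, other
   non-printable bytes (including NUL) become backslash + 3 octal digits.
   out is NUL-terminated; returns the escape length (1, 2 or 4). */
size_t escape_byte(unsigned char c, char out[5])
{
    static const char from[] = "\\\a\b\f\n\r\t\v";
    static const char to[]   = "\\abfnrtv";
    const char *hit = (c != '\0') ? strchr(from, c) : NULL;

    if (hit != NULL) {
        out[0] = '\\';
        out[1] = to[hit - from];
        out[2] = '\0';
        return 2;
    }
    if (isprint(c)) {
        out[0] = (char)c;
        out[1] = '\0';
        return 1;
    }
    snprintf(out, 5, "\\%03o", (unsigned)c);
    return 4;
}

/* Hypothetical: render pattern space ps[0..n) in 'l' form into out.
   Before each escape, fold if it would bring the current output line
   to line_len bytes or more (the condition from the quoted patch);
   line_len == 0 disables folding.  delim ends every output line. */
size_t list_pattern(char *out, size_t cap, const char *ps, size_t n,
                    size_t line_len, char delim)
{
    size_t len = 0, width = 0;

    assert(cap >= 6 * n + 2);   /* worst case: fold + 4-byte escape per byte */
    for (size_t i = 0; i < n; i++) {
        char obuf[5];
        size_t olen = escape_byte((unsigned char)ps[i], obuf);

        if (width + olen >= line_len && line_len > 0) {
            out[len++] = '\\';  /* backslash-DELIMITER: line was folded */
            out[len++] = delim;
            width = 0;
        }
        memcpy(out + len, obuf, olen);
        len += olen;
        width += olen;
    }
    out[len++] = '$';           /* dollar-DELIMITER: end of pattern space */
    out[len++] = delim;
    return len;
}
```

Hand-tracing `list_pattern` on "a\naaa" with line_len 3 and delim '\n' gives the lines a\, \n\, aa\, a$, the same as the gnu-sed -l3 example later in this message; with delim '\0' the bytes are identical except that every terminator is NUL.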
---
As a side note,
it seems gnu sed's 'l' command output differs from FreeBSD/macOS sed with
regard to embedded newlines.
Reading the POSIX standard, it's not clear to me which is correct (or perhaps
both are correct). POSIX does not say that an embedded newline should be
converted to '\n'.
http://pubs.opengroup.org/onlinepubs/9699919799/utilities/sed.html:
"(The letter ell.) Write the pattern space to standard output in a visually
unambiguous form. The characters listed in XBD Escape Sequences and Associated
Actions ( '\\', '\a', '\b', '\f', '\r', '\t', '\v' ) shall be written as the
corresponding escape sequence; the '\n' in that table is not applicable.
Non-printable characters not in that table shall be written as one three-digit
octal number (with a preceding <backslash>) for each byte in the character
(most significant byte first)."
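As a reading aid for the POSIX text above, here is a small C sketch of the per-byte rule, assuming the C locale (so every byte is one character). The function name `posix_l_escape` is hypothetical. It deliberately returns nothing for '\n', since the standard excludes it from the table and leaves its representation open; that gap is exactly where gnu and freebsd sed diverge.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical: escape byte c per the POSIX 'l' description, C locale.
   Writes a NUL-terminated string to out and returns its length.
   For '\n' it returns 0 with an empty string: POSIX says the '\n'
   table entry "is not applicable", so the choice is the caller's. */
size_t posix_l_escape(unsigned char c, char out[5])
{
    static const char from[] = "\\\a\b\f\r\t\v";  /* XBD table minus '\n' */
    static const char to[]   = "\\abfrtv";
    const char *hit = (c != '\0') ? strchr(from, c) : NULL;

    assert(out != NULL);
    out[0] = '\0';
    if (c == '\n')
        return 0;                                 /* left to the implementation */
    if (hit != NULL) {                            /* table entry: backslash-letter */
        out[0] = '\\';
        out[1] = to[hit - from];
        out[2] = '\0';
        return 2;
    }
    if (isprint(c)) {                             /* printable: written as-is */
        out[0] = (char)c;
        out[1] = '\0';
        return 1;
    }
    snprintf(out, 5, "\\%03o", (unsigned)c);      /* backslash + 3 octal digits */
    return 4;
}
```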
In practical terms, this means gnu sed prints '$<NEWLINE>' only at the end of
the pattern space (escaping embedded newlines as '\n'),
while freebsd sed prints '$<NEWLINE>' at the end of every line within it.
The following will demonstrate:
$ printf "%s\n" aXa | freebsd-sed -n 'y/X/\n/;l'
a$
a$
$ printf "%s\n" aXa | gnu-sed -n 'y/X/\n/;l'
a\na$
$ printf "%s\n" aaa bbb | freebsd-sed -n 'N;l'
aaa$
bbb$
$ printf "%s\n" aaa bbb | gnu-sed -n 'N;l'
aaa\nbbb$
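The difference in the transcripts above boils down to one choice per embedded newline. This hypothetical C helper (`list_newlines` is an illustrative name) reproduces both unfolded outputs; to stay short it assumes the pattern space holds only printable ASCII and embedded newlines, so backslash is the only other byte that needs escaping.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical: render ps[0..n) in 'l' form, no folding.
   gnu_style != 0: an embedded '\n' is written as the two bytes "\n",
                   and "$\n" appears once, at the end.
   gnu_style == 0: each embedded '\n' ends an output line with "$\n"
                   (freebsd-like), plus the final "$\n".
   Returns the number of bytes written to out. */
size_t list_newlines(char *out, size_t cap, const char *ps, size_t n,
                     int gnu_style)
{
    size_t len = 0;

    assert(cap >= 2 * n + 2);           /* each byte expands to at most 2 */
    for (size_t i = 0; i < n; i++) {
        unsigned char c = (unsigned char)ps[i];

        assert(c == '\n' || isprint(c));  /* stated input assumption */
        if (c == '\n') {
            out[len++] = gnu_style ? '\\' : '$';
            out[len++] = gnu_style ? 'n' : '\n';
        } else if (c == '\\') {         /* POSIX: backslash is doubled */
            out[len++] = '\\';
            out[len++] = '\\';
        } else {
            out[len++] = (char)c;
        }
    }
    out[len++] = '$';                   /* final end-of-pattern marker */
    out[len++] = '\n';
    return len;
}
```

With "aaa\nbbb" as input, gnu_style 1 yields aaa\nbbb$ and gnu_style 0 yields the two lines aaa$ and bbb$, matching the 'N;l' transcripts above.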
Adding line-folding complicates matters:
$ printf "%s\n" aXaaa | COLUMNS=3 freebsd-sed -n 'y/X/\n/;l'
a$
aa\
a$
$ printf "%s\n" aXaaa | gnu-sed -l3 -n 'y/X/\n/;l'
a\
\n\
aa\
a$
(gnu-sed ignores the COLUMNS envvar, but provides the '-l N' option and the
'lN' command extension.)
In freebsd-sed, there are only two possibilities:
either 'backslash-<newline>' is printed, indicating line-folding,
or 'dollar-<newline>' is printed, indicating end-of-line.
gnu-sed adds a third: 'backslash-<n>' indicates an embedded newline in
the pattern space.
That's another reason I'd like to keep printing 'backslash-NUL' with -z:
it makes the output consistent. Each case is then either 'backslash-DELIMITER'
(fold), 'dollar-DELIMITER' (end of pattern space), or
'backslash-ESCAPE-DELIMITER' (an embedded delimiter, i.e. '\n' or '\000'),
regardless of which delimiter is in use.
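One way to see the consistency argument is that a reader can decode the output with the same rules for either delimiter. Below is a hypothetical C decoder (`unlist` is an illustrative name, not part of sed) that turns one pattern space printed by 'l' back into raw bytes, assuming the gnu-style scheme described above, including backslash-NUL folds under -z. The delimiter is its only mode-dependent parameter; a bare delimiter can only follow '\' (fold) or '$' (end), because embedded delimiters are always escaped.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

static int is_octal(char c) { return c >= '0' && c <= '7'; }

/* Hypothetical: decode one pattern space printed by 'l'.
   "\" + delim is a fold, "$" + delim ends the pattern space,
   "\" + letter or "\" + 3 octal digits is an escaped byte.
   out must hold at least n bytes.  Returns the decoded length,
   or (size_t)-1 if the input is malformed or unterminated. */
size_t unlist(char *out, const char *in, size_t n, char delim)
{
    static const char letters[] = "\\abfnrtv";
    static const char bytes[]   = "\\\a\b\f\n\r\t\v";
    size_t len = 0;

    assert(out != NULL && in != NULL);
    for (size_t i = 0; i < n; i++) {
        if (in[i] == '$' && i + 1 < n && in[i + 1] == delim)
            return len;                       /* dollar-DELIMITER: done */
        if (in[i] != '\\') {
            out[len++] = in[i];               /* printable byte, as-is */
            continue;
        }
        if (++i == n)
            return (size_t)-1;                /* dangling backslash */
        if (in[i] == delim)
            continue;                         /* backslash-DELIMITER: fold */
        const char *hit = (in[i] != '\0') ? strchr(letters, in[i]) : NULL;
        if (hit != NULL) {
            out[len++] = bytes[hit - letters];    /* backslash-letter */
        } else if (i + 2 < n && in[i] >= '0' && in[i] <= '3'
                   && is_octal(in[i + 1]) && is_octal(in[i + 2])) {
            out[len++] = (char)((in[i] - '0') * 64 + (in[i + 1] - '0') * 8
                                + (in[i + 2] - '0'));  /* backslash-octal */
            i += 2;
        } else {
            return (size_t)-1;                /* unknown escape */
        }
    }
    return (size_t)-1;                        /* no terminating '$' */
}
```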
regards,
- assaf
P.S.
This is obviously bike-shedding, as the '-z' option was added in Feb 2012
(commit a08590648) and no one seems to have complained about -z with 'l' since.
Still, an interesting exotic case...