sed-devel

Re: sed bug search or replace newline chars


From: Assaf Gordon
Subject: Re: sed bug search or replace newline chars
Date: Tue, 26 Jul 2016 00:14:41 -0400

(adding address@hidden mailing list)

Hello,

> On Jul 25, 2016, at 23:20, Bee <address@hidden> wrote:
> 
> lubuntu 16.04
> sed --version
> sed (GNU sed) 4.2.2
> 
> I have recurring files exported from a database with stray newline chars.  I 
> would like to remove them with sed but nothing is changed.
> 
> It works to use tr:
> cat xxx.txt | tr '\n' '\t' > yyy.txt
> 
> These hex and control codes work:
> sed -e 'y/\x0d/\x09/' xxx.txt > yyy.txt
> sed -e 'y/\r/\x09/' xxx.txt > yyy.txt
> sed -e 'y/\r/\t/' xxx.txt > yyy.txt
> 
> But these do nothing:
> sed -e 'y/\x0a/\x09/' xxx.txt > yyy.txt
> sed -e 'y/\n/\x09/' xxx.txt > yyy.txt
> 
> Is this a bug?
> 
> Bill Muench
> Santa Cruz, California

This is not a bug, but the expected behavior given sed's inner workings:
When sed reads a line, it first removes the trailing newline character, then 
places the line in the pattern space.
y/// then operates on the pattern space (which does not contain the newline),
and finally sed prints the pattern space and adds a newline at the end.

Thus, a command such as 'y/\n/\x09/' normally finds no newline to translate, 
and will not do what you want.

In some cases, sed's pattern space can contain newlines: for example, if the 'N' 
command is used to read the next line and append it to the pattern space.
Compare:

    $ printf "abc\n" | sed 'y/\n/X/'
    abc

    $ printf "abc\ndef\n" | sed 'N;y/\n/X/'
    abcXdef
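
The same idea can be extended to a whole file by looping over 'N' until the 
last line. The following is only a minimal sketch, assuming GNU sed (for the 
'\t' escape) and a made-up three-line sample input; note that it holds the 
entire input in the pattern space:

```shell
# Join all input lines into one, turning each embedded newline into a tab.
#   :a        define a label named "a"
#   N         append the next input line to the pattern space
#   $!ba      if this is not the last line, branch back to "a"
#   y/\n/\t/  translate the embedded newlines to tabs (GNU sed escape)
# The sample input below is hypothetical.
joined=$(printf 'abc\ndef\nghi\n' | sed -e ':a' -e 'N' -e '$!ba' -e 'y/\n/\t/')
printf '%s\n' "$joined"
```

Only the newlines between lines are translated; sed still adds the final 
newline when it prints the pattern space.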

GNU sed supports an option called "--null-data" (or "-z" for short), which 
treats NUL (ASCII 0x00) as the line terminator, making newline a regular 
character. Care must be taken not to introduce NUL characters into the data 
(which would likely cause more trouble). The following demonstrates the option:

    $ printf "abc\n" | sed -z 'y/\n/X/' 
    abcX

The 'tr' program, on the other hand, reads entire blocks of data from a file, 
ignoring lines or records. In tr's case, a newline has no special meaning 
besides being a byte whose value is 0x0a.
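
To see that the two approaches agree, here is a small sketch, assuming GNU sed 
with '-z' support and a made-up two-line sample input; the variable names are 
only for illustration:

```shell
# Hypothetical sample data: two lines, each ending in a newline.
# With -z (GNU sed only) the whole input is one record, so y/// sees
# and translates every newline, including the final one.
with_sed=$(printf 'abc\ndef\n' | sed -z 'y/\n/\t/')

# tr translates bytes and has no notion of lines at all.
with_tr=$(printf 'abc\ndef\n' | tr '\n' '\t')

# Compare the two results.
[ "$with_sed" = "$with_tr" ] && echo "same output"
```

In both cases the final newline is converted as well, so the result no longer 
ends with a newline.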

regards,
 - assaf



P.S.

1. Many database export programs support character escaping (e.g. if a database 
value contains a newline, it will not be written to the output file as a raw 
ASCII 0x0a byte). This might save you some trouble.

2. In sed's context, instead of "removing newline characters" it's better to 
think about the operation as "joining two lines". The following page contains 
many interesting examples of joining lines, with sed and other programs: 
http://stackoverflow.com/questions/7852132/sed-join-lines-together






