help-gnu-utils

Re: problem with GAWK/gsub when substitute new lines


From: Rita Shen
Subject: Re: problem with GAWK/gsub when substitute new lines
Date: Fri, 19 Jun 2009 10:29:05 +1000

Hi, Ralf,

Thanks for your reply.

I tried the command:

awk 'BEGIN { RS="\&"};{
    gsub(/\[[\n]+\]/, "");print $0}' gawk_test > gawk_modi

But the content in gawk_modi is still the same as the gawk_test:

<"Reports" [



]>


By the way, what's the difference between RS and FS?

Thanks for your help,
Rita


On Thu, Jun 18, 2009 at 3:49 PM, Ralf Wildenhues <Ralf.Wildenhues@gmx.de> wrote:
Hello Rita,

* shaledova wrote on Thu, Jun 18, 2009 at 05:26:44AM CEST:
>
> I tried to use gawk to perform some text conversions. But I could not
> substitute new lines (\n) using gsub such as:
> gsub(/\[[\n]*\]/, "");
>
> For example, if I have a file containing:
> <"Week Report" [
>
>
>
> ]>
>
> I want to convert these lines to:
> <"Week Report">
>
> What is wrong with the expression?

The expression is ok, but gawk operates on each line in turn by default;
more specifically, the implicit loop is over records, with RS being the
record separator, which is a newline by default.  With something like
 awk 'BEGIN { RS="X" }
      { gsub(/\[[\n]*\]/, ""); print }'

you can get the above input to turn into
 <"Week Report" >

(note also the space before the closing > that was not matched).
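
To make the record/field distinction concrete, here is a small sketch. RS
decides where one record ends and the next begins; FS decides how each record
is split into $1, $2, and so on. The function name show_second_fields and the
sample input are made up for illustration.

```shell
# Records are separated by ";" (RS), fields within a record by "," (FS).
show_second_fields() {
    awk 'BEGIN { RS = ";"; FS = "," }
         { print NR ": " $2 }'      # NR counts records, $2 is the second field
}

printf 'a,b;c,d;' | show_second_fields
```

With the default settings (RS is a newline, FS is whitespace) the same program
would treat each line as a record and each word as a field.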

Of course, this is a kludge and requires your input to not contain X;
and you might have to adjust the output record separator ORS as well.
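
One way to avoid picking a separator character at all is to collect the whole
input in a variable and run gsub once at the end. This is only a sketch under
some assumptions: the input is small enough to hold in memory, the brackets
contain nothing but blanks and newlines, and the function name
strip_empty_brackets is invented here. The regex also swallows the space in
front of the "[".

```shell
strip_empty_brackets() {
    awk '{ buf = buf $0 "\n" }                    # append every line to one buffer
         END {
             gsub(/ *\[[ \n]*\]/, "", buf)        # drop " [ ... ]" spanning lines
             printf "%s", buf                     # buf already ends in a newline
         }' "$@"
}

printf '<"Week Report" [\n\n\n\n]>\n' | strip_empty_brackets
```

Since the output is written with printf from the buffer, ORS does not need to
be adjusted in this variant.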

However, when parsing nested structures, regular expressions are
generally not the right tool.  You might be better off writing a small
state machine that reads the file line by line and just skips printing
output when inside unwanted [ ] brackets.
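
A rough sketch of such a state machine, under these assumptions: the opening
"[" is the last thing on its line, the closing "]" is the first thing on its
line, brackets do not nest, and everything between them is unwanted. The
function name skip_brackets and the variables inside and held are made up for
this example.

```shell
skip_brackets() {
    awk '
        # inside is 1 while we are between an opening "[" and its closing "]"
        !inside && /\[[ \t]*$/ {            # line ends with "[": start skipping
            sub(/[ \t]*\[[ \t]*$/, "")      # cut the "[" and the blanks around it
            held = $0; inside = 1; next     # remember the text before the "["
        }
        inside && /^[ \t]*\]/ {             # line starts with "]": stop skipping
            sub(/^[ \t]*\]/, "")            # cut the "]"
            print held $0; inside = 0; next # join what came before and after
        }
        !inside { print }                   # ordinary line: pass it through
        END { if (inside) print held }      # unterminated "[": keep the held text
    ' "$@"
}

printf '<"Week Report" [\n\n\n\n]>\nnext line\n' | skip_brackets
```

Lines read while inside is set are simply never printed, so no multi-line
regular expression is needed.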

Hope that helps.

Cheers,
Ralf

