help-gawk

Re: Insertion of extra OFS character into output string


From: H
Subject: Re: Insertion of extra OFS character into output string
Date: Tue, 14 Mar 2023 01:59:16 +0100
User-agent: K-9 Mail for Android

On March 14, 2023 12:41:16 AM GMT+01:00, "Neil R. Ormos" 
<ormos-gnulists17@ormos.org> wrote:
>H wrote:
>
>> I am a newcomer to awk and have run into an
>> issue I have not figured out yet... My platform
>> is CentOS 7 running awk 4.0.2, the default
>> version.
>
>> The following awk statement generates an extra
>> tab character between fields 1 and 2, regardless
>> of the data in the file:
>
>> awk 'BEGIN{FS=","; FPAT="([^,]*)|(\"[^\"]+\")"; OFS="\t"} {$1=$1; gsub(/"/, ""); print}' somefile.csv
>
>> If I change the statement to:
>
>> awk 'BEGIN{FS=","; FPAT="([^,]*)|(\"[^\"]+\")"; OFS="\t"} {$2=$2; gsub(/"/, ""); print}' somefile.csv
>
>> an extra OFS character is inserted between
>> fields two and three. I can add that removing
>> the gsub() in either of the two examples does
>> not affect the results.
>
>> Might this be a bug in 4.0.2 or a feature I have
>> not yet understood?
>
>I don't have 4.0.2 available to test, but I tested with older and newer
>versions.
>
>When I test, I get the result I think I expect from the code you
>posted.
>
>Also, setting FPAT overrides the effect of having earlier set FS.  (I
>believe that the most-recently set one among FS, FPAT, and FIELDWIDTHS
>controls the field splitting operation.)
>
>echo "1,2" | awk 'BEGIN{FS=","; FPAT="([^,]*)|(\"[^\"]+\")"; OFS="\t"} {$1=$1; print}' | hexdump -c
>0000000   1  \t   2  \n
>0000004
>
>It would be easier to help if you would please provide:
>
>  the simplest input line that reproduces the problem;
>
>  the output you expect; and
>
>  the output you are getting.

I am not at my computer but typing this on my phone. With that caveat, a
/minimal/ example would be:
echo "Alpha,Beta,Charlie,Delta" | awk 'BEGIN{FS=","; FPAT="([^,]*)|(\"[^\"]+\")"; OFS="\t"} {$1=$1; gsub(/"/, ""); print}'

I would expect to see:
Alpha<TAB>Beta<TAB>Charlie<TAB>Delta
but instead see
Alpha<TAB><TAB>Beta<TAB>Charlie<TAB>Delta

If you change $1=$1 to $2=$2, you will find that the extra tab character then
moves along by one field, appearing between fields 2 and 3 instead.
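
One way to narrow this down is to look at how the record is split before any output is written. The sketch below prints NF and then each field in brackets, so that an unexpected empty field becomes visible; such a field would be one possible source of two adjacent tabs once $1=$1 rebuilds the record with OFS. It deliberately uses FS only, so that it runs in any POSIX awk, and the shell variable name "fields" is just for illustration. To inspect the FPAT case from this thread, replace the FS assignment with the FPAT assignment and run it under gawk, since FPAT is a gawk extension.

```shell
# Dump the field count and every field in brackets, so that empty
# fields are easy to spot. FS-only version: works in any POSIX awk.
fields=$(echo "Alpha,Beta,Charlie,Delta" |
  awk 'BEGIN { FS = "," }
       {
         print "NF=" NF
         for (i = 1; i <= NF; i++)
           printf "%d: [%s]\n", i, $i
       }')
printf '%s\n' "$fields"
```

If the FPAT variant reports more fields than the FS variant for the same input line, the extra tab comes from the splitting step rather than from the $1=$1 assignment or the gsub() call.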

I believe I also tried it without setting FS, with the same result.
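
The converse experiment, dropping FPAT and keeping only FS, may also be worth running as a baseline. This is a minimal sketch that assumes the input has no quoted fields containing commas, which is the case FPAT is meant to handle, so it is a comparison point rather than a replacement. The output is piped through od -c to make the tab characters visible, and the variable name "out" is only illustrative.

```shell
# Baseline without FPAT: split on commas, rebuild the record with tabs.
# Assumes no quoted fields with embedded commas in the input.
out=$(echo "Alpha,Beta,Charlie,Delta" |
  awk 'BEGIN { FS = ","; OFS = "\t" } { $1 = $1; print }')

# Show the result byte by byte so each tab appears as \t.
printf '%s\n' "$out" | od -c
```

If this baseline shows exactly one tab between each pair of fields while the FPAT version shows two after the assigned field, that points at FPAT handling in the installed awk rather than at OFS or the record rebuild.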

Finally, note that the FPAT expression comes from the awk documentation and is 
thus expected to work.

Can anyone try this with the most recent version of awk?

