Re: Insertion of extra OFS character into output string
From: H
Subject: Re: Insertion of extra OFS character into output string
Date: Tue, 14 Mar 2023 15:12:02 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.9.1
On 03/14/2023 02:37 AM, Andrew J. Schorr wrote:
> On Mon, Mar 13, 2023 at 09:10:28PM +0100, H wrote:
>> I am a newcomer to awk and have run into an issue I have not figured out
>> yet... My platform is CentOS 7 running awk 4.0.2, the default version.
>>
>> The following awk statement generates an extra tab character between fields
>> 1 and 2, regardless of the data in the file:
>>
>> awk 'BEGIN{FS=","; FPAT="([^,]*)|(\"[^\"]+\")"; OFS="\t"} {$1=$1; gsub(/"/, ""); print}' somefile.csv
>>
>> If I change the statement to:
>>
>> awk 'BEGIN{FS=","; FPAT="([^,]*)|(\"[^\"]+\")"; OFS="\t"} {$2=$2; gsub(/"/, ""); print}' somefile.csv
>>
>> an extra OFS character is inserted between fields two and three. I can add
>> that removing the gsub() in either of the two examples does not affect the
>> results.
>>
>> Might this be a bug in 4.0.2 or a feature I have not yet understood?
> I think it is in fact a bug in 4.0.2:
>
> bash-5.1$ ./gawk --version | head -1
> GNU Awk 4.0.2
>
> bash-5.1$ echo "Alpha,Beta,Charlie,Delta" | ./gawk 'BEGIN{FS=","; FPAT="([^,]*)|(\"[^\"]+\")"; OFS="\t"} {$1=$1; gsub(/"/, ""); print}' | od -c
> 0000000 A l p h a \t \t B e t a \t C h a r
> 0000020 l i e \t D e l t a \n
> 0000032
>
> I confirmed that the CentOS 7 gawk has this bug.
>
> Compare to the current master branch:
>
> bash-5.1$ echo "Alpha,Beta,Charlie,Delta" | ./gawk 'BEGIN{FS=","; FPAT="([^,]*)|(\"[^\"]+\")"; OFS="\t"} {$1=$1; gsub(/"/, ""); print}' | od -c
> 0000000 A l p h a \t B e t a \t C h a r l
> 0000020 i e \t D e l t a \n
> 0000031
>
> I think you have 3 options:
> 1. Install a newer version of gawk on your system.
> 2. Open a bug on Red Hat bugzilla and wait for them to patch it.
> 3. Upgrade to Rocky 8 or Rocky 9. :-)
>
> I checked, and it's fixed in gawk 4.2.1 in Rocky 8:
>
> bash-4.4$ ./usr/bin/gawk --version | head -1
> GNU Awk 4.2.1, API: 2.0 (GNU MPFR 3.1.6-p2, GNU MP 6.1.2)
>
> bash-4.4$ echo "Alpha,Beta,Charlie,Delta" | ./usr/bin/gawk 'BEGIN{FS=","; FPAT="([^,]*)|(\"[^\"]+\")"; OFS="\t"} {$1=$1; gsub(/"/, ""); print}' | od -c
> 0000000 A l p h a \t B e t a \t C h a r l
> 0000020 i e \t D e l t a \n
> 0000031
>
> Regards,
> Andy
Thank you for researching this. This machine is not slated to be upgraded at
this time. Is there a newer version of awk for CentOS 7 available somewhere
else?
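If a newer gawk ends up installed alongside the system one, a small script can confirm whether a given binary still shows the extra-OFS behavior. This is a sketch only: it reuses the sample input and awk program from Andy's reproduction, while the function name check_extra_ofs and the default of plain "awk" (which is gawk 4.0.2 on a stock CentOS 7 install) are illustrative choices, and it has not been run on CentOS 7.

```shell
#!/bin/sh
# Sketch: report whether an awk binary inserts an extra OFS after a field
# assignment while FPAT is set. Input and program are taken from the
# reproduction in this thread; check_extra_ofs is a hypothetical helper name.
# Usage: sh check-fpat.sh /path/to/gawk   (defaults to "awk")

check_extra_ofs() {
    awk_bin=$1
    # Same program as the reproduction: assign $1 to force a rebuild of $0.
    got=$(printf 'Alpha,Beta,Charlie,Delta\n' |
        "$awk_bin" 'BEGIN { FS = ","; FPAT = "([^,]*)|(\"[^\"]+\")"; OFS = "\t" }
                    { $1 = $1; gsub(/"/, ""); print }')
    # Expected result: the four fields joined by single tabs
    # ($(...) strips the trailing newline from both strings).
    want=$(printf 'Alpha\tBeta\tCharlie\tDelta')
    if [ "$got" = "$want" ]; then
        echo "OK: single OFS between fields"
    else
        echo "BUG: unexpected output from $awk_bin"
    fi
}

check_extra_ofs "${1:-awk}"
```

Judging by the od -c output quoted above, the stock CentOS 7 awk would be expected to print the BUG line, and a fixed gawk the OK line. An awk without FPAT support, such as mawk, treats FPAT as an ordinary variable and splits on FS=",", so it will also report OK. The check is therefore only meaningful for gawk binaries.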