bug-gawk

script to convert separators for CSV processing


From: Ed Morton
Subject: script to convert separators for CSV processing
Date: Sat, 11 Nov 2023 10:54:48 -0600
User-agent: Mozilla Thunderbird

The new `--csv` processing mode is great, but it doesn't handle separators other than commas, so I expect many people will want to know how to convert their TSV, `;`-separated, `|`-separated, etc. files to and from `,`-separated form in order to use the new functionality. Here's a suggested script that you could include in the documentation so that people don't each have to figure it out for themselves. It converts string-separated input into CSV (or other string-separated) output, for input files that otherwise follow CSV quoting/separator rules, without reading all of the input into memory at once:

-------

$ cat changeSeps.awk
BEGIN {
    FS = OFS = "\""

    if ( (old == "") || (new == "") ) {
        printf "Error: old=\047%s\047 and/or new=\047%s\047 separator string missing.\n", old, new >"/dev/stderr"
        printf "Usage: awk -v old=\047;\047 -v new=\047,\047 -f changeSeps.awk infile [> outfile]\n" >"/dev/stderr"
        err = 1
        exit
    }

    sanitized_old = old
    sanitized_new = new

    # Ensure all regexp and replacement chars get treated as literal
    gsub(/[^^\\]/,"[&]",sanitized_old)  # regexp: char other than ^ or \ -> [char]
    gsub(/\\/,"\\\\",sanitized_old)     # regexp: \ -> \\
    gsub(/\^/,"\\^",sanitized_old)      # regexp: ^ -> \^
    gsub(/[&]/,"\\\\&",sanitized_new)   # replacement: & -> \\&
}
{
    $0 = prev ors $0
    prev = $0
    ors = ORS
}
NF%2 {
    for ( i=1; i<=NF; i+=2 ) {
        cnt += gsub(sanitized_old,sanitized_new,$i)
    }
    print
    prev = ors = ""
}
END {
    if ( !err ) {
        printf "Converted %d \047%s\047 field separators to \047%s\047s.\n", cnt+0, old, new >"/dev/stderr"
    }
    exit err
}

-------
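
To illustrate why the script sanitizes `old` before using it as a regexp, here is a minimal sketch. `literal_regexp` is a hypothetical helper, not part of changeSeps.awk; it applies only the first of the script's substitutions, so it assumes the separator contains no `^` or `\` characters:

```shell
# Hypothetical helper: wrap every character other than ^ and \ in a
# bracket expression so that it matches itself literally in a regexp.
literal_regexp() {
    awk -v s="$1" 'BEGIN { gsub(/[^^\\]/, "[&]", s); print s }'
}

# Unsanitized: "." is a regexp that matches any character.
printf 'a.b\n' | awk -v old='.' '{ gsub(old, ","); print }'
# prints: ,,,

# Sanitized: "[.]" matches only a literal dot.
printf 'a.b\n' | awk -v old="$(literal_regexp '.')" '{ gsub(old, ","); print }'
# prints: a,b
```

The same wrapping is what lets a separator such as `|` be used without it being read as regexp alternation.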

You'd call it as:
-----
awk -v old='<old separator>' -v new='<new separator>' -f changeSeps.awk file
-----

e.g. to convert TSV to CSV:

-----
$ printf '"foo\tbar"\tetc\n'
"foo    bar"    etc

$ printf '"foo\tbar"\tetc\n' | awk -v old='\t' -v new=',' -f changeSeps.awk
"foo    bar",etc
Converted 1 '   ' field separators to ','s.
-----
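
The separator inside the quoted field is left alone because the script splits each record on `"` and only touches the odd-numbered fields, which are the ones outside quotes. A stripped-down sketch of just that idea follows. `swap_outside_quotes` is a hypothetical name, and unlike the full script it assumes that no quoted field spans lines and that `old` and `new` contain no regexp or replacement metacharacters:

```shell
# Hypothetical, simplified helper: replace old with new only outside quotes.
swap_outside_quotes() {
    awk -v old="$1" -v new="$2" '
        BEGIN { FS = OFS = "\"" }       # split and rejoin records on double quotes
        {
            # Odd-numbered fields lie outside quotes; even-numbered ones are quoted text.
            for ( i=1; i<=NF; i+=2 ) {
                gsub(old, new, $i)
            }
            print
        }
    '
}

printf 'a;"b;c";d\n' | swap_outside_quotes ';' ','
# prints: a,"b;c",d
```

A doubled quote (`""`) inside a quoted field only adds empty odd-numbered fields, so the quoted text still lands in even-numbered fields and stays untouched.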

Regards,

    Ed.

