script to convert separators for CSV processing
From: Ed Morton
Subject: script to convert separators for CSV processing
Date: Sat, 11 Nov 2023 10:54:48 -0600
User-agent: Mozilla Thunderbird
The new `--csv` processing mode is great, but since it doesn't handle
chars other than commas as the separator, I expect many people will want
to know how to convert their TSV, `;`-separated, `|`-separated, etc.
files to/from `,`-separated so they can use the new functionality. So
that multiple people don't have to try to figure that out, here's a
suggestion for a script you could include in the documentation. It
converts string-separated input into CSV (or other string-separated)
output for input files that otherwise follow CSV quoting/separator
rules, and it does so without reading all of the input into memory at
once:
-------
$ cat changeSeps.awk
BEGIN {
    FS = OFS = "\""

    if ( (old == "") || (new == "") ) {
        printf "Error: old=\047%s\047 and/or new=\047%s\047 separator string missing.\n", old, new > "/dev/stderr"
        printf "Usage: awk -v old=\047;\047 -v new=\047,\047 -f changeSeps.awk infile [> outfile]\n" > "/dev/stderr"
        err = 1
        exit
    }

    sanitized_old = old
    sanitized_new = new

    # Ensure all regexp and replacement chars get treated as literal
    gsub(/[^^\\]/,"[&]",sanitized_old)  # regexp: char other than ^ or \ -> [char]
    gsub(/\\/,"\\\\",sanitized_old)     # regexp: \ -> \\
    gsub(/\^/,"\\^",sanitized_old)      # regexp: ^ -> \^
    gsub(/[&]/,"\\\\&",sanitized_new)   # replacement: & -> \\&
}
{
    # Append this line to any incomplete record saved from previous lines
    $0 = prev ors $0
    prev = $0
    ors = ORS
}
NF%2 {
    # Odd NF means an even number of quotes, i.e. the record is complete.
    # Odd-numbered fields are outside of quotes, so only change those.
    for ( i=1; i<=NF; i+=2 ) {
        cnt += gsub(sanitized_old,sanitized_new,$i)
    }
    print
    prev = ors = ""
}
END {
    if ( !err ) {
        printf "Converted %d \047%s\047 field separators to \047%s\047s.\n", cnt+0, old, new > "/dev/stderr"
    }
    exit err
}
-------
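To illustrate why the `NF%2` test works, here is a stripped-down sketch of just the record-joining part of the script (the `show_records` name is made up for this illustration, and it assumes any POSIX awk such as gawk). With `FS` set to `"`, a record has an odd number of fields only when it contains an even number of quotes, so a line that ends inside a quoted field is held in `prev` and glued to the next line until the quotes balance:

```shell
#!/usr/bin/env bash
# Sketch of the record-joining idea from changeSeps.awk, on its own.
# show_records is a hypothetical helper used only for this illustration:
# it prints each logical CSV record, even if it spans several lines.
show_records() {
    awk 'BEGIN { FS = "\"" }
    {
        $0 = prev ors $0     # glue this line onto any unfinished record
        prev = $0
        ors = ORS
    }
    NF % 2 {                 # balanced quotes: the record is complete
        print "record " (++n) ": [" $0 "]"
        prev = ors = ""
    }'
}

# The first record has a newline inside its quoted field.
printf '"a\nb",c\nd,e\n' | show_records
```

Here the first record spans two physical lines but is printed as a single record, and `d,e` is printed as the second one. Only the unfinished record is ever held in memory, which is what lets the full script process large files line by line.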
You'd call it as:
-----
awk -v old='<old separator>' -v new='<new separator>' -f changeSeps.awk file
-----
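Because `old` is used as a dynamic regexp inside the script, its characters have to be made literal first; otherwise something like `-v old='|'` or `-v old='.'` would be treated as a regexp operator. The sketch below pulls the three `gsub()` calls that sanitize `old` out into a standalone, hypothetical `split_literal` helper so the effect can be tried in isolation. It assumes gawk's default handling of backslashes in `gsub()` replacement strings; other awks, or `gawk --posix`, may treat the `"\\\\"` replacement differently.

```shell
#!/usr/bin/env bash
# Sketch of the "treat the separator as a literal string" step from
# changeSeps.awk. split_literal is a hypothetical helper for illustration:
# it splits text on a literal separator string, printing one piece per line.
split_literal() {
    # $1 = separator string, $2 = text to split
    awk -v old="$1" -v line="$2" 'BEGIN {
        re = old
        gsub(/[^^\\]/, "[&]", re)  # any char except ^ or \ -> [char]
        gsub(/\\/, "\\\\", re)     # \ -> \\
        gsub(/\^/, "\\^", re)      # ^ -> \^
        n = split(line, parts, re)
        for (i = 1; i <= n; i++) print parts[i]
    }'
}

split_literal '|'  'a|b|c'   # "|" becomes the regexp [|]
split_literal '.*' 'x.*y'    # ".*" becomes [.][*], not "match anything"
```

Wrapping each ordinary character in its own bracket expression avoids having to list every regexp metacharacter; only `^` and `\` need separate handling because they are special even inside brackets.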
e.g. to convert TSV to CSV:
-----
$ printf '"foo\tbar"\tetc\n'
"foo	bar"	etc
$ printf '"foo\tbar"\tetc\n' | awk -v old='\t' -v new=',' -f changeSeps.awk
"foo	bar",etc
Converted 1 '	' field separators to ','s.
-----
Regards,
Ed.