bug-gawk


Re: The best way to convert space separated text to TSV?


From: Manuel Collado
Subject: Re: The best way to convert space separated text to TSV?
Date: Tue, 11 Feb 2020 10:48:54 +0100
User-agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:38.0) Gecko/20100101 Thunderbird/38.5.0

On 11/02/2020 at 4:42, Peng Yu wrote:
Hi,

Many programs (such as wc and ps) print results as tables that use one
or more spaces as separators, but the last column may itself contain
spaces. To process the output of wc, I came up with the following code
(sometimes I need to manually change the display name, such as
"file1"). But it is too verbose.

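BEGIN {
        OFS = "\t"
        for(i=1;i<ARGC;++i) {          # remember the names given on the command line
                fnames[i] = ARGV[i]
        }
        nfiles = ARGC - 1
        delete ARGV                    # no file operands left, so input comes from stdin
}
{
        match($0, /^[ ]*/)             # skip leading spaces
        line = substr($0, RSTART+RLENGTH)
        NF = 1
        for(i=1; i<=n; ++i) {          # split off the first n columns
                if(match(line, /[ ]+/)) {
                        $i = substr(line, 1, RSTART-1)
                        line = substr(line, RSTART+RLENGTH)
                }
        }
        if(NR <= nfiles) {             # use the supplied display name as last column
                $i = fnames[NR]
        } else {
                if(line "") $i = line  # otherwise keep the rest, spaces included
        }
        print
}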

$ awk -v n=2 -f ./wc.awk file1 <<EOF
  a bb c
aa  b c
EOF

$ awk -v n=3 -f ./wc.awk <<EOF
  a bb c
EOF

What is the most succinct way to convert this kind of input to TSV
format with gawk? Thanks.


Please try the following:

--- tabular1.awk ---
{
   nf = split($0, f, " ", s)     # gawk: s[0] = leading blanks, s[i] = blanks after f[i]
   offset = length(s[0])         # characters consumed so far
   for (n=1; n<numcols; n++) {   # first numcols-1 columns
      offset = offset + length(f[n]) + length(s[n])
      printf("%s\t", f[n])
   }
   print substr($0, offset + 1)  # last column, with spaces (starts after the consumed chars)
}
--------------------

It can be invoked as:

wc xxxx | gawk -f tabular1.awk -v numcols=4

Hope this helps.

--
Manuel Collado - http://lml.ls.fi.upm.es/~mcollado



