Re: article about gawk best practices in data science and feature proposal

bug-gawk

From:	Manuel Collado
Subject:	Re: article about gawk best practices in data science and feature proposal
Date:	Thu, 11 Feb 2021 20:46:34 +0100
User-agent:	Mozilla/5.0 (Windows NT 10.0; WOW64; rv:38.0) Gecko/20100101 Thunderbird/38.5.0

Ivan Molineris <ivan.molineris@gmail.com> wrote:
...

Moreover, one of the biggest drawbacks of gawk in our field is that
referring to input columns by number often produces hard-to-read scripts.
For this reason, the wrapper I commonly use makes it possible to refer to
columns not only by number but also by name.

For example, if a file is composed like this:

chromosome     start        end
       chr1       241      53521
       chr1       363      43623
       chr2      5243     234562

gawk '{l=$3-$2}'
can also be written as
gawk '{l=$end-$start}'

I know that this syntax is not backward compatible; maybe it can be improved.

Do you know whether anyone has considered a feature like this in the
past?


The SYMTAB feature of gawk can be of help. Example:

$ cat headers.awk
# Assign column numbers to header named variables
FNR==1 {
    for (k=1; k<=NF; k++) {
        SYMTAB[$k] = k
    }
    next
}

# Process the data file
{
    print "Length of " $chromosome " is " $end - $start
}

$ cat data
chromosome     start        end
      chr1       241      53521
      chr1       363      43623
      chr2      5243     234562

$ gawk -f headers.awk data
Length of chr1 is 53280
Length of chr1 is 43260
Length of chr2 is 229319
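
As far as I understand, the SYMTAB approach above works because the names
chromosome, start and end appear as variables in the program text, and
because each header field is a valid awk identifier. As a hedged alternative
(a sketch, not something proposed in this thread), the header names can be
kept in an ordinary array instead, which should also work in awks other than
gawk and with headers that are not valid identifiers. The array name col and
the file name data.txt are assumptions made for illustration:

```shell
# Hypothetical sample file with the same layout as the data shown above.
cat > data.txt <<'EOF'
chromosome     start        end
      chr1       241      53521
      chr1       363      43623
      chr2      5243     234562
EOF

# Map each header name to its column number in an ordinary array
# ("col" is an assumed name), then look fields up by name.
result=$(awk '
FNR == 1 {
    for (k = 1; k <= NF; k++)
        col[$k] = k
    next
}
{
    print "Length of " $(col["chromosome"]) " is " ($(col["end"]) - $(col["start"]))
}' data.txt)

printf '%s\n' "$result"
```

The price is a more verbose field reference, $(col["end"]) instead of $end,
but no variable names need to be known when the script is written.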

HTH. Regards.
--
Manuel Collado - http://mcollado.z15.es
