Re: [bug-gawk] Field separators in awk


From: Andrew J. Schorr
Subject: Re: [bug-gawk] Field separators in awk
Date: Tue, 31 Dec 2013 09:47:42 -0500
User-agent: Mutt/1.5.21 (2010-09-15)

Hi,

On Tue, Dec 31, 2013 at 07:40:28AM +0100, address@hidden wrote:
> However, it seems inefficient to call split() on $0 just to obtain
> the matched text. (Since the field splitting has already been done
> by awk).

I believe the field variables ($1, $2, ...) are evaluated lazily.  In field.c,
there is this comment regarding the set_record function:

/*
 * set_record:
 * setup $0, but defer parsing rest of line until reference is made to $(>0)
 * or to NF.  At that point, parse only as much as necessary.
 *
 * Manage a private buffer for the contents of $0.  Doing so keeps us safe
 * if `getline var' decides to rearrange the contents of the IOBUF that
 * $0 might have been pointing into.  The cost is the copying of the buffer;
 * but better correct than fast.
 */

You can also look in field.c for the "parse_high_water" variable:
static long parse_high_water = 0; /* field number that we have parsed so far */

So I don't think there should be any performance hit unless the script
references NF or a field numbered above 0.

> I would propose either to add a new builtin variable, for instance
> it could be called "FT", that contains the separators, or if this is
> inefficient (for the common case, where that variable is not used by
> the user) to have an option (like I have proposed in a previous
> mail) to set FIELDWIDTHS="0" which should make awk skip the field
> splitting process all together, and just assign the whole line to
> $1, and set NF=1. Then it is up to the user to call split() on $0 if
> he likes to.

Also, what is wrong with saying something like:

   gawk '-F\n' '...'

That sets the field separator to the newline character (i.e. the same as the
default RS), so each record is left as a single field.

bash-4.2$ awk '-F\n' 'BEGIN {print FS; print RS}' | od -c
0000000  \n  \n  \n  \n
0000004

bash-4.2$ echo 'this is a test' |  awk '-F\n' '{print NF; print $1}'
1
this is a test
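
As a rough sketch of how this could be combined with an explicit split()
call, assuming ":" as the separator of interest (the function name
split_on_demand and the sample input are made up for illustration):

```shell
# Hypothetical helper: split_on_demand is a name invented for this sketch.
split_on_demand() {
    awk '-F\n' '{
        # With FS set to newline, each record stays a single field (NF is 1).
        # Split explicitly, only when the script actually needs the pieces.
        n = split($0, parts, ":")
        printf "NF=%d n=%d last=%s\n", NF, n, parts[n]
    }'
}

echo 'a:b:c' | split_on_demand
# prints: NF=1 n=3 last=c
```

This uses only the three-argument split(), so it should not depend on gawk.
In gawk 4.0 and later, split() also accepts a fourth array argument that
receives the separator text matched between the pieces.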

Regards,
Andy


