


From: arnold
Subject: Re: [bug-gawk] Why strings extracted by match() can be considered as numbers?
Date: Mon, 11 Jun 2018 12:13:33 -0600
User-agent: Heirloom mailx 12.4 7/29/08

Andy's answer is on target.

Thanks,

Arnold

"Andrew J. Schorr" <address@hidden> wrote:

> Hi,
>
> On Mon, Jun 11, 2018 at 12:40:26PM -0500, Peng Yu wrote:
> > The following example shows that strings extracted by match() can be
> > considered as numbers. This automatic conversion is not natural to me.
> > 
> > $ cat main.sh
> > #!/usr/bin/env bash
> > # vim: set noexpandtab tabstop=2:
> > 
> > set -v
> > seq 10 | awk -e 'BEGIN { match("8", /([0-9]+)/, a) } $1 > a[1] { print }'
> > seq 10 | awk -e 'BEGIN { a[1] = "8" } $1 > a[1] { print }'
> > $ ./main.sh
> > seq 10 | awk -e 'BEGIN { match("8", /([0-9]+)/, a) } $1 > a[1] { print }'
> > 9
> > 10
> > seq 10 | awk -e 'BEGIN { a[1] = "8" } $1 > a[1] { print }'
> > 9
> > 
> > Based on the manpage, it seems that the results should only be
> > considered as strings. Why is there such a discrepancy? Thanks.
> > 
> >        match(s, r [, a])       Return the position in s where the regular
> >                                expression r occurs, or zero if r is not
> >                                present, and set the values of RSTART and
> >                                RLENGTH.  Note that the argument order is
> >                                the same as for the ~ operator: str ~ re.
> >                                If array a is provided, a is cleared and
> >                                then elements 1 through n are filled with
> >                                the portions of s that match the
> >                                corresponding parenthesized subexpression
> >                                in r.  The zero'th element of a contains
> >                                the portion of s matched by the entire
> >                                regular expression r.  Subscripts
> >                                a[n, "start"], and a[n, "length"] provide
> >                                the starting index in the string and
> >                                length respectively, of each matching
> >                                substring.
>
> This is documented in the info docs:
>
> https://www.gnu.org/software/gawk/manual/html_node/Variable-Typing.html#Variable-Typing
>
>    Fields, getline input, FILENAME, ARGV elements, ENVIRON elements, and the
>    elements of an array created by match(), split(), and patsplit() that are
>    numeric strings have the strnum attribute. Otherwise, they have the
>    string attribute. Uninitialized variables also have the strnum attribute.
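To make the strnum/string distinction concrete, here is a minimal sketch (plain POSIX awk, so it should not need gawk) in which the same characters "08" compare differently depending on where they come from. The helper names field_cmp and constant_cmp are hypothetical; the printed words only label which kind of comparison took place.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: only the awk programs matter.

# "08" arrives as a field, so it has the strnum attribute. Compared with
# the number 8 it is treated as a number, and the test succeeds.
field_cmp() {
  echo 08 | awk '{ if ($1 == 8) print "numeric"; else print "string" }'
}

# "08" written as a string constant has the string attribute, so the
# number 8 is converted to the string "8", and "08" == "8" is false.
constant_cmp() {
  awk 'BEGIN { if ("08" == 8) print "numeric"; else print "string" }'
}

field_cmp      # prints numeric
constant_cmp   # prints string
```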
>
> There is only so much that can fit in the man page. See also the POSIX awk
> spec discussion of "numeric string" values:
>
> http://pubs.opengroup.org/onlinepubs/9699919799/utilities/awk.html
>
>    A string value shall be considered a numeric string if it comes from one
>    of the following:
>       1. Field variables
>       2. Input from the getline() function
>       3. FILENAME
>       4. ARGV array elements
>       5. ENVIRON array elements
>       6. Array elements created by the split() function
>       7. A command line variable assignment
>       8. Variable assignment from another numeric string variable
>
> The match function is a gawk extension, and the array values parsed using
> match are treated the same way as those parsed using split. I hope you will
> agree that this makes sense.
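A minimal sketch of items 6 to 8 from the POSIX list above, assuming a POSIX-conforming awk (gawk behaves this way; other implementations are worth checking). The split() case mirrors the original match() example and gives the same output. The helper names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: only the awk programs matter.

# Item 6: split() elements. a[1] is the numeric string "8", so
# "$1 > a[1]" is a numeric comparison and prints 9 and 10.
split_cmp() {
  seq 10 | awk 'BEGIN { split("8", a) } $1 > a[1] { print }'
}

# Item 7: a command-line assignment. x holds the numeric string "08",
# so x == 8 is a numeric comparison.
cmdline_cmp() {
  awk -v x=08 'BEGIN { if (x == 8) print "numeric"; else print "string" }'
}

# Item 8: assignment from another numeric string. y takes over the
# strnum attribute of $1.
copy_cmp() {
  echo 08 | awk '{ y = $1; if (y == 8) print "numeric"; else print "string" }'
}

split_cmp     # prints 9, then 10
cmdline_cmp   # prints numeric
copy_cmp      # prints numeric
```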
>
> If you want to force a string value, you can concatenate with "":
>
> bash-4.2$ seq 10 | awk -e 'BEGIN { match("8", /([0-9]+)/, a) } $1 > (a[1] "") { print }'
> 9
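The reverse conversion is standard awk practice: adding 0 to a value yields a plain number. In the second command from the question, this makes the string constant compare numerically. A minimal sketch in plain POSIX awk; the helper name force_numeric is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper. a[1] is a string constant, but (a[1] + 0) is the
# number 8, so "$1 > (a[1] + 0)" is a numeric comparison.
force_numeric() {
  seq 10 | awk 'BEGIN { a[1] = "8" } $1 > (a[1] + 0) { print }'
}

force_numeric   # prints 9, then 10
```

Together with concatenating "", this lets a script choose the comparison type explicitly instead of relying on where a value came from.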
>
> Regards,
> Andy


