Re: [bug-gawk] gawk4 split() function bug? or feature?
From: Aharon Robbins
Subject: Re: [bug-gawk] gawk4 split() function bug? or feature?
Date: Fri, 22 Mar 2013 10:25:20 +0200
User-agent: Heirloom mailx 12.5 6/20/10
Hi.
> From: Kent <address@hidden>
> Date: Tue, 19 Mar 2013 23:54:44 +0100
> To: address@hidden
> Subject: [bug-gawk] gawk4 split() function bug? or feature?
>
> Hi there,
>
> Recently I found something strange about gawk's split() function. I am
> not sure whether it is a bug; it would be good if you guys could
> explain it a bit. Thanks in advance.
It's a bit of a dark corner.
> My gawk version:
> kent$ gawk --version
> GNU Awk 4.0.2
>
> I know that the 3rd parameter of the split() function is a regex, but
> take a look at these examples:
>
> kent$ echo "foo.bar.baz" | awk '{split($0,a,"."); print "length of a:" length(a);for (x in a) print a[x]}'
> length of a:3
> foo
> bar
> baz
The third argument can also be a string. In that case it is treated like
the value of FS: if the string is a single character other than a space,
that character is the separator, even if it is a regexp metacharacter.
(A single space means "split on runs of whitespace", as with the default
FS.) Once the string is longer than one character, it is treated as a
dynamic regexp.
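A minimal sketch of the string rule, assuming a POSIX-style awk on the
PATH. The function count_fields is a hypothetical helper, not part of
gawk; it passes the separator with -v, so awk receives it as a string
value rather than as a regexp constant.

```shell
#!/bin/sh
# Hypothetical helper (illustration only): print how many pieces split()
# produces when the separator arrives as a string value, not a /regexp/.
count_fields() {
    printf '%s\n' "$1" | awk -v sep="$2" '{ print split($0, parts, sep) }'
}

count_fields "foo.bar.baz" "."    # one character: a literal dot, 3 pieces
count_fields "foo.bar.baz" ".."   # two characters: regexp "any two chars", 6 pieces
count_fields "foo.bar.baz" "[.]"  # bracket expression: a literal dot, 3 pieces
count_fields "a   b  c" " "       # single space: runs of blanks, 3 pieces
```

Note that -v processes backslash escapes, which is why the sketch uses
"[.]" rather than "\\." to get a literal dot from a multi-character string.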
> So split() treats "." as a literal dot, the same as /[.]/ or /\./,
> but if I do:
> kent$ echo "foo.bar.baz" | awk '{split($0,a,/./); print "length of a:" length(a);for (x in a) print a[x]}'
> length of a:12
> (followed by 12 empty lines)
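Here, /./ is a regexp constant, so gawk knows unequivocally that it should
treat the period as a metacharacter: it matches every character, so all 11
characters of "foo.bar.baz" are separators and you get 12 empty pieces.
Other awks work this way too; this is not specific to gawk.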
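A short sketch of the regexp-constant case, assuming any POSIX-style awk.
It contrasts /./ with the two usual ways of writing a literal dot, and it
walks the result by index, since the order of "for (x in a)" is
unspecified in awk.

```shell
#!/bin/sh
# A regexp constant is always a regexp: /./ matches any character,
# so every character is a separator and split() returns 12.
echo "foo.bar.baz" | awk '{ print split($0, a, /./) }'

# Escaping or bracketing the dot makes it literal: 3 pieces each.
echo "foo.bar.baz" | awk '{ print split($0, a, /\./) }'
echo "foo.bar.baz" | awk '{ print split($0, a, /[.]/) }'

# Iterate by index to get the pieces in order (for-in order is unspecified).
echo "foo.bar.baz" | awk '{ n = split($0, a, /\./); for (i = 1; i <= n; i++) print i ": " a[i] }'
```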
It can be confusing, I admit. The language has more dark corners like this
than one would like, but that's the way it is. :-)
Hope this helps,
Arnold