bug-gawk

Re: Question on the behavior of length()


From: arnold
Subject: Re: Question on the behavior of length()
Date: Mon, 11 Dec 2023 12:12:22 -0700
User-agent: Heirloom mailx 12.5 7/5/10

Hello.

Thanks for your note. This list is fine for discussion of issues
such as this.

There was recently some discussion of, and some changes to, the code and
documentation related to typeof, mainly for untyped array elements. It
starts here:
https://lists.gnu.org/archive/html/bug-gawk/2023-11/msg00012.html.

See also the node "Dynamic Typing" in the manual, but please look
at what's in Git, not at the 5.3.0 tarball.

Your proposed idiom for pushing onto a stack is interesting, but it
might also be seen as a bit obscure. Purely from a programming point
of view, using push and pop functions is likely to lead to clearer,
more portable code.
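The push and pop functions suggested above might look something like the following minimal sketch. The names push, pop, stack_demo and s, and the convention of keeping the current depth in element 0, are hypothetical choices for this example, not anything defined by gawk. Storing the depth explicitly avoids relying on length() of an array (which the quoted message notes is a GNU awk extension), so the awk program itself is intended to run under any POSIX awk.

```shell
#!/bin/sh
# Sketch of push/pop helpers in plain awk. stack_demo, push, pop and the
# "depth in element 0" convention are hypothetical names for this example.
stack_demo() {
    awk '
    # push: append val to stk; stk[0] holds the current depth
    function push(stk, val) {
        stk[++stk[0]] = val
    }

    # pop: remove and return the top element of stk
    function pop(stk,    val) {
        if (stk[0] < 1)
            return ""          # empty stack: hypothetical convention
        val = stk[stk[0]]
        delete stk[stk[0]]     # deleting a single element is standard awk
        stk[0]--
        return val
    }

    BEGIN {
        split("", s)           # traditional portable way to get an empty array
        push(s, "a"); push(s, "b"); push(s, "c")
        while (s[0] > 0)
            print pop(s)       # prints c, b, a (last in, first out)
    }'
}

stack_demo
```

Because the depth lives in the array itself, each stack is a single variable that can be passed to functions by reference, and nothing depends on how length() treats an untyped argument.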

With respect to your request that length() not force its argument to
be a scalar but instead leave it untyped, you are correct that this
would likely work, and that existing awk and gawk code would also
continue to work.

However, I'm loath to make such an exception for the standard built-in
functions, since it's a big break from existing practice and from
what all the other awks do.

Just for fun, I attempted to make the code change, but it didn't
allow your idiom to work.  Clearly there's more involved, which I
don't feel like investigating, as I don't see a big enough
benefit.

So, sorry, but I think I am not going to change length().

Arnold

Christian Schmidt <schmidt@digadd.de> wrote:

> Hi all,
>
> First of all, I'd like to state that I do not necessarily consider the
> following a bug; however, I'd like to discuss the current
> implementation, and I have not found a better place to do so.
>
> My issue is that I can't use
>
> x[length(x)] = y
>
> e.g. to emulate pushing onto a stack, without first explicitly
> converting x into an array, e.g. with "delete x".
>
> x comes into existence untyped and should be usable as an array;
> however, calling length() on it converts it into an unassigned scalar.
>
> There are two angles to this:
> a) POSIX, which defines length() as
> length[([s])]
> Return the length, in characters, of its argument taken as a string, or 
> of the whole record, $0, if there is no argument.
> b) GNU awk, which extends length() to be used to get the number of 
> elements in an array.
>
> I wonder if it would be more reasonable for length() to simply return 0
> when called with an untyped variable as its argument, without modifying
> that argument (i.e., without converting it into a scalar).
>
> My rationale for this is:
> 1. Converting an untyped value to a string always yields a zero-length
> string, so the result would be correct.
> 2. Converting an untyped value to an array yields an empty, thus
> zero-length, array (relevant for the GNU extension to length()).
> 3. POSIX does not specify any conversion of the argument, only "argument
> taken as a string".
> 4. Later uses of the (still untyped) variable would behave no differently
> for any scalar type, as the scalar type itself is not fixed, except
> that the variable could still be used as an array.
> 5. In general, IMHO, functions should not leave changes to their
> arguments behind after they return.
>
> That this happens can be observed with:
>
> BEGIN {
>       print typeof(x)
>       print length(x)
>       print typeof(x)
>       exit
> }
>
> outputs
>
> untyped
> 0
> unassigned
>
> After reading the source, I am actually not sure how or why this
> happens. My guess is builtin.c, line 609ff.:
>
>          if (tmp->type == Node_var_array) {
> [...]
>          } else if (tmp->type == Node_var_new || tmp->type == Node_elem_new) {
>                  // this can happen from an indirect call
>                  DEREF(tmp);
>                  tmp = dupnode(Nnull_string);
>          }
>
> However, I am not sure why this leaves changes visible outside the
> function's scope.
>
> The assumption in the comment, "this can happen from an indirect call",
> does not hold up; it can definitely happen on purpose ;). I am also not
> familiar enough with the codebase to understand what an indirect call
> is in this context, and so I am wary of just changing that else branch
> to return 0, as internal callers (which I can't see, at least none
> calling do_length() directly) might rely on the existing behavior.
>
> Any feedback?
>
> Best regards,
> Chris
>
> PS: please CC: me on replies, I am not subscribed to the list.
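Since length() is staying as it is, the workaround mentioned in the quoted message (making x an array before the first push) can be sketched as follows. This is a minimal example assuming gawk is on the PATH, because length() of an array is a GNU awk extension. It uses split("", x), the traditional portable way to get an empty array, where the message uses delete x; idiom_demo is a hypothetical wrapper name. Per the report, the x[length(x)] = y line cannot be used without that first step.

```shell
#!/bin/sh
# Sketch of the x[length(x)] = y push idiom with x made into an array up front.
# Assumes gawk on PATH; length() of an array is a GNU awk extension.
# idiom_demo is a hypothetical name used only for this example.
idiom_demo() {
    gawk 'BEGIN {
        split("", x)            # x is now an empty array; "delete x" also works in gawk
        x[length(x)] = "first"  # length(x) is 0, so this sets x[0]
        x[length(x)] = "second" # length(x) is 1, so this sets x[1]
        for (i = 0; i < length(x); i++)
            print i, x[i]       # prints "0 first" then "1 second"
    }'
}

idiom_demo
```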


