bug-bash



From: Robert Elz
Subject: Re: Having an alias and a function with the same name leads to some sort of recursion
Date: Wed, 08 Feb 2023 05:33:35 +0700

    Date:        Tue, 7 Feb 2023 14:35:54 -0500
    From:        Chet Ramey <chet.ramey@case.edu>
    Message-ID:  <abae9b19-76a9-fe75-51df-a7623b3d147b@case.edu>

  | On 2/7/23 12:33 PM, Dale R. Worley wrote:  (That was 7 Feb, not 2 July...)
  | > That makes it clear why the second case behaves as it does.  But my
  | > reading of the definition of "simple commands" implies that function
  | > definitions are not simple commands,

You're right, they're not.

  | > and alias substitution should not be
  | > done on them (that is, the initial part) in any case.

That's what the standard currently says, but it was never how shells
behaved (the standard was wrong). Alias processing is a lexical issue, and
as Chet says:

  | When you parse a command and perform alias expansion, you don't yet know if
  | you're reading a simple command or a function definition.

which is why in the forthcoming edition, POSIX has been changed from:

        After a token has been delimited, but before applying the
        grammatical rules in Section 2.10, a resulting word that is
        identified to be the command name word of a simple command shall
        be examined to determine whether it is an unquoted, valid alias name.

into:

        After a token has been categorized as type TOKEN (see Section 2.10.1),
        including (recursively) any token resulting from an alias substitution,
        the TOKEN shall be subject to alias substitution if:

            • the TOKEN does not contain any quoting characters,
[...]
            • the TOKEN could be parsed as the command name word of a simple
              command (see Section 2.10), based on this TOKEN and the tokens
              (if any) that preceded it, but ignoring whether any subsequent
              characters would allow that,

(There are more rules in both cases, but they're not currently relevant).

In the example case

        cmd() { echo "$@" ; }

where "cmd" has been defined as an alias, when the lexical analysis phase
has read 'c' 'm' 'd' (and combined those into the token "cmd", a "word",
which POSIX calls TOKEN in 2.10.1), the next char is '(' which delimits the
current token (ends it), but that char is not processed yet; it will be part
of the following token.   The just completed token, categorised as a TOKEN
(section 2.10.1), contains no quoting chars (and, by rules not stated above,
is a valid and defined alias name that is not currently being expanded), so
it is replaced by the value of the alias.
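
To see that replacement happen, here is a minimal sketch, assuming bash is
installed and on PATH. The alias value "renamed" and the variable "result"
are made-up names for the demo, not anything from the original report. The
snippet is fed to a fresh non-interactive bash on stdin so that the alias
line has been executed before the function definition line is read.

```shell
# Feed the snippet to a fresh non-interactive bash on stdin, so each line is
# read (and alias-expanded) only after the previous line has been executed.
result=$(bash -s <<'EOF'
shopt -s expand_aliases     # alias expansion is off by default in scripts
alias cmd='renamed'         # "renamed" is a made-up name for this demo
cmd() { echo "$@" ; }       # "cmd" is replaced by "renamed" before "(" is parsed
declare -F renamed >/dev/null && echo "function renamed exists"
declare -F cmd >/dev/null || echo "no function named cmd"
EOF
)
printf '%s\n' "$result"
```

In bash this should report that the function ended up being called "renamed"
and that no function "cmd" was defined at all.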

It needs to be this way, as the shell allows (always has) things like

        alias thing='echo('

and then with that in effect, one can write

        thing) { printf 'hello\n'; }

(where I purposely didn't create a recursive function, so you can test it).

That expands, after alias processing to:

        echo() { printf 'hello\n'; }

but if we only substituted aliases in what are actually command words of
simple commands, there would have been no alias substitution there, as

        thing) { ....

isn't a simple command (without any alias being expanded yet), it isn't
a function definition either, it is simply a syntax error - there is no
opening '(' to match the closing ')'.

With the new rules, that is not an issue - just as it never was for shells.
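
For anyone who wants to try both cases side by side, a rough sketch follows,
again assuming bash on PATH. The variable names "with_alias" and
"no_alias_status" are only for illustration. The first run has the alias in
effect, the second feeds the same line to a shell where "thing" is not an
alias.

```shell
# Case 1: alias in effect, so "thing)" becomes "echo()" before parsing.
with_alias=$(bash -s 2>&1 <<'EOF'
shopt -s expand_aliases
alias thing='echo('
thing) { printf 'hello\n'; }
echo
EOF
)

# Case 2: no alias defined, so the parser meets a ")" with no matching "(".
no_alias_status=0
bash -s >/dev/null 2>&1 <<'EOF' || no_alias_status=$?
thing) { printf 'hello\n'; }
EOF

printf 'with alias: %s\n' "$with_alias"
printf 'without alias: exit status %s\n' "$no_alias_status"
```

The first case should define and then run a function called "echo", the
second should fail with a non-zero exit status because of the syntax error.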

Of course all of this is absurd, impossible to explain to anyone who doesn't
already understand the difference between lexical analysis and parsing,
or what either of those has to do with running shell commands, and just
provides another reason that aliases should be abandoned completely.

It also justifies the current bash manual page remaining as it is, even though
it is not technically correct - it is close enough for people who want (for
unexplainable reasons) to use aliases to work out how to use them in normal
cases.

kre

ps: even as rewritten, the standard is not perfect, as if the input were

        $x thing) ...
or
        $x cmd() ...

then if $x != '' there's certainly no alias processing of thing or
cmd, as the first word of the value of $x would be the command word, and
what follows would be that command's args (including "thing" or "cmd").
In either case the parentheses will cause a syntax error, unless some
non-standard shell syntax permits them.

But if $x == '' then, since the expansion is unquoted, it simply vanishes,
in which case "thing" (or "cmd") would be in the command word position,
so strictly, given the above input, "thing" or "cmd" *could* be a TOKEN in
the command word position (we don't know, as the lexer doesn't evaluate $x),
so according to the new wording, should be alias expanded (assuming the
word is defined as an alias).   But that's not how shells work either.
The lexical analysis phase just sees $x as one word, delimited by white
space (which the lexer deletes) followed by "thing" (or "cmd") as another
word - the 2nd word in a sequence of words isn't in the command word position,
so is never considered for alias processing.
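
A small sketch of that behaviour, assuming bash on PATH (the function body,
the alias value and the variable "result" are invented for the demo, and the
parentheses from the input above are left out so that the line is valid
syntax). The function is defined before the alias exists, so its definition
is not itself alias expanded.

```shell
result=$(bash -s <<'EOF'
shopt -s expand_aliases
thing() { echo "function thing ran"; }   # defined before the alias exists
alias thing='echo alias expanded'
x=''
$x thing      # "thing" is the 2nd word for the lexer: no alias expansion
thing         # here "thing" is the first word: alias expansion applies
EOF
)
printf '%s\n' "$result"
```

The first command line runs the function, even though $x vanished and
"thing" ended up as the command word, and only the second one uses the alias.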

Note that even if it wanted to, the lexer cannot expand $x to find out what
it will be, as we might be in a loop like

        for x in cmd1 '' cmd2; do
                $x thing ...
        done

which isn't executed until the entire thing is parsed, which means the
lexical analysis is finished. In that example, in 2 of the 3 iterations of
the loop $x provides the command word; in the other one (the 2nd of the
three iterations) $x vanishes, and "thing" is the command word.
But the lexer only processes that command line once - it cannot produce
two different outputs, nor does it have any idea that the tokens it is
generating are going inside a loop (that kind of feedback from the parser
is not available - depending upon the nature of the parser, it might not
even have decided that there is a loop at the time the lexical analysis of
the 2nd line above is done - that might wait for the 3rd line to complete
the needed syntax elements).
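
The loop case can be checked the same way. This is only a sketch, assuming
bash on PATH, with "echo" standing in for the hypothetical cmd1 and cmd2, and
with a function and an alias both named "thing" (the function is defined
first, so that its definition is not alias expanded).

```shell
result=$(bash -s <<'EOF'
shopt -s expand_aliases
thing() { echo "function thing ran"; }
alias thing='echo alias expanded'
for x in echo '' echo; do
        $x thing      # lexed once, with "thing" as the 2nd word
done
EOF
)
printf '%s\n' "$result"
```

The alias is never used in any of the three iterations: twice "thing" is just
an argument to echo, and once (when $x vanishes) the function is run.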




