bug-bash

Re: Unclosed quotes on heredoc mode


From: Robert Elz
Subject: Re: Unclosed quotes on heredoc mode
Date: Wed, 24 Nov 2021 22:40:05 +0700

    Date:        Tue, 23 Nov 2021 11:09:51 -0500
    From:        Chet Ramey <chet.ramey@case.edu>
    Message-ID:  <3a5f6f3a-aa73-d8ac-46f4-46467d5b398d@case.edu>

  | > I'll run our tests against the newest (released) bash
  |
  | OK. However, since, as I said, the devel branch has a completely different
  | implementation, this is not particularly useful.

OK, then I won't bother ... running the tests is easy (about 5 mins
elapsed time, after about 1 second setup) but analysing the results to
separate out the real bugs from the places where bash is just different
from our shell, and neither is right or wrong (our tests are testing to
make sure we don't change things by accident) takes longer.

  | It's the build version: how many times have you built in this build tree?

Once, of course ... why would I ever build it again?

  | I get into the hundreds before I recycle it.

Of course, when you're doing development that tends to happen, it does
for me as well - but I don't do bash development, just use it (interactively
only, because I was csh trained, and my fingers still type !! and !$ all
the time).

  | Whatever. You do you. Don't be surprised if many of my answers turn out to
  | be "that's already fixed in the devel branch."

First, thanks to those (several) people who indicated how I could
fetch the devel code, I might look at that sometime, but in general I
prefer to wait for the released versions (and then for those to get
included in NetBSD's pkgsrc, which usually happens quite quickly).

Wasting effort (mine and yours) isn't a goal, so I can either try
(probably just once) a devel version via a tarball, or wait for the
devel version to turn into a release, and run the tests then.

  | Refer to my previous message about the reading-full-lines strategy.

I have no problem with reading full lines, but whenever a "full line"
includes a newline token, any pending here docs should be read.  As soon
as you see the << the grammar should be telling the lexer to look for
heredoc data as soon as (probably actually just prior to) returning a
newline token to the grammar (the before/after doesn't really matter,
doing it before just saves needing to remember that the newline was read
just before .. in either case the newline char has been read and consumed,
turned into a newline token, which will be returned to the grammar, and the
next grammar related input will be after the heredoc data is done, so reading
the heredoc(s) and then returning the newline token is slightly simpler).
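
As a minimal sketch of the uncontroversial case (the function name
two_docs is made up for illustration): two heredocs are pending when the
newline after "cat <<B" is read, and their bodies follow that newline in
the order the operators appeared on the line.

```shell
# Both << operators are seen before the first newline token, so both
# bodies are gathered right after that newline, first A, then B.
two_docs() {
cat <<A; cat <<B
first body
A
second body
B
}

out=$(two_docs)
printf '%s\n' "$out"
```

Any POSIX-style shell should print the two bodies in that order; only
the command substitution cases discussed below are in dispute.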


  | The devel branch produces
  |
  | TRACE: pid 78934: parse_comsub: need_here_doc = 1 after yyparse()?
  | cat: abc: No such file or directory
  | cat: def: No such file or directory

That looks much better.

  | We talked about this. The command substitution starts a new parsing context
  | to implement the "any valid shell script" part of the standard.

Then we get to whether heredoc data is part of a valid shell script
in that sense - when there is yet to be a newline token to introduce it.

This is where we started this, the question of which newline is the one
after which heredoc data starts.   It isn't at all as clear as you make
it appear to be.

  | The netbsd shell appears to be the outlier here. The parser reads the
  | command substitution so it can parse the entire and-or list before trying
  | to gather any here-documents.

You cannot possibly really mean that, I hope.   That is, in

        cmd1 <<EOF &&
        data
        EOF
                cmd2

you do agree that "data" is stdin to cmd1, that is, the heredoc data
appears splat in the middle of the and-or list.   That's certainly the
way it appears to work (in bash) to me.
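
A runnable version of that, as a sketch: sed stands in for cmd1 and echo
for cmd2, and the function name and_or_demo is made up for illustration.

```shell
# The heredoc body sits between the two halves of the and-or list:
# it is read at the newline after "&&", before cmd2 is parsed.
and_or_demo() {
    sed 's/^/cmd1 got: /' <<EOF &&
data
EOF
        echo "cmd2 ran"
}

and_or_out=$(and_or_demo)
printf '%s\n' "$and_or_out"
```

Here sed sees "data" on its stdin, and echo runs only because sed
succeeded.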

Once again, heredoc gathering has nothing at all to do with the grammar,
and so obviously not the parser either, beyond it informing the lexer which
heredocs are pending.

  | If you want the text of the here-document to apply to the command
  | substitution, put it inside the command substitution.

That might be good advice to a script writer, and that will certainly
work with our shell, but it really should not be necessary, the heredoc
data is simply pending until there's a newline token, just as (aside from
the missing "token") the standard says.
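
For reference, the form that advice describes looks like the sketch
below (the variable name is made up for illustration): the newline token
and the heredoc body are both inside the command substitution, so there
is nothing left pending when the closing parenthesis is reached.

```shell
# The newline after "<<EOF" is inside $( ... ), so the body that
# follows it is read before the closing parenthesis is seen.
inside=$(cat <<EOF
text from inside
EOF
)
printf '%s\n' "$inside"
```

This form should behave the same way everywhere, which is why it is the
safe choice for a script writer, whatever one thinks of the other case.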

  | Otherwise, you violate the "any valid shell script" clause

No, you don't, the command substitution is grammatically correct.
Perfectly valid.   heredocs come after newline tokens, if there is
no newline token, the heredoc cannot appear - and nothing, anywhere
says that it must (but as just above, the script writer can insert
a newline token, and thus the heredoc data as well, if they prefer).

  | and the behavior varies there.

It does, there are buggy shells.

  | The fundamental point of disagreement is what to do if the lexer (after,
  | presumably, calling the parser recursively) finds that it still has here-
  | documents to read after reading the end of the command substitution.

That seems like it, but the same issue could apply to other grammatical
contexts, like subshells, which really should be self contained as well,
if you approach it from that point of view.
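
A small sketch of the subshell case (the function name subshell_demo is
made up for illustration): the << operator is inside the parentheses, but
the first newline token comes after the closing parenthesis, so the
heredoc body appears outside the subshell's text.

```shell
# The redirection belongs to sed inside the subshell, yet the body is
# read only after the newline that follows the closing parenthesis.
subshell_demo() {
( sed 's/^/in subshell: /' <<EOF )
body after the closing paren
EOF
}

sub_out=$(subshell_demo)
printf '%s\n' "$sub_out"
```

As far as I know the common shells, bash included, accept this without
complaint, even though the subshell is not "self contained" here.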

  | What happens in the command substitution stays in the command substitution.
  | If you subscribe to that,

I don't.   Or not as you're intending that to mean here.

  | > And then there is of course the combination of the two of those examples:
  | > 
  | > cat <<EOF && grep xyx $( cat <<END
  |
  | Which has the same fundamental disagreement.

Except in this case an answer is vital, as there's no way to
reconcile this by careful script writing (other than by not
ever writing something like this - inserting extra newline tokens
just to avoid the issue).   If one simply subscribes to the standard's
"next newline [token]" (which is as basic as that, no qualifications)
then if you put a newline after that END, then that (which would be
a newline token) is the next after the <<EOF heredoc, and as that one
is first on the line, its data must appear before the <<END heredoc.

  | The logical conclusion of this line of thinking is that a `done' in a
  | command substitution can terminate a `for' loop that starts outside it.

Nonsense.   Those are all grammatical constructs, and the grammar governs
what is possible.   heredocs are lexical magic, and totally unlike everything
else in the shell.

kre



