bug-bash

Re: "here strings" and tmpfiles


From: Robert Elz
Subject: Re: "here strings" and tmpfiles
Date: Tue, 09 Apr 2019 14:32:38 +0700

    Date:        Mon, 8 Apr 2019 23:36:39 -0700
    From:        pepa65 <solusos@passchier.net>
    Message-ID:  <c8b1d17c-13a5-3d74-e9ef-958e238b47f4@passchier.net>

  | When in the past I proposed this syntax:
  |      cmd >>>var
  | the idea was to commit the output of a command into memory (in the form
  | of a variable), without requiring a pipe or file.

In general that cannot work: cmd and the shell are in separate
processes. Even if some form of shared memory were used, cmd would
somehow have to be taught how to do that - but only to do it when
that particular form of output is being used (which, since redirects
are handled by the shell, it normally knows nothing about).

Of course if cmd is built into the shell, then it would be easy,
but inventing new syntax which only works in very special cases is
not a good idea.

The idea is basically just to do

        var=$( cmd )

right?   But without a fork.   That's something that can be done today,
no new syntax needed (bash might even do it sometimes, I don't know, the
FreeBSD shell does.)

When cmd is not built in, then the shell simply forks, after making
a pipe, and it works as you'd expect.   But when cmd is built in, and
executing it will do no (lasting) damage to the shell execution environment,
then there's no need to fork.   Since neither printf nor echo affect the
execution environment at all, they're perfect cases for that kind of
optimisation (this is also a frequent idiom, so can have real benefits.)
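
As a small sketch of that idiom (the variable names here are made up
purely for illustration), this is the kind of command substitution a
shell can evaluate without forking, since printf is built in. Whether
any particular shell actually avoids the fork is an implementation
detail; the script is the same either way.

```shell
# Capture the output of a builtin with ordinary command substitution.
# No new syntax is needed; a shell is free to run printf in-process
# and collect its output in memory instead of forking and using a pipe.
name=world                                  # hypothetical example data
greeting=$(printf 'hello, %s' "$name")
echo "$greeting"
```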

  | What is the technique you are referring to?

Exactly the above: if cmd is built in, its output goes into memory,
more or less as would happen (inside the shell, where all this is
happening) if its output were read from a pipe for a non-builtin,
but with no pipe (or other I/O) involved. Then that data is simply assigned.

The same technique works for stdin, in a case like cmd1 | cmd2
where both are builtin - cmd1 writes into a memory buffer, and cmd2
reads from that same thing (this needs care as the shell needs to
handle any scheduling that's required, running cmd1 until it
ends or the buffer fills, then cmd2 until it has consumed all
available, then back to cmd1 again...)   Whether this is worth the
effort is questionable.   The same can be done for here docs (or
strings) being read by built in commands, which was the actual case
I had in mind in the previous message.

Of course, there are often also easier techniques - a lot of the
examples being tossed around have easier (if perhaps more verbose)
ways to be written.   If you want to assign some known data (such
that you could put it in a here doc/string - which includes values
of variables, etc, of course) then rather than

        read a b c <<< 'word1 word2 word3'

which is admittedly very compact, and looks cute, you can just do

        a=word1 b=word2 c=word3

and the same when you're filling in an array, you just
need to explicitly add the subscripts.
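
A rough sketch of the array case, assuming bash (arrays and here
strings are not in POSIX sh); the array names "words" and "plain" are
invented for the example:

```shell
# Requires bash (or another shell with arrays and here strings).
# Compact form: split a here string into an array with read -a.
read -r -a words <<< 'word1 word2 word3'

# Plain assignments with explicit subscripts: no here string, no read.
plain[0]=word1 plain[1]=word2 plain[2]=word3

# Both arrays now hold the same three elements.
echo "${words[*]}"
echo "${plain[*]}"
```

The second form is more verbose, but it involves no redirection at all,
so there is nothing for the shell to optimise away.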

I suspect that some of this is because bash's "readarray" is
slightly different than "read" or a simple assignment (this is
a guess based entirely upon bits and pieces I have picked up
from this list) - which is an example of why adding new special
case "stuff" is not a great idea in general: if it works just the
same as the existing stuff, then it isn't needed (except perhaps as
a frill for simplicity); if it doesn't, then it tends to interact
badly with everything else.


The point is that this kind of thing can be done just using optimisation
techniques of the current syntax - and a script that uses it will
work anywhere (just perhaps not as fast) - inventing new stuff to
try and make things work better is rarely a good idea, it just makes
the whole system a gigantic mess of ad-hoc special cases.

Of course, if there's a problem with the way that $( ) is defined
to work (like the trailing \n stripping, or whatever) that can be
addressed, either by some new syntax "this is just like that, except
that in the new one ...." or by some shell options that modify the
way that things work, which can be set by a script that knows what
it is doing (perhaps set inside the cmd substitution itself, so it
only affects that one, or outside to affect all of them.)   Of course,
any of that loses portability.
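
For the trailing \n case specifically, a commonly used idiom that needs
neither new syntax nor options is sketched below. It is not something
proposed earlier in this thread, just an illustration; the variable
names and the sentinel character x are arbitrary choices.

```shell
# $( ) removes all trailing newlines from the captured output.
stripped=$(printf 'line\n\n')

# Workaround: emit a sentinel character after the real output, so the
# newlines are no longer trailing, then remove the sentinel afterwards.
kept=$(printf 'line\n\n'; printf x)
kept=${kept%x}

# Print the lengths: 4 characters without the newlines, 6 with them.
echo "${#stripped} ${#kept}"
```

One cost of this idiom is that the exit status of the substitution is
that of the final printf, not of the command whose output was wanted.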

  | But when data gets passed between commands, it would be great if memory
  | could be used for that, for various good reasons. :-)

That's what a pipe is.   In general there needs to be some mechanism
that is general enough that any random command works with it, and that
means having the kernel involved to manage access to the data safely.

A file in a memory backed filesystem (a tmpfs or whatever) isn't that
much different.

If you have a very specialised set of commands that want to communicate
with each other, then you can write them to use shared memory, and have
them communicate that way - but there is little chance that all the
standard commands (or even any of them) are suddenly going to be modified
to make that work for general use.   And there is certainly no way to
make that happen by some magic changes to the shell.

kre



