[PATCH] env: support encoding of args into command.


From: Kaz Kylheku
Subject: [PATCH] env: support encoding of args into command.
Date: Thu, 25 May 2017 11:15:27 -0700
User-agent: telnet to port 25, man!

This is a new feature which allows the command argument of env to encode
multiple extra arguments, as well as the relocation of the first
trailing argument among those arguments.

* src/env.c (usage): Mention the existence of the feature.
(expand_command_notation): New function.
(main): Detect whether the notation is present, based on the first
character of command. If so, filter the trailing part of the argument
vector through the expand_command_notation function, and use that.
Either way, the effective vector is referenced using the down_argv
variable and that is used for the execvp call.
If an error occurs, the diagnostic refers to the first element of
down_argv rather than the original argv.

* tests/misc/env.sh: Added some test cases. Doesn't probe all the corner
cases. I solemnly declare that I manually tested those corner cases,
like "env :" and "env :{}" and such, and used valgrind for
all the manual testing to be confident that there are no
overruns or uses of uninitialized bytes.

* doc/coreutils.texi: Documented feature. Added discussion about how
env is often used for the hash bang mechanism, and how the feature
relates to this use.
---
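For readers who want to experiment with the expansion rules described
above without building coreutils, here is a standalone C sketch that
models them. It is not the code from this patch: the function name
model_expand is hypothetical, it uses plain malloc, and it returns NULL
on allocation failure instead of exiting.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical model of the ":" command notation (not the patch's code).
   ARGV is a NULL-terminated vector whose first element starts with ':'.
   Returns a newly allocated NULL-terminated vector, or NULL if malloc
   fails.  The field strings point into a heap copy of ARGV[0] that is
   never freed here, on the assumption that a real caller would exec.  */
char **
model_expand (char **argv)
{
  size_t n = 0, nf = 1, a = 0, rest = 1;

  while (argv[n])
    n++;

  /* Copy the command without its leading colon so it can be split.  */
  char *copy = malloc (strlen (argv[0] + 1) + 1);
  if (copy == NULL)
    return NULL;
  strcpy (copy, argv[0] + 1);

  /* One field, plus one more for every colon.  */
  for (char *p = copy; *p; p++)
    if (*p == ':')
      nf++;

  /* Room for nf fields, at most n - 1 trailing args, and the NULL.  */
  char **out = malloc ((nf + n) * sizeof *out);
  if (out == NULL)
    {
      free (copy);
      return NULL;
    }

  for (char *p = copy; ; )
    {
      char *end = p + strcspn (p, ":");
      int more = (*end == ':');

      *end = '\0';

      /* Only the leftmost "{}" is replaced, and only if an arg exists.  */
      if (rest == 1 && strcmp (p, "{}") == 0 && argv[1] != NULL)
        out[a++] = argv[rest++];
      else
        out[a++] = p;

      if (!more)
        break;
      p = end + 1;
    }

  assert (a == nf);

  /* Append whatever trailing arguments were not consumed by "{}".  */
  while (argv[rest])
    out[a++] = argv[rest++];
  out[a] = NULL;

  return out;
}
```

Calling model_expand on the vector { ":echo:{}:bb", "cc", "dd", NULL }
yields echo, cc, bb, dd, which matches the corresponding case added to
tests/misc/env.sh below.
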
 doc/coreutils.texi | 63 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 src/env.c          | 64 ++++++++++++++++++++++++++++++++++++++++++++++++++++--
 tests/misc/env.sh  | 18 +++++++++++++++
 3 files changed, 143 insertions(+), 2 deletions(-)

diff --git a/doc/coreutils.texi b/doc/coreutils.texi
index 1834e92..9e1cb0c 100644
--- a/doc/coreutils.texi
+++ b/doc/coreutils.texi
@@ -16879,6 +16879,69 @@ env -u EDITOR PATH=/energy -- e=mc2 bar baz
 
 @end itemize
 
+Note that the ability to run commands in a modified environment is built into
+the shell language, using a very similar @samp{@var{variable}=@var{value}}
+syntax; moreover, that syntax allows commands internal to the shell to be run
+in a modified environment, which is not possible with the external
+@command{env}.  Other scripting languages usually also have their own built-in
+mechanisms for manipulating the environment around the execution of a child
+program.  Therefore the external @command{env} executable is rarely needed for
+the purpose of running a command in a modified environment.  Because the
+@command{env} utility uses @env{PATH} to search for @var{command}, it has come
+to be mainly used as a mechanism in ``hash bang'' scripting.  In this usage,
+scripts are written using the incantation @samp{#!/usr/bin/env interp} where
+@var{interp} is the name of some scripting language interpreter. The
+@command{env} utility provides value by searching @env{PATH} for the location
+of the interpreter executable. This allows the interpreter to be installed in
+some chosen location, without that location having to be edited into the hash
+bang scripts which refer to that interpreter.
+
+On some operating systems, the following issue exists: the hash bang
+interpreter mechanism allows only one argument. Therefore, if the @command{env}
+incantation @samp{#!/usr/bin/env interp} is used, it is not possible to pass an
+argument to @samp{interp}, which is a crippling limitation in some
+circumstances, requiring clumsy workarounds. To overcome this difficulty, the
+GNU Coreutils version of @command{env} supports a special notation:
+arguments for @var{command} can be embedded in the @var{command} argument
+itself as follows.  If @var{command} begins with the @samp{:} (colon)
+character, then that colon character is removed. The remainder of the
+argument is treated as a record of colon-separated fields, and split
+accordingly. For instance if @var{command} is @samp{:foo:--bar:42}, then
+it is split into the fields @samp{foo}, @samp{--bar} and @samp{42}. The
+effective command is then just @samp{foo}. The other two fields will be
+passed as the first two arguments to @samp{foo}, inserted before the
+remaining @var{args}, if @samp{foo} is successfully found using
+@env{PATH} and executed.
+Furthermore, this special notation supports one more refinement.
+If, after colon splitting, one or more of the fields are
+equal to the character string @samp{@{@}} (open brace, closed brace)
+then the leftmost such field is replaced with the first of the @var{args}
+which follow @var{command}. In this case, that argument is removed from
+@var{args}. If @var{args} is empty, then the field is not replaced.
+
+Example: @command{env} hash bang line for a script executed by the
+fictitious @samp{intercal} interpreter. The @samp{--strict-iso} option
+is passed to the interpreter, and the @samp{--verbose} option is
+passed to the script:
+
+@example
+#!/usr/bin/env :intercal:--strict-iso:@{@}:--verbose
+... script goes here ...
+@end example
+
+When the above hash bang script is invoked with the arguments @samp{alpha} and
+@samp{omega}, @command{env} is invoked with four arguments: the
+argument @samp{:intercal:--strict-iso:@{@}:--verbose}, followed by the
+path name to the above script itself, followed by @samp{alpha} and @samp{omega}.
+The @command{env} utility will parse the special notation in the command, producing
+the fields @samp{intercal}, @samp{--strict-iso}, @samp{@{@}} and
+@samp{--verbose}. The @samp{@{@}} field is recognized and replaced with
+the first of the remaining arguments, which is the path to the script.
+This argument is then removed from the remaining arguments. Then
+@command{env} searches @env{PATH} for @samp{intercal}. Upon finding it,
+it executes the interpreter with the arguments @samp{--strict-iso},
+the name of the script, @samp{--verbose}, @samp{alpha} and @samp{omega}.
+
 
 The program accepts the following options.  Also see @ref{Common options}.
 Options must precede operands.
diff --git a/src/env.c b/src/env.c
index 63d5c2c..20fafdd 100644
--- a/src/env.c
+++ b/src/env.c
@@ -18,6 +18,7 @@
 
 #include <config.h>
 #include <stdio.h>
+#include <assert.h>
 #include <sys/types.h>
 #include <getopt.h>
 
@@ -70,11 +71,65 @@ Set each NAME to VALUE in the environment and run COMMAND.\n\
 \n\
 A mere - implies -i.  If no COMMAND, print the resulting environment.\n\
 "), stdout);
+      fputs (_("\
+\n\
+COMMAND supports a notation for encoding a command name plus one or more\n\
+arguments. This is useful when env is used in #! (hash bang) scripting.\n\
+Please see the Info documentation for the details.\n\
+"), stdout);
       emit_ancillary_info (PROGRAM_NAME);
     }
   exit (status);
 }
 
+static char **
+expand_command_notation (char **argv)
+{
+  char *command = xstrdup(argv[0] + 1), *p, **pp, **nargv;
+  int nf, argc, a, rest = 1;
+
+  for (nf = 1, p = command; *p; p++)
+    {
+      if (*p == ':')
+        nf++;
+    }
+
+  for (argc = 0, pp = argv; *pp; argc++, pp++)
+    ; /* empty */
+
+  argc += nf - 1;
+
+  if ((nargv = malloc ((argc + 1) * sizeof *nargv)) == NULL)
+    xalloc_die ();
+
+  for (a = 0, p = command; ; p++)
+    {
+      char *arg = p;
+      char *end = p + strcspn(p, ":");
+      char ch = *end;
+
+      *end = 0;
+
+      if (rest < 2 && strcmp(arg, "{}") == 0 && argv[rest])
+        arg = argv[rest++];
+
+      nargv[a++] = arg;
+
+      if (ch == ':')
+        {
+          p = end;
+          continue;
+        }
+
+      break;
+    }
+
+  assert (a == nf);
+
+  memcpy(&nargv[a], &argv[rest], sizeof nargv[0] * (argc + 2 - rest - a));
+  return nargv;
+}
+
 int
 main (int argc, char **argv)
 {
@@ -154,9 +209,14 @@ main (int argc, char **argv)
       usage (EXIT_CANCELED);
     }
 
-  execvp (argv[optind], &argv[optind]);
+  char **rest_argv = argv + optind;
+  char **down_argv = (rest_argv[0][0] == ':')
+                     ? expand_command_notation(rest_argv)
+                     : rest_argv;
+
+  execvp (down_argv[0], down_argv);
 
   int exit_status = errno == ENOENT ? EXIT_ENOENT : EXIT_CANNOT_INVOKE;
-  error (0, errno, "%s", quote (argv[optind]));
+  error (0, errno, "%s", quote (down_argv[0]));
   return exit_status;
 }
diff --git a/tests/misc/env.sh b/tests/misc/env.sh
index f2f6ba8..aeb2b91 100755
--- a/tests/misc/env.sh
+++ b/tests/misc/env.sh
@@ -150,4 +150,22 @@ test "x$(sh -c '\c=d echo fail')" = xpass && #dash 0.5.4 fails so check first
 returns_ 125 env -u a=b true || fail=1
 returns_ 125 env -u '' true || fail=1
 
+# test the special env :command... notation for encoding arguments
+test "$(env :echo)" = "" || fail=1
+test "$(env :echo:)" = "" || fail=1
+test "$(env :echo:a)" = "a" || fail=1
+test "$(env :echo:a:b)" = "a b" || fail=1
+test "$(env :echo:a b)" = "a b" || fail=1
+test "$(env :echo:a: b)" = "a  b" || fail=1
+test "$(env :echo:a:b c d)" = "a b c d" || fail=1
+test "$(env :echo:aa:bb cc dd)" = "aa bb cc dd" || fail=1
+test "$(env :echo:{}:bb cc dd)" = "cc bb dd" || fail=1
+test "$(env :echo:{} cc dd)" = "cc dd" || fail=1
+test "$(env :echo:{}:aa:bb cc dd)" = "cc aa bb dd" || fail=1
+test "$(env :echo:{})" = "{}" || fail=1
+test "$(env :echo:{}:{})" = "{} {}" || fail=1
+test "$(env :echo:{}:a)" = "{} a" || fail=1
+test "$(env :echo:{}:a b)" = "b a" || fail=1
+test "$(env :echo:{}:{} b)" = "b {}" || fail=1
+
 Exit $fail
-- 
2.9.3


