bug-coreutils

Re: echo does not recognize backslash escapes in POSIX mode


From: Paul Eggert
Subject: Re: echo does not recognize backslash escapes in POSIX mode
Date: Thu, 13 May 2004 01:21:57 -0700
User-agent: Gnus/5.1006 (Gnus v5.10.6) Emacs/21.3 (gnu/linux)

Matthew Fischer <address@hidden> writes:

> When POSIXLY_CORRECT is set in the environment, echo does not recognize
> the backslash escapes specified in SUSv2:

Backslash escapes are not required by POSIX 1003.1-2004; see
<http://www.opengroup.org/onlinepubs/009695399/utilities/echo.html>.
They are listed as an XSI extension.  I think coreutils tries to
support XSI when that wouldn't cause too much trouble, but it would
cause trouble in this case since many people expect echo to output its
arguments unmodified.

That being said, I noticed several POSIX- and bash-compatibility
problems in coreutils echo.  Here is a proposed patch.

2004-05-13  Paul Eggert  <address@hidden>

        * NEWS: echo compatibility cleanup.
        * doc/coreutils.texi (echo invocation): Document the changes.
        * src/echo.c (V9_ECHO): Remove; always enabled.
        (DEFAULT_ECHO_TO_XPG): Renamed from V9_DEFAULT, so that
        we use the same naming convention as bash.  Now an enum,
        not a macro.
        (usage): Reword to mention -e/-E more accurately.
        Mention \0NNN (the POSIX syntax) rather than \NNN (nonstandard).
        (hextobin): New function.
        (main): Use bool rather than int for local vars when appropriate.
        Do not allow options if POSIXLY_CORRECT, unless we are using
        BSD semantics and the first argument is "-n".
        Don't pass unnecessary extra arg to parse_long_options.
        do_v9 now defaults to DEFAULT_ECHO_TO_XPG, not to allow_options.
        Do not look for options if !allow_options.
        Use size_t rather than int when appropriate.
        Open-code option test rather than using strrchr.
        Use faster test for "-".
        Avoid redundant argc test.
        Add support for \x, for Bash compatibility.
        Use e.g. '\a' rather than '\007', for portability to EBCDIC hosts.
        When '\c' is encountered, stop printing immediately, as POSIX
        requires.
        Add support for \xhh syntax.
        Add support for \0ooo syntax; POSIX requires this.
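The POSIXLY_CORRECT rule in the ChangeLog entry above ("Do not allow options if POSIXLY_CORRECT, unless we are using BSD semantics and the first argument is \"-n\"") can be sketched as a standalone predicate. The name echo_allows_options and its parameters are hypothetical, chosen for illustration only; they are not part of the patch. One detail matters: argv[1] may be inspected only when argc is greater than 1, because argv[argc] is a null pointer.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper mirroring the ChangeLog rule.  ARGV follows the
   usual convention: ARGV[0] is the program name and ARGV[ARGC] is a
   null pointer.  POSIXLY_CORRECT says whether that variable is set;
   DEFAULT_XPG is true when escapes are interpreted by default.  */
static bool
echo_allows_options (int argc, char *const *argv,
                     bool posixly_correct, bool default_xpg)
{
  assert (0 < argc);
  if (! posixly_correct)
    return true;
  /* argv[1] is a real operand only when argc > 1; checking 0 < argc
     alone would let strcmp read the terminating null pointer.  */
  return ! default_xpg && 1 < argc && strcmp (argv[1], "-n") == 0;
}
```

Under this sketch, with POSIXLY_CORRECT set and BSD semantics, `echo -n hi` still treats -n as an option, while `echo -ne hi` prints all of its arguments.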

Index: NEWS
===================================================================
RCS file: /home/meyering/coreutils/cu/NEWS,v
retrieving revision 1.205
diff -p -u -r1.205 NEWS
--- NEWS        8 May 2004 22:24:25 -0000       1.205
+++ NEWS        13 May 2004 08:15:04 -0000
@@ -16,6 +16,11 @@ GNU coreutils NEWS                      
 
   ls no longer segfaults on systems for which SIZE_MAX != (size_t) -1
 
+  echo now conforms to POSIX better.  It supports the \0ooo syntax for
+  octal escapes, and \c now terminates printing immediately.  If
+  POSIXLY_CORRECT is set and the first argument is not "-n", echo now
+  outputs all option-like arguments instead of treating them as options.
+
 ** New features
 
   pwd now works even when run from a working directory whose name
@@ -68,6 +73,9 @@ GNU coreutils NEWS                      
 
   `date' has a new option --iso-8601=ns that outputs
   nanosecond-resolution time stamps.
+
+  echo -e '\xHH' now outputs a byte whose hexadecimal value is HH,
+  for compatibility with bash.
 
 
 * Major changes in release 5.2.1 (2004-03-12) [stable]
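The escape rules in the NEWS entries above (\0ooo octal escapes, \xHH hexadecimal escapes, and \c stopping output at once) can be exercised in isolation with a decoder that writes into a buffer instead of standard output. This is a minimal sketch that follows the switch statement in the patch; decode_escapes and hex_digit_value are hypothetical names, and the buffer interface is an assumption made so the behavior can be checked without running echo.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Value of the hexadecimal digit C, in the style of the patch's
   hextobin: letters are listed explicitly, which is portable to
   EBCDIC, and '0'..'9' are contiguous in every C character set.  */
static int
hex_digit_value (unsigned char c)
{
  assert (isxdigit (c));
  switch (c)
    {
    case 'a': case 'A': return 10;
    case 'b': case 'B': return 11;
    case 'c': case 'C': return 12;
    case 'd': case 'D': return 13;
    case 'e': case 'E': return 14;
    case 'f': case 'F': return 15;
    default: return c - '0';
    }
}

/* Decode the backslash escapes in S into OUT, which needs room for
   strlen (S) bytes, and return the number of bytes stored.  Set *STOP
   if \c was seen; nothing after \c is decoded.  */
static size_t
decode_escapes (char const *s, char *out, bool *stop)
{
  size_t n = 0;
  *stop = false;
  while (*s)
    {
      unsigned char c = *s++;
      if (c == '\\' && *s)
        switch (c = *s++)
          {
          case 'a': c = '\a'; break;
          case 'b': c = '\b'; break;
          case 'c': *stop = true; return n;
          case 'f': c = '\f'; break;
          case 'n': c = '\n'; break;
          case 'r': c = '\r'; break;
          case 't': c = '\t'; break;
          case 'v': c = '\v'; break;
          case '\\': break;
          case 'x':
            if (! isxdigit ((unsigned char) *s))
              {
                out[n++] = '\\';   /* Not an escape: keep "\x".  */
                break;
              }
            c = hex_digit_value (*s++);
            if (isxdigit ((unsigned char) *s))
              c = c * 16 + hex_digit_value (*s++);
            break;
          case '0':                /* \0 plus zero to three octal digits.  */
            c = 0;
            for (int i = 0; i < 3 && '0' <= *s && *s <= '7'; i++)
              c = c * 8 + (*s++ - '0');
            break;
          case '1': case '2': case '3':
          case '4': case '5': case '6': case '7':
            c -= '0';              /* \NNN: one to three octal digits.  */
            for (int i = 0; i < 2 && '0' <= *s && *s <= '7'; i++)
              c = c * 8 + (*s++ - '0');
            break;
          default:
            out[n++] = '\\';       /* Unknown escape: output it as is.  */
            break;
          }
      out[n++] = c;
    }
  return n;
}
```

With this sketch, the input `a\0101b` decodes to the bytes a, 0101 (octal), b; `\x4g` yields byte 4 followed by g; and `hi\cthere` yields only `hi` with *stop set.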
Index: doc/coreutils.texi
===================================================================
RCS file: /home/meyering/coreutils/cu/doc/coreutils.texi,v
retrieving revision 1.180
diff -p -u -r1.180 coreutils.texi
--- doc/coreutils.texi  9 May 2004 19:42:19 -0000       1.180
+++ doc/coreutils.texi  13 May 2004 07:04:24 -0000
@@ -7787,6 +7787,7 @@ On some systems @code{unlink} can be use
 directory.  On others, it can be used that way only by a privileged user.
 In the GNU system @code{unlink} can never delete the name of a directory.
 
address@hidden POSIXLY_CORRECT
 By default, @command{unlink} honors the @option{--help} and @option{--version}
 options.  That makes it a little harder to remove files named
 @option{--help} and @option{--version}, so when the environment variable
@@ -8942,13 +8943,40 @@ horizontal tab
 vertical tab
 @item \\
 backslash
address@hidden address@hidden
+the eight-bit value that is the octal number @var{nnn}
+(zero to three octal digits)
 @item address@hidden
-the character whose @acronym{ASCII} code is @var{nnn} (octal); if @var{nnn} is not
-a valid octal number, it is printed literally.
+the eight-bit value that is the octal number @var{nnn}
+(one to three octal digits)
address@hidden address@hidden
+the eight-bit value that is the hexadecimal number @var{hh}
+(one or two hexadecimal digits)
 @end table
 
address@hidden -E
address@hidden -E
address@hidden backslash escapes
+Disable interpretation of backslash escapes in each @var{string}.
+This is the default.  If @option{-e} and @option{-E} are both
+specified, the last one given takes effect.
+
 @end table
 
address@hidden POSIXLY_CORRECT
+If the @env{POSIXLY_CORRECT} environment variable is set, then when
address@hidden's first argument is not @option{-n} it outputs
+option-like arguments instead of treating them as options.  For
+example, @code{echo -ne hello} outputs @samp{-ne hello} instead of
+plain @samp{hello}.
+
address@hidden does not require support for any options, and says
+that the behavior of @command{echo} is implementation-defined if any
address@hidden contains a backslash or if the first argument is
address@hidden  Portable programs can use the @command{printf} command
+if they need to omit trailing newlines or output control characters or
+backslashes.  @xref{printf invocation}.
+
 @exitstatus
 
 
@@ -9097,6 +9125,7 @@ pipeline.
 @dfn{failure}.  It can be used as a place holder in shell scripts
 where an unsuccessful command is needed.
 
address@hidden POSIXLY_CORRECT
 By default, @command{false} honors the @option{--help} and @option{--version}
 options.  However, that is contrary to @acronym{POSIX}, so when the environment
 variable @env{POSIXLY_CORRECT} is set, @command{false} ignores @emph{all}
@@ -9128,6 +9157,7 @@ In most modern shells, @command{true} is
 you use @samp{true} in a script, you're probably using the built-in
 command, not the one documented here.
 
address@hidden POSIXLY_CORRECT
 By default, @command{true} honors the @option{--help} and @option{--version}
 options.  However, that is contrary to @acronym{POSIX}, so when the environment
 variable @env{POSIXLY_CORRECT} is set, @command{true} ignores @emph{all}
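The texinfo text above says that if -e and -E are both specified, the last one given takes effect. The option loop in src/echo.c gets this by first checking that every letter of a word is e, E or n, and only then applying the letters in order. A minimal sketch of the same scan follows; struct echo_opts and scan_echo_options are hypothetical names, and the sketch leaves out the POSIXLY_CORRECT check, which the patch performs separately.

```c
#include <assert.h>
#include <stdbool.h>

/* Settings a hypothetical echo front end accumulates.  */
struct echo_opts
{
  bool newline;   /* Cleared by -n.  */
  bool escapes;   /* Set by -e, cleared by -E; the last one wins.  */
};

/* Consume leading option words from ARGV (ARGC entries, program name
   already removed).  A word is an option word only if it is "-"
   followed by one or more of e, E and n; the first word that fails
   this test, and everything after it, is an operand.  Return the
   number of option words consumed.  */
static int
scan_echo_options (int argc, char *const *argv, struct echo_opts *o)
{
  assert (0 <= argc);
  int used = 0;
  for (; used < argc && argv[used][0] == '-'; used++)
    {
      char const *p = argv[used] + 1;
      if (! *p)
        break;                    /* A lone "-" is an operand.  */
      for (char const *q = p; *q; q++)
        if (*q != 'e' && *q != 'E' && *q != 'n')
          return used;            /* E.g. "-nx" is echoed, not parsed.  */
      for (; *p; p++)
        switch (*p)
          {
          case 'e': o->escapes = true; break;
          case 'E': o->escapes = false; break;
          case 'n': o->newline = false; break;
          }
    }
  return used;
}
```

For example, scanning `-n -eE -e hello` consumes three words and leaves escapes enabled, because the final -e overrides the earlier -E.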
Index: src/echo.c
===================================================================
RCS file: /home/meyering/coreutils/cu/src/echo.c,v
retrieving revision 1.54
diff -p -u -r1.54 echo.c
--- src/echo.c  21 Jan 2004 22:57:19 -0000      1.54
+++ src/echo.c  13 May 2004 08:19:12 -0000
@@ -39,28 +39,16 @@ following backslash-escaped characters i
        \t      horizontal tab
        \v      vertical tab
        \\      backslash
-       \num    the character whose ASCII code is NUM (octal).
+       \0NNN   the character whose ASCII code is NNN (octal).
 
 You can explicitly turn off the interpretation of the above characters
 on System V systems with the -E option.
 */
 
-/* If defined, interpret backslash escapes if -e is given.  */
-#define V9_ECHO
-
-/* If defined, interpret backslash escapes unless -E is given.
-   V9_ECHO must also be defined.  */
-/* #define V9_DEFAULT */
-
-#if defined (V9_ECHO)
-# if defined (V9_DEFAULT)
-#  define VALID_ECHO_OPTIONS "neE"
-# else
-#  define VALID_ECHO_OPTIONS "ne"
-# endif /* !V9_DEFAULT */
-#else /* !V9_ECHO */
-# define VALID_ECHO_OPTIONS "n"
-#endif /* !V9_ECHO */
+/* If true, interpret backslash escapes by default.  */
+#ifndef DEFAULT_ECHO_TO_XPG
+enum { DEFAULT_ECHO_TO_XPG = false };
+#endif
 
 /* The name this program was run with. */
 char *program_name;
@@ -86,9 +74,9 @@ Echo the STRING(s) to standard output.\n
       fputs (VERSION_OPTION_DESCRIPTION, stdout);
       fputs (_("\
 \n\
-Without -E, the following sequences are recognized and interpolated:\n\
+If -e is in effect, the following sequences are recognized:\n\
 \n\
-  \\NNN   the character whose ASCII code is NNN (octal)\n\
+  \\0NNN   the character whose ASCII code is NNN (octal)\n\
   \\\\     backslash\n\
   \\a     alert (BEL)\n\
   \\b     backspace\n\
@@ -106,6 +94,22 @@ Without -E, the following sequences are 
   exit (status);
 }
 
+/* Convert C from hexadecimal character to integer.  */
+static int
+hextobin (unsigned char c)
+{
+  switch (c)
+    {
+    default: return c - '0';
+    case 'a': case 'A': return 10;
+    case 'b': case 'B': return 11;
+    case 'c': case 'C': return 12;
+    case 'd': case 'D': return 13;
+    case 'e': case 'E': return 14;
+    case 'f': case 'F': return 15;
+    }
+}
+
 /* Print the words in LIST to standard output.  If the first word is
    `-n', then don't print a trailing newline.  We also support the
    echo syntax from Version 9 unix systems. */
@@ -113,8 +117,15 @@ Without -E, the following sequences are 
 int
 main (int argc, char **argv)
 {
-  int display_return = 1, do_v9 = 0;
-  int allow_options = 1;
+  bool display_return = true;
+  bool allow_options =
+    (! getenv ("POSIXLY_CORRECT")
+     || (! DEFAULT_ECHO_TO_XPG && 1 < argc && strcmp (argv[1], "-n") == 0));
+
+  /* System V machines already have a /bin/sh with a v9 behavior.
+     Use the identical behavior for these machines so that the
+     existing system shell scripts won't barf.  */
+  bool do_v9 = DEFAULT_ECHO_TO_XPG;
 
   initialize_main (&argc, &argv);
   program_name = argv[0];
@@ -124,124 +135,135 @@ main (int argc, char **argv)
 
   atexit (close_stdout);
 
-  /* Don't recognize --help or --version if POSIXLY_CORRECT is set.  */
-  if (getenv ("POSIXLY_CORRECT") == NULL)
+  if (allow_options)
     parse_long_options (argc, argv, PROGRAM_NAME, GNU_PACKAGE, VERSION,
-                     usage, AUTHORS, (char const *) NULL, NULL);
-  else
-    allow_options = 0;
-
-/* System V machines already have a /bin/sh with a v9 behaviour.  We
-   use the identical behaviour for these machines so that the
-   existing system shell scripts won't barf. */
-#if defined (V9_ECHO) && defined (V9_DEFAULT)
-  do_v9 = allow_options;
-#endif
+                       usage, AUTHORS, (char const *) NULL);
 
   --argc;
   ++argv;
 
-  while (argc > 0 && *argv[0] == '-')
-    {
-      register char *temp;
-      register int i;
-
-      /* If it appears that we are handling options, then make sure that
-        all of the options specified are actually valid.  Otherwise, the
-        string should just be echoed. */
-      temp = argv[0] + 1;
+  if (allow_options)
+    while (argc > 0 && *argv[0] == '-')
+      {
+       char const *temp = argv[0] + 1;
+       size_t i;
+
+       /* If it appears that we are handling options, then make sure that
+          all of the options specified are actually valid.  Otherwise, the
+          string should just be echoed.  */
 
-      for (i = 0; temp[i]; i++)
-       {
-         if (strrchr (VALID_ECHO_OPTIONS, temp[i]) == 0)
-           goto just_echo;
-       }
+       for (i = 0; temp[i]; i++)
+         switch (temp[i])
+           {
+           case 'e': case 'E': case 'n':
+             break;
+           default:
+             goto just_echo;
+           }
 
-      if (!*temp)
-       goto just_echo;
+       if (i == 0)
+         goto just_echo;
 
-      /* All of the options in TEMP are valid options to ECHO.
-        Handle them. */
-      while (*temp)
-       {
-         if (allow_options && *temp == 'n')
-           display_return = 0;
-#if defined (V9_ECHO)
-         else if (allow_options && *temp == 'e')
-           do_v9 = 1;
-# if defined (V9_DEFAULT)
-         else if (allow_options && *temp == 'E')
-           do_v9 = 0;
-# endif /* V9_DEFAULT */
-#endif /* V9_ECHO */
-         else
-           goto just_echo;
+       /* All of the options in TEMP are valid options to ECHO.
+          Handle them. */
+       while (*temp)
+         switch (*temp++)
+           {
+           case 'e':
+             do_v9 = true;
+             break;
+
+           case 'E':
+             do_v9 = false;
+             break;
+
+           case 'n':
+             display_return = false;
+             break;
+           }
 
-         temp++;
-       }
-      argc--;
-      argv++;
-    }
+       argc--;
+       argv++;
+      }
 
 just_echo:
 
-  if (argc > 0)
+  if (do_v9)
     {
-#if defined (V9_ECHO)
-      if (do_v9)
+      while (argc > 0)
        {
-         while (argc > 0)
-           {
-             register char *s = argv[0];
-             register int c;
+         char const *s = argv[0];
+         unsigned char c;
 
-             while ((c = *s++))
+         while ((c = *s++))
+           {
+             if (c == '\\' && *s)
                {
-                 if (c == '\\' && *s)
+                 switch (c = *s++)
                    {
-                     switch (c = *s++)
-                       {
-                       case 'a': c = '\007'; break;
-                       case 'b': c = '\b'; break;
-                       case 'c': display_return = 0; continue;
-                       case 'f': c = '\f'; break;
-                       case 'n': c = '\n'; break;
-                       case 'r': c = '\r'; break;
-                       case 't': c = '\t'; break;
-                       case 'v': c = (int) 0x0B; break;
-                       case '0': case '1': case '2': case '3':
-                       case '4': case '5': case '6': case '7':
-                         c -= '0';
-                         if (*s >= '0' && *s <= '7')
-                           c = c * 8 + (*s++ - '0');
-                         if (*s >= '0' && *s <= '7')
-                           c = c * 8 + (*s++ - '0');
-                         break;
-                       case '\\': break;
-                       default:  putchar ('\\'); break;
-                       }
+                   case 'a': c = '\a'; break;
+                   case 'b': c = '\b'; break;
+                   case 'c': exit (EXIT_SUCCESS);
+                   case 'f': c = '\f'; break;
+                   case 'n': c = '\n'; break;
+                   case 'r': c = '\r'; break;
+                   case 't': c = '\t'; break;
+                   case 'v': c = '\v'; break;
+                   case 'x':
+                     {
+                       unsigned char ch = *s;
+                       if (! ISXDIGIT (ch))
+                         goto not_an_escape;
+                       s++;
+                       c = hextobin (ch);
+                       ch = *s;
+                       if (ISXDIGIT (ch))
+                         {
+                           s++;
+                           c = c * 16 + hextobin (ch);
+                         }
+                     }
+                     break;
+                   case '0':
+                     c = 0;
+                     if (! ('0' <= *s && *s <= '7'))
+                       break;
+                     c = *s++;
+                     /* Fall through.  */
+                   case '1': case '2': case '3':
+                   case '4': case '5': case '6': case '7':
+                     c -= '0';
+                     if ('0' <= *s && *s <= '7')
+                       c = c * 8 + (*s++ - '0');
+                     if ('0' <= *s && *s <= '7')
+                       c = c * 8 + (*s++ - '0');
+                     break;
+                   case '\\': break;
+
+                   not_an_escape:
+                   default:  putchar ('\\'); break;
                    }
-                 putchar(c);
                }
-             argc--;
-             argv++;
-             if (argc > 0)
-               putchar(' ');
+             putchar (c);
            }
+         argc--;
+         argv++;
+         if (argc > 0)
+           putchar (' ');
        }
-      else
-#endif /* V9_ECHO */
+    }
+  else
+    {
+      while (argc > 0)
        {
-         while (argc > 0)
-           {
-             fputs (argv[0], stdout);
-             argc--;
-             argv++;
-             if (argc > 0)
-               putchar (' ');
-           }
+         fputs (argv[0], stdout);
+         argc--;
+         argv++;
+         if (argc > 0)
+           putchar (' ');
        }
     }
+
   if (display_return)
     putchar ('\n');
   exit (EXIT_SUCCESS);



