bug-coreutils

some POSIX-conformance cleanups for GNU tr


From: Paul Eggert
Subject: some POSIX-conformance cleanups for GNU tr
Date: Tue, 01 Jun 2004 15:46:43 -0700
User-agent: Gnus/5.1006 (Gnus v5.10.6) Emacs/21.3 (gnu/linux)

I went through the POSIX spec for 'tr' with a fine-toothed comb and
compared it to GNU tr's source.  In some cases GNU 'tr' is too picky;
it diagnoses constructs for which POSIX does not require a diagnostic.
I think it's better for POSIXLY_CORRECT to affect the behavior of GNU
'tr' as little as possible, so I removed the overly-picky diagnostics.
In a few other cases GNU tr gives the wrong answer, e.g. "tr 'a\055b'
def" is treated like "tr a-b def" which isn't right.  Also, POSIX
requires an option -C which GNU tr currently doesn't support.  Here is
a patch.

2004-06-01  Paul Eggert  <address@hidden>

        Some POSIX-conformance cleanups for tr.

        * doc/coreutils.texi (tr invocation): Mention -C.
        * src/tr.c (posix_pedantic): Remove; no longer needed since
        we need to test this in just one place now.
        (usage): Mention -C.
        (unquote): Note that \055, \n, etc are escaped.
        Do not worry about POSIXLY_CORRECT when warning about ambiguous
        escape sequences.
        \ at end of string stands for itself.
        Do not diagnose invalid backslash escapes: POSIX says the behavior
        is unspecified in this case, so we don't need to diagnose it.
        (main): Add support for -C (currently an alias for -c).
        Do not diagnose 'tr [:upper:] [:upper:]', as POSIX does not require
        a diagnostic here.
        * tests/tr/Test.pm: New tests bs-055, bs-at-end, repeat-Compl.
        Fix comment for range-a-a.

Index: doc/coreutils.texi
===================================================================
RCS file: /home/meyering/coreutils/cu/doc/coreutils.texi,v
retrieving revision 1.183
diff -p -u -r1.183 coreutils.texi
--- doc/coreutils.texi  1 Jun 2004 12:46:22 -0000       1.183
+++ doc/coreutils.texi  1 Jun 2004 21:41:51 -0000
@@ -4670,8 +4670,17 @@ delete characters, then squeeze repeated
 The @var{set1} and (if given) @var{set2} arguments define ordered
 sets of characters, referred to below as @var{set1} and @var{set2}.  These
 sets are the characters of the input that @command{tr} operates on.
-The @option{--complement} (@option{-c}) option replaces @var{set1} with its
+The @option{--complement} (@option{-c}, @option{-C}) option replaces
+@var{set1} with its
 complement (all of the characters that are not in @var{set1}).
+
+Currently @command{tr} fully supports only single-byte characters.
+Eventually it will support multibyte characters; when it does, the
+@option{-C} option will cause it to complement the set of characters,
+whereas @option{-c} will cause it to complement the set of values.
+This distinction will matter only when some values are not characters,
+and this is possible only in locales using multibyte encodings when
+the input contains encoding errors.
 
 @exitstatus
 
Index: src/tr.c
===================================================================
RCS file: /home/meyering/coreutils/cu/src/tr.c,v
retrieving revision 1.131
diff -p -u -r1.131 tr.c
--- src/tr.c    31 May 2004 11:30:27 -0000      1.131
+++ src/tr.c    1 Jun 2004 22:31:20 -0000
@@ -212,23 +212,8 @@ static bool delete = false;
 /* Use the complement of set1 in place of set1.  */
 static bool complement = false;
 
-/* When nonzero, this flag causes GNU tr to provide strict
-   compliance with POSIX draft 1003.2.11.2.  The POSIX spec
-   says that when -d is used without -s, string2 (if present)
-   must be ignored.  Silently ignoring arguments is a bad idea.
-   The default GNU behavior is to give a usage message and exit.
-   Additionally, when this flag is nonzero, tr prints warnings
-   on stderr if it is being used in a manner that is not portable.
-   Applicable warnings are given by default, but are suppressed
-   if the environment variable `POSIXLY_CORRECT' is set, since
-   being POSIX conformant means we can't issue such messages.
-   Warnings on the following topics are suppressed when this
-   variable is nonzero:
-   1. Ambiguous octal escapes.  */
-static bool posix_pedantic;
-
 /* When tr is performing translation and string1 is longer than string2,
-   POSIX says that the result is undefined.  That gives the implementor
+   POSIX says that the result is unspecified.  That gives the implementor
    of a POSIX conforming version of tr two reasonable choices for the
    semantics of this case.
 
@@ -314,7 +299,7 @@ Usage: %s [OPTION]... SET1 [SET2]\n\
 Translate, squeeze, and/or delete characters from standard input,\n\
 writing to standard output.\n\
 \n\
-  -c, --complement        first complement SET1\n\
+  -c, -C, --complement    first complement SET1\n\
   -d, --delete            delete characters in SET1, do not translate\n\
  -s, --squeeze-repeats   replace each input sequence of a repeated character\n\
                             that is listed in SET1 with a single occurrence\n\
@@ -475,6 +460,7 @@ unquote (char const *s, struct E_string 
       switch (s[i])
        {
        case '\\':
+         es->escaped[j] = true;
          switch (s[i + 1])
            {
            case '\\':
@@ -523,15 +509,16 @@ unquote (char const *s, struct E_string 
                          c = 8 * c + oct_digit;
                          ++i;
                        }
-                     else if (!posix_pedantic)
+                     else
                        {
                          /* A 3-digit octal number larger than \377 won't
                             fit in 8 bits.  So we stop when adding the
                             next digit would put us over the limit and
                             give a warning about the ambiguity.  POSIX
-                            isn't clear on this, but one person has said
-                            that in his interpretation, POSIX says tr
-                            can't even give a warning.  */
+                            isn't clear on this, and we interpret this
+                            lack of clarity as meaning the resulting behavior
+                            is undefined, which means we're allowed to issue
+                            a warning.  */
                          error (0, 0, _("warning: the ambiguous octal escape \
 \\%c%c%c is being\n\tinterpreted as the 2-byte sequence \\0%c%c, `%c'"),
                                 s[i], s[i + 1], s[i + 2],
@@ -541,20 +528,15 @@ unquote (char const *s, struct E_string 
                }
              break;
            case '\0':
-             error (0, 0, _("invalid backslash escape at end of string"));
-             return false;
-
+             /* POSIX seems to require that a trailing backslash must
+                stand for itself.  Weird.  */
+             es->escaped[j] = false;
+             i--;
+             c = '\\';
+             break;
            default:
-             if (posix_pedantic)
-               {
-                 error (0, 0, _("invalid backslash escape `\\%c'"), s[i + 1]);
-                 return false;
-               }
-             else
-               {
-                 c = s[i + 1];
-                 es->escaped[j] = true;
-               }
+             c = s[i + 1];
+             break;
            }
          ++i;
          es->s[j++] = c;
@@ -1701,7 +1683,7 @@ main (int argc, char **argv)
 
   atexit (close_stdout);
 
-  while ((c = getopt_long (argc, argv, "cdst", long_options, NULL)) != -1)
+  while ((c = getopt_long (argc, argv, "cCdst", long_options, NULL)) != -1)
     {
       switch (c)
        {
@@ -1709,6 +1691,7 @@ main (int argc, char **argv)
          break;
 
        case 'c':
+       case 'C':
          complement = true;
          break;
 
@@ -1734,8 +1717,6 @@ main (int argc, char **argv)
        }
     }
 
-  posix_pedantic = (getenv ("POSIXLY_CORRECT") != NULL);
-
   non_option_args = argc - optind;
   translating = (non_option_args == 2 && !delete);
 
@@ -1764,7 +1745,7 @@ deleting and squeezing repeats"));
      this deserves a fatal error, so that's the default.  */
   if ((delete && !squeeze_repeats) && non_option_args != 1)
     {
-      if (posix_pedantic && non_option_args == 2)
+      if (non_option_args == 2 && getenv ("POSIXLY_CORRECT"))
        --non_option_args;
       else
        error (EXIT_FAILURE, 0,
@@ -1888,17 +1869,8 @@ without squeezing repeats"));
              else if ((class_s1 == UL_LOWER && class_s2 == UL_LOWER)
                       || (class_s1 == UL_UPPER && class_s2 == UL_UPPER))
                {
-                 /* By default, GNU tr permits the identity mappings: from
-                    [:upper:] to [:upper:] and [:lower:] to [:lower:].  But
-                    when POSIXLY_CORRECT is set, those evoke diagnostics.  */
-                 if (posix_pedantic)
-                   {
-                     error (EXIT_FAILURE, 0,
-                            _("\
-invalid identity mapping;  when translating, any [:lower:] or [:upper:]\n\
-construct in string1 must be aligned with a corresponding construct\n\
-([:upper:] or [:lower:], respectively) in string2"));
-                   }
+                 /* POSIX says the behavior of `tr "[:upper:]" "[:upper:]"'
+                    is undefined.  Treat it as a no-op.  */
                }
              else
                {
Index: tests/tr/Test.pm
===================================================================
RCS file: /home/meyering/coreutils/cu/tests/tr/Test.pm,v
retrieving revision 1.10
diff -p -u -r1.10 Test.pm
--- tests/tr/Test.pm    31 May 2004 12:17:49 -0000      1.10
+++ tests/tr/Test.pm    1 Jun 2004 22:37:05 -0000
@@ -68,7 +68,7 @@ my @tv = (
 ['y', '-d ' . q|'a-z'|, 'abc $code', ' $', 0],
 ['z', '-ds ' . q|'a-z' '$.'|, 'a.b.c $$$$code\\', '. $\\', 0],
 
-# Make sure that a-a is accepted, even though POSIX 1001.2 says it is illegal.
+# Make sure that a-a is accepted.
 ['range-a-a', q|'a-a' 'z'|,         'abc',    'zbc',               0],
 #
 ['null', q|'a' ''''|,          '',       '',                  1],
@@ -84,6 +84,8 @@ my @tv = (
 ['o-rep-2',   q|'[b*010]cd' '[a*7]BC[x*]'|, 'bcd', 'BCx', 0],
 
 ['esc',     q|'a\-z' 'A-Z'|,           'abc-z', 'AbcBC', 0],
+['bs-055', q|'a\055b' def|,            "a\055b", 'def', 0],
+['bs-at-end', q|'\' x|,                        "\\", 'x', 0],
 
 #
 # From Ross
@@ -108,6 +110,7 @@ my @tv = (
 ['repeat-0',             q|abc '[b*0]'|, 'abcd', 'bbbd', 0],
 ['repeat-000',           q|abc '[b*00000000000000000000]'|, 'abcd', 'bbbd', 0],
 ['repeat-compl', '-c ' . q|'[a*65536]\n' '[b*]'|, 'abcd', 'abbb', 0],
+['repeat-Compl', '-C ' . q|'[a*65536]\n' '[b*]'|, 'abcd', 'abbb', 0],
 
 );
 



