bug-coreutils

unexpand, expand POSIX-conformance fixes.


From: Paul Eggert
Subject: unexpand, expand POSIX-conformance fixes.
Date: Tue, 24 Aug 2004 00:44:00 -0700

I installed these POSIX-conformance fixes for unexpand.
They're fairly subtle, but basically "unexpand" was converting
some blanks that it should have left alone.  I added test cases
for the problem areas.

2004-08-24  Paul Eggert  <address@hidden>

        POSIX-conformance fixes for "expand" and "unexpand".
        Also, consistently use "tab stop" rather than "tabstop".
        * NEWS: Document fixes.

        * src/expand.c: Revamp to resemble the new unexpand.c better.
        (usage): -i does not convert tabs after non-tabs.
        (add_tab_stop): Renamed from add_tabstop.  All uses changed.
        (parse_tab_stops): Renamed from parse_tabstops.  All uses changed.
        (validate_tab_stops): Renamed from validate_tabstops.  All uses changed.
        (next_file, main): Check fclose against 0, not EOF.
        (expand): Remove unnecessary casts.
        Add another loop nesting level, for lines, so that per-line variables
        are initialized cleanly.
        Revamp tab checking.  Check for write error immediately, rather
        than just once at the end of the program.
        * src/unexpand.c: Likewise (for the expand.c changes).
        (TAB_STOP_SENTINEL): Remove.
        (tab_size): Now size_t, not uintmax_t, since we need to store
        the sequences of blanks.
        (max_column_width): New var.
        (usage): Say "blank" where POSIX requires this.
        (add_tab_stop): Calculate maximum column width.
        (unexpand): Store the pending blanks, instead of merely counting them.
        Follow POSIX's rules about -a requiring two blanks before a tab stop.
        Get rid of internal label and goto.

        * tests/unexpand/basic-1: Fix infloop-3 to match POSIX.
        Add blanks-1 through blanks-13.

        * doc/coreutils.texi: Standardize on "tab stop" (the POSIX usage)
        rather than "tabstop".
        (unexpand invocation): Use "blank" rather than "space" when
        POSIX requires "blank".  Define "blank".  Initial blanks are
        converted even if there's just one.  For -a, convert two or
        more blanks only if they occur just before a tab stop.
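
The -a rule is easier to see in isolation.  Here is a minimal sketch of
it, not the code in the patch: it assumes a fixed, nonzero tab size, -a
in effect, and input whose only blanks are spaces (no tabs, no
backspaces).  The name unexpand_all_model and its signature are made up
for illustration; the local variable names follow unexpand.c.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of "unexpand -a" (an illustration, not the patch).
   Assumptions: TAB_SIZE is nonzero, the only blanks in IN are spaces,
   and OUT has room for strlen (IN) + 1 bytes.  */
static void
unexpand_all_model (char const *in, size_t tab_size, char *out)
{
  size_t column = 0;            /* Column of next input character.  */
  size_t next_tab_column = 0;   /* Column the next tab stop is on.  */
  size_t pending = 0;           /* Number of pending spaces.  */
  bool one_blank_before_tab_stop = false;
  bool prev_blank = true;       /* Line start counts as "after a blank".  */
  size_t o = 0;                 /* Next free index in OUT.  */

  for (; *in; in++)
    {
      char c = *in;

      if (c == ' ')
        {
          if (next_tab_column <= column)
            next_tab_column = column + (tab_size - column % tab_size);
          column++;

          if (! (prev_blank && column == next_tab_column))
            {
              /* Not yet known whether this space becomes a tab.  */
              if (column == next_tab_column)
                one_blank_before_tab_stop = true;
              pending++;
              prev_blank = true;
              continue;
            }

          /* Two or more blanks end at a tab stop: emit a tab or two
             and drop the remaining pending spaces.  */
          if (one_blank_before_tab_stop)
            out[o++] = '\t';
          out[o++] = '\t';
          pending = 0;
          one_blank_before_tab_stop = false;
        }
      else
        {
          /* A non-blank: pending spaces are written back unchanged.  */
          for (; pending; pending--)
            out[o++] = ' ';
          one_blank_before_tab_stop = false;
          out[o++] = c;
          if (c == '\n')
            {
              column = next_tab_column = 0;
              prev_blank = true;
            }
          else
            {
              column++;
              prev_blank = false;
            }
        }
    }

  /* End of input: flush whatever is still pending.  */
  for (; pending; pending--)
    out[o++] = ' ';
  out[o] = '\0';
}
```

Under those assumptions this should agree with blanks-2, blanks-8 and
blanks-9 in tests/unexpand/basic-1.  For example, with a tab size of 1,
"a \n" comes out unchanged, because a lone blank after a non-blank is
left alone.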

Index: NEWS
===================================================================
RCS file: /home/eggert/coreutils/cu/NEWS,v
retrieving revision 1.229
diff -p -u -r1.229 NEWS
--- NEWS        19 Aug 2004 20:02:07 -0000      1.229
+++ NEWS        24 Aug 2004 07:27:46 -0000
@@ -85,6 +85,11 @@ GNU coreutils NEWS                      
   POSIXLY_CORRECT is set and the first argument is not "-n", echo now
   outputs all option-like arguments instead of treating them as options.
 
+  expand and unexpand now conform to POSIX better.  They check for
+  blanks (which can include characters other than space and tab in
+  non-POSIX locales) instead of spaces and tabs.  Unexpand now
+  preserves some blanks instead of converting them to tabs or spaces.
+
   printf has several changes:
 
     It now uses 'intmax_t' (not 'long int') to format integers, so it
Index: src/expand.c
===================================================================
RCS file: /home/eggert/coreutils/cu/src/expand.c,v
retrieving revision 1.76
diff -p -u -r1.76 expand.c
--- src/expand.c        2 Aug 2004 23:49:31 -0000       1.76
+++ src/expand.c        24 Aug 2004 07:12:14 -0000
@@ -24,9 +24,9 @@
    --tabs=tab1[,tab2[,...]]
    -t tab1[,tab2[,...]]
    -tab1[,tab2[,...]]  If only one tab stop is given, set the tabs tab1
-                       spaces apart instead of the default 8.  Otherwise,
+                       columns apart instead of the default 8.  Otherwise,
                        set the tabs at columns tab1, tab2, etc. (numbered from
-                       0); replace any tabs beyond the tabstops given with
+                       0); replace any tabs beyond the tab stops given with
                        single spaces.
    --initial
    -i                  Only convert initial tabs on each line to spaces.
@@ -120,7 +120,7 @@ With no FILE, or when FILE is -, read st
 Mandatory arguments to long options are mandatory for short options too.\n\
 "), stdout);
       fputs (_("\
-  -i, --initial       do not convert TABs after non whitespace\n\
+  -i, --initial       do not convert tabs after non blanks\n\
   -t, --tabs=NUMBER   have tabs NUMBER characters apart, not 8\n\
 "), stdout);
       fputs (_("\
@@ -136,18 +136,18 @@ Mandatory arguments to long options are 
 /* Add tab stop TABVAL to the end of `tab_list'.  */
 
 static void
-add_tabstop (uintmax_t tabval)
+add_tab_stop (uintmax_t tabval)
 {
   if (first_free_tab == n_tabs_allocated)
     tab_list = x2nrealloc (tab_list, &n_tabs_allocated, sizeof *tab_list);
   tab_list[first_free_tab++] = tabval;
 }
 
-/* Add the comma or blank separated list of tabstops STOPS
-   to the list of tabstops.  */
+/* Add the comma or blank separated list of tab stops STOPS
+   to the list of tab stops.  */
 
 static void
-parse_tabstops (char const *stops)
+parse_tab_stops (char const *stops)
 {
   bool have_tabval = false;
   uintmax_t tabval IF_LINT (= 0);
@@ -159,7 +159,7 @@ parse_tabstops (char const *stops)
       if (*stops == ',' || ISBLANK (to_uchar (*stops)))
        {
          if (have_tabval)
-           add_tabstop (tabval);
+           add_tab_stop (tabval);
          have_tabval = false;
        }
       else if (ISDIGIT (*stops))
@@ -198,14 +198,14 @@ parse_tabstops (char const *stops)
     exit (EXIT_FAILURE);
 
   if (have_tabval)
-    add_tabstop (tabval);
+    add_tab_stop (tabval);
 }
 
-/* Check that the list of tabstops TABS, with ENTRIES entries,
+/* Check that the list of tab stops TABS, with ENTRIES entries,
    contains only nonzero, ascending values.  */
 
 static void
-validate_tabstops (uintmax_t const *tabs, size_t entries)
+validate_tab_stops (uintmax_t const *tabs, size_t entries)
 {
   uintmax_t prev_tab = 0;
   size_t i;
@@ -240,7 +240,7 @@ next_file (FILE *fp)
        }
       if (fp == stdin)
        clearerr (fp);          /* Also clear EOF.  */
-      else if (fclose (fp) == EOF)
+      else if (fclose (fp) != 0)
        {
          error (0, errno, "%s", prev_file);
          exit_status = EXIT_FAILURE;
@@ -273,14 +273,10 @@ next_file (FILE *fp)
 static void
 expand (void)
 {
-  FILE *fp;                    /* Input stream.  */
-  size_t tab_index = 0;                /* Index in `tab_list' of next tabstop.  */
-  uintmax_t column = 0;                /* Column of next char.  */
-  uintmax_t next_tab_column;   /* Column the next tab stop is on.  */
-  bool convert = true;         /* If true, perform translations.  */
+  /* Input stream.  */
+  FILE *fp = next_file (NULL);
 
-  fp = next_file ((FILE *) NULL);
-  if (fp == NULL)
+  if (!fp)
     return;
 
   /* Binary I/O will preserve the original EOL style (DOS/Unix) of files.  */
@@ -288,74 +284,89 @@ expand (void)
 
   for (;;)
     {
-      int c = getc (fp);
-      if (c == EOF)
-       {
-         fp = next_file (fp);
-         if (fp)
-           {
-             SET_BINARY2 (fileno (fp), STDOUT_FILENO);
-             continue;
-           }
-         break;
-       }
+      /* Input character, or EOF.  */
+      int c;
 
-      if (c == '\n')
-       {
-         putchar (c);
-         tab_index = 0;
-         column = 0;
-         convert = true;
-       }
-      else if (c == '\t' && convert)
-       {
-         if (tab_size == 0)
-           {
-             /* Do not let tab_index == first_free_tab;
-                stop when it is 1 less.  */
-             while (tab_index < first_free_tab - 1
-                    && column >= tab_list[tab_index])
-               tab_index++;
-             next_tab_column = tab_list[tab_index];
-             if (tab_index < first_free_tab - 1)
-               tab_index++;
-             if (column >= next_tab_column)
-               next_tab_column = column + 1; /* Ran out of tab stops.  */
-           }
-         else
-           {
-             next_tab_column = column + tab_size - column % tab_size;
-           }
-         if (next_tab_column < column)
-           error (EXIT_FAILURE, 0, _("input line is too long"));
-         while (column < next_tab_column)
-           {
-             putchar (' ');
-             ++column;
-           }
-       }
-      else
+      /* If true, perform translations.  */
+      bool convert = true;
+
+
+      /* The following variables have valid values only when CONVERT
+        is true:  */
+
+      /* Column of next input character.  */
+      uintmax_t column = 0;
+
+      /* Index in TAB_LIST of next tab stop to examine.  */
+      size_t tab_index = 0;
+
+
+      /* Convert a line of text.  */
+
+      do
        {
+         while ((c = getc (fp)) < 0 && (fp = next_file (fp)))
+           SET_BINARY2 (fileno (fp), STDOUT_FILENO);
+
          if (convert)
            {
-             if (c == '\b')
+             if (c == '\t')
+               {
+                 /* Column the next input tab stop is on.  */
+                 uintmax_t next_tab_column;
+
+                 if (tab_size)
+                   next_tab_column = column + (tab_size - column % tab_size);
+                 else
+                   for (;;)
+                     if (tab_index == first_free_tab)
+                       {
+                         next_tab_column = column + 1;
+                         break;
+                       }
+                     else
+                       {
+                         uintmax_t tab = tab_list[tab_index++];
+                         if (column < tab)
+                           {
+                             next_tab_column = tab;
+                             break;
+                           }
+                       }
+
+                 if (next_tab_column < column)
+                   error (EXIT_FAILURE, 0, _("input line is too long"));
+
+                 while (++column < next_tab_column)
+                   if (putchar (' ') < 0)
+                     error (EXIT_FAILURE, errno, _("write error"));
+                     
+                 c = ' ';
+               }
+             else if (c == '\b')
                {
-                 if (column > 0)
-                   {
-                     column--;
-                     tab_index -= (tab_index != 0);
-                   }
+                 /* Go back one column, and force recalculation of the
+                    next tab stop.  */
+                 column -= !!column;
+                 tab_index -= !!tab_index;
                }
              else
                {
-                 ++column;
-                 if (column == 0)
+                 column++;
+                 if (!column)
                    error (EXIT_FAILURE, 0, _("input line is too long"));
-                 convert &= convert_entire_line;
                }
+
+             convert &= convert_entire_line | ISBLANK (c);
            }
-         putchar (c);
+
+         if (c < 0)
+           return;
+
+         if (putchar (c) < 0)
+           error (EXIT_FAILURE, errno, _("write error"));
        }
+      while (c != '\n');
     }
 }
 
@@ -396,11 +407,11 @@ main (int argc, char **argv)
          convert_entire_line = false;
          break;
        case 't':
-         parse_tabstops (optarg);
+         parse_tab_stops (optarg);
          break;
        case ',':
          if (have_tabval)
-           add_tabstop (tabval);
+           add_tab_stop (tabval);
          have_tabval = false;
          obsolete_tablist = true;
          break;
@@ -425,9 +436,9 @@ main (int argc, char **argv)
     }
 
   if (have_tabval)
-    add_tabstop (tabval);
+    add_tab_stop (tabval);
 
-  validate_tabstops (tab_list, first_free_tab);
+  validate_tab_stops (tab_list, first_free_tab);
 
   if (first_free_tab == 0)
     tab_size = 8;
@@ -440,7 +451,7 @@ main (int argc, char **argv)
 
   expand ();
 
-  if (have_read_stdin && fclose (stdin) == EOF)
+  if (have_read_stdin && fclose (stdin) != 0)
     error (EXIT_FAILURE, errno, "-");
 
   exit (exit_status);
Index: src/unexpand.c
===================================================================
RCS file: /home/eggert/coreutils/cu/src/unexpand.c,v
retrieving revision 1.81
diff -p -u -r1.81 unexpand.c
--- src/unexpand.c      3 Aug 2004 23:27:20 -0000       1.81
+++ src/unexpand.c      24 Aug 2004 06:58:38 -0000
@@ -1,4 +1,4 @@
-/* unexpand - convert spaces to tabs
+/* unexpand - convert blanks to tabs
    Copyright (C) 89, 91, 1995-2004 Free Software Foundation, Inc.
 
    This program is free software; you can redistribute it and/or modify
@@ -25,12 +25,11 @@
    --tabs=tab1[,tab2[,...]]
    -t tab1[,tab2[,...]]
    -tab1[,tab2[,...]]  If only one tab stop is given, set the tabs tab1
-                       spaces apart instead of the default 8.  Otherwise,
+                       columns apart instead of the default 8.  Otherwise,
                        set the tabs at columns tab1, tab2, etc. (numbered from
-                       0); replace any tabs beyond the tabstops given with
-                       single spaces.
+                       0); preserve any blanks beyond the tab stops given.
    --all
-   -a                  Use tabs wherever they would replace 2 or more spaces,
+   -a                  Use tabs wherever they would replace 2 or more blanks,
                        not just at the beginnings of lines.
 
    David MacKenzie <address@hidden> */
@@ -55,13 +54,6 @@
    allocated for the output line.  */
 #define OUTPUT_BLOCK 256
 
-/* A sentinel value that's placed at the end of the list of tab stops.
-   This value must be a large number, but not so large that adding the
-   length of a line to it would cause the column variable to overflow.
-   FIXME: The algorithm isn't correct once the numbers get large;
-   also, no error is reported if overflow occurs.  */
-#define TAB_STOP_SENTINEL INTMAX_MAX
-
 /* The name this program was run with.  */
 char *program_name;
 
@@ -70,7 +62,10 @@ char *program_name;
 static bool convert_entire_line;
 
 /* If nonzero, the size of all tab stops.  If zero, use `tab_list' instead.  */
-static uintmax_t tab_size;
+static size_t tab_size;
+
+/* The maximum distance between tab stops.  */
+static size_t max_column_width;
 
 /* Array of the explicit column numbers of the tab stops;
    after `tab_list' is exhausted, the rest of the line is printed
@@ -129,7 +124,7 @@ Usage: %s [OPTION]... [FILE]...\n\
 "),
              program_name);
       fputs (_("\
-Convert spaces in each FILE to tabs, writing to standard output.\n\
+Convert blanks in each FILE to tabs, writing to standard output.\n\
 With no FILE, or when FILE is -, read standard input.\n\
 \n\
 "), stdout);
@@ -137,8 +132,8 @@ With no FILE, or when FILE is -, read st
 Mandatory arguments to long options are mandatory for short options too.\n\
 "), stdout);
       fputs (_("\
-  -a, --all        convert all whitespace, instead of just initial whitespace\n\
-      --first-only convert only leading sequences of whitespace (overrides -a)\n\
+  -a, --all        convert all blanks, instead of just initial blanks\n\
+      --first-only convert only leading sequences of blanks (overrides -a)\n\
   -t, --tabs=N     have tabs N characters apart instead of 8 (enables -a)\n\
   -t, --tabs=LIST  use comma separated LIST of tab positions (enables -a)\n\
 "), stdout);
@@ -152,18 +147,28 @@ Mandatory arguments to long options are 
 /* Add tab stop TABVAL to the end of `tab_list'.  */
 
 static void
-add_tabstop (uintmax_t tabval)
+add_tab_stop (uintmax_t tabval)
 {
+  uintmax_t column_width =
+    tabval - (first_free_tab ? tab_list[first_free_tab - 1] : 0);
+
   if (first_free_tab == n_tabs_allocated)
     tab_list = x2nrealloc (tab_list, &n_tabs_allocated, sizeof *tab_list);
   tab_list[first_free_tab++] = tabval;
+
+  if (max_column_width < column_width)
+    {
+      if (SIZE_MAX < column_width)
+       error (EXIT_FAILURE, 0, _("tabs are too far apart"));
+      max_column_width = column_width;
+    }
 }
 
-/* Add the comma or blank separated list of tabstops STOPS
-   to the list of tabstops.  */
+/* Add the comma or blank separated list of tab stops STOPS
+   to the list of tab stops.  */
 
 static void
-parse_tabstops (char const *stops)
+parse_tab_stops (char const *stops)
 {
   bool have_tabval = false;
   uintmax_t tabval IF_LINT (= 0);
@@ -175,7 +180,7 @@ parse_tabstops (char const *stops)
       if (*stops == ',' || ISBLANK (to_uchar (*stops)))
        {
          if (have_tabval)
-           add_tabstop (tabval);
+           add_tab_stop (tabval);
          have_tabval = false;
        }
       else if (ISDIGIT (*stops))
@@ -214,14 +219,14 @@ parse_tabstops (char const *stops)
     exit (EXIT_FAILURE);
 
   if (have_tabval)
-    add_tabstop (tabval);
+    add_tab_stop (tabval);
 }
 
-/* Check that the list of tabstops TABS, with ENTRIES entries,
+/* Check that the list of tab stops TABS, with ENTRIES entries,
    contains only nonzero, ascending values.  */
 
 static void
-validate_tabstops (uintmax_t const *tabs, size_t entries)
+validate_tab_stops (uintmax_t const *tabs, size_t entries)
 {
   uintmax_t prev_tab = 0;
   size_t i;
@@ -256,7 +261,7 @@ next_file (FILE *fp)
        }
       if (fp == stdin)
        clearerr (fp);          /* Also clear EOF.  */
-      else if (fclose (fp) == EOF)
+      else if (fclose (fp) != 0)
        {
          error (0, errno, "%s", prev_file);
          exit_status = EXIT_FAILURE;
@@ -283,147 +288,175 @@ next_file (FILE *fp)
   return NULL;
 }
 
-/* Change spaces to tabs, writing to stdout.
+/* Change blanks to tabs, writing to stdout.
    Read each file in `file_list', in order.  */
 
 static void
 unexpand (void)
 {
-  FILE *fp;                    /* Input stream.  */
-  size_t tab_index = 0;                /* Index in `tab_list' of next tabstop.  */
-  size_t print_tab_index = 0;  /* For printing as many tabs as possible.  */
-  uintmax_t column = 0;                /* Column of next char.  */
-  uintmax_t next_tab_column;   /* Column the next tab stop is on.  */
-  bool convert = true;         /* If true, perform translations.  */
-  uintmax_t pending = 0;       /* Pending columns of blanks.  */
-  int saved_errno IF_LINT (= 0);
+  /* Input stream.  */
+  FILE *fp = next_file (NULL);
 
-  fp = next_file ((FILE *) NULL);
-  if (fp == NULL)
+  /* The array of pending blanks.  In non-POSIX locales, blanks can
+     include characters other than spaces, so the blanks must be
+     stored, not merely counted.  */
+  char *pending_blank;
+
+  if (!fp)
     return;
 
   /* Binary I/O will preserve the original EOL style (DOS/Unix) of files.  */
   SET_BINARY2 (fileno (fp), STDOUT_FILENO);
 
+  /* The worst case is a non-blank character, then one blank, then a
+     tab stop, then MAX_COLUMN_WIDTH - 1 blanks, then a non-blank; so
+     allocate MAX_COLUMN_WIDTH bytes to store the blanks.  */
+  pending_blank = xmalloc (max_column_width);
+
   for (;;)
     {
-      int c = getc (fp);
-      if (c == EOF)
-       {
-         fp = next_file (fp);
-         if (fp)
-           {
-             SET_BINARY2 (fileno (fp), STDOUT_FILENO);
-             continue;
-           }
-         saved_errno = errno;
-       }
+      /* Input character, or EOF.  */
+      int c;
 
-      if (c == ' ' && convert && column < TAB_STOP_SENTINEL)
-       {
-         ++pending;
-         ++column;
-       }
-      else if (c == '\t' && convert)
-       {
-         if (tab_size == 0)
-           {
-             /* Do not let tab_index == first_free_tab;
-                stop when it is 1 less.  */
-             while (tab_index < first_free_tab - 1
-                    && column >= tab_list[tab_index])
-               tab_index++;
-             next_tab_column = tab_list[tab_index];
-             if (tab_index < first_free_tab - 1)
-               tab_index++;
-             if (column >= next_tab_column)
-               {
-                 convert = false;      /* Ran out of tab stops.  */
-                 goto flush_pend;
-               }
-           }
-         else
-           {
-             next_tab_column = column + tab_size - column % tab_size;
-           }
-         pending += next_tab_column - column;
-         column = next_tab_column;
-       }
-      else
+      /* If true, perform translations.  */
+      bool convert = true;
+
+
+      /* The following variables have valid values only when CONVERT
+        is true:  */
+
+      /* Column of next input character.  */
+      uintmax_t column = 0;
+
+      /* Column the next input tab stop is on.  */
+      uintmax_t next_tab_column = 0;
+
+      /* Index in TAB_LIST of next tab stop to examine.  */
+      size_t tab_index = 0;
+
+      /* If true, the first pending blank came just before a tab stop.  */
+      bool one_blank_before_tab_stop = false;
+
+      /* If true, the previous input character was a blank.  This is
+        initially true, since initial strings of blanks are treated
+        as if the line was preceded by a blank.  */
+      bool prev_blank = true;
+
+      /* Number of pending columns of blanks.  */
+      size_t pending = 0;
+
+
+      /* Convert a line of text.  */
+
+      do
        {
-       flush_pend:
-         /* Flush pending spaces.  Print as many tabs as possible,
-            then print the rest as spaces.  */
-         if (pending == 1)
-           {
-             putchar (' ');
-             pending = 0;
-           }
-         column -= pending;
-         while (pending > 0)
+         while ((c = getc (fp)) < 0 && (fp = next_file (fp)))
+           SET_BINARY2 (fileno (fp), STDOUT_FILENO);
+
+         if (convert)
            {
-             if (tab_size == 0)
-               {
-                 /* Do not let print_tab_index == first_free_tab;
-                    stop when it is 1 less.  */
-                 while (print_tab_index < first_free_tab - 1
-                        && column >= tab_list[print_tab_index])
-                   print_tab_index++;
-                 next_tab_column = tab_list[print_tab_index];
-                 if (print_tab_index < first_free_tab - 1)
-                   print_tab_index++;
-               }
-             else
-               {
-                 next_tab_column = column + tab_size - column % tab_size;
-               }
-             if (next_tab_column - column <= pending)
-               {
-                 putchar ('\t');
-                 pending -= next_tab_column - column;
-                 column = next_tab_column;
-               }
-             else
+             bool blank = ISBLANK (c);
+
+             if (blank)
                {
-                 --print_tab_index;
-                 column += pending;
-                 while (pending != 0)
+                 if (next_tab_column <= column)
                    {
-                     putchar (' ');
-                     pending--;
+                     if (tab_size)
+                       next_tab_column =
+                         column + (tab_size - column % tab_size);
+                     else
+                       for (;;)
+                         if (tab_index == first_free_tab)
+                           {
+                             convert = false;
+                             break;
+                           }
+                         else
+                           {
+                             uintmax_t tab = tab_list[tab_index++];
+                             if (column < tab)
+                               {
+                                 next_tab_column = tab;
+                                 break;
+                               }
+                           }
                    }
-               }
-           }
 
-         if (c == EOF)
-           {
-             errno = saved_errno;
-             break;
-           }
+                 if (convert)
+                   {
+                     if (next_tab_column < column)
+                       error (EXIT_FAILURE, 0, _("input line is too long"));
 
-         if (convert)
-           {
-             if (c == '\b')
+                     if (c == '\t')
+                       {
+                         column = next_tab_column;
+
+                         /* Discard pending blanks, unless it was a single
+                            blank just before the previous tab stop.  */
+                         if (! (pending == 1 && one_blank_before_tab_stop))
+                           {
+                             pending = 0;
+                             one_blank_before_tab_stop = false;
+                           }
+                       }
+                     else
+                       {
+                         column++;
+
+                         if (! (prev_blank && column == next_tab_column))
+                           {
+                             /* It is not yet known whether the pending blanks
+                                will be replaced by tabs.  */
+                             if (column == next_tab_column)
+                               one_blank_before_tab_stop = true;
+                             pending_blank[pending++] = c;
+                             prev_blank = true;
+                             continue;
+                           }
+
+                         /* Replace the pending blanks by a tab or two.  */
+                         pending_blank[0] = c = '\t';
+                         pending = one_blank_before_tab_stop;
+                       }
+                   }
+               }
+             else if (c == '\b')
                {
-                 if (column > 0)
-                   --column;
+                 /* Go back one column, and force recalculation of the
+                    next tab stop.  */
+                 column -= !!column;
+                 next_tab_column = column;
+                 tab_index -= !!tab_index;
                }
              else
                {
-                 ++column;
-                 convert &= convert_entire_line;
+                 column++;
+                 if (!column)
+                   error (EXIT_FAILURE, 0, _("input line is too long"));
                }
-           }
 
-         putchar (c);
+             if (pending)
+               {
+                 if (fwrite (pending_blank, 1, pending, stdout) != pending)
+                   error (EXIT_FAILURE, errno, _("write error"));
+                 pending = 0;
+                 one_blank_before_tab_stop = false;
+               }
+             
+             prev_blank = blank;
+             convert &= convert_entire_line | blank;
+           }
 
-         if (c == '\n')
+         if (c < 0)
            {
-             tab_index = print_tab_index = 0;
-             column = pending = 0;
-             convert = true;
+             free (pending_blank);
+             return;
            }
+
+         if (putchar (c) < 0)
+           error (EXIT_FAILURE, errno, _("write error"));
        }
+      while (c != '\n');
     }
 }
 
@@ -435,7 +468,7 @@ main (int argc, char **argv)
   int c;
 
   /* If true, cancel the effect of any -a (explicit or implicit in -t),
-     so that only leading white space will be considered.  */
+     so that only leading blanks will be considered.  */
   bool convert_first_only = false;
 
   bool obsolete_tablist = false;
@@ -469,14 +502,14 @@ main (int argc, char **argv)
          break;
        case 't':
          convert_entire_line = true;
-         parse_tabstops (optarg);
+         parse_tab_stops (optarg);
          break;
        case CONVERT_FIRST_ONLY_OPTION:
          convert_first_only = true;
          break;
        case ',':
          if (have_tabval)
-           add_tabstop (tabval);
+           add_tab_stop (tabval);
          have_tabval = false;
          obsolete_tablist = true;
          break;
@@ -505,26 +538,22 @@ main (int argc, char **argv)
     convert_entire_line = false;
 
   if (have_tabval)
-    add_tabstop (tabval);
+    add_tab_stop (tabval);
 
-  validate_tabstops (tab_list, first_free_tab);
+  validate_tab_stops (tab_list, first_free_tab);
 
   if (first_free_tab == 0)
-    tab_size = 8;
+    tab_size = max_column_width = 8;
   else if (first_free_tab == 1)
     tab_size = tab_list[0];
   else
-    {
-      /* Append a sentinel to the list of tab stop indices.  */
-      add_tabstop (TAB_STOP_SENTINEL);
-      tab_size = 0;
-    }
+    tab_size = 0;
 
   file_list = (optind < argc ? &argv[optind] : stdin_argv);
 
   unexpand ();
 
-  if (have_read_stdin && fclose (stdin) == EOF)
+  if (have_read_stdin && fclose (stdin) != 0)
     error (EXIT_FAILURE, errno, "-");
 
   exit (exit_status);
Index: tests/unexpand/basic-1
===================================================================
RCS file: /home/eggert/coreutils/cu/tests/unexpand/basic-1,v
retrieving revision 1.14
diff -p -u -r1.14 basic-1
--- tests/unexpand/basic-1      8 Apr 2003 10:55:02 -0000       1.14
+++ tests/unexpand/basic-1      24 Aug 2004 06:42:47 -0000
@@ -41,13 +41,9 @@ my @Tests =
      ['b-1', '-t', '2,4', {IN=> "      ."}, {OUT=>"\t\t  ."}],
      # These would infloop prior to textutils-2.0d.
 
-     # Solaris' /bin/unexpand does this:
-     # ['infloop-1', '-t', '1,2', {IN=> " \t\t .\n"}, {OUT=>" \t\t .\n"}],
-     # FIXME: find out which is required
-
      ['infloop-1', '-t', '1,2', {IN=> " \t\t .\n"}, {OUT=>"\t\t\t .\n"}],
      ['infloop-2', '-t', '4,5', {IN=> ' 'x4 . "\t\t \n"}, {OUT=>"\t\t\t \n"}],
-     ['infloop-3', '-t', '2,3', {IN=> "x \t\t \n"}, {OUT=>"x\t\t\t \n"}],
+     ['infloop-3', '-t', '2,3', {IN=> "x \t\t \n"}, {OUT=>"x \t\t \n"}],
      ['infloop-4', '-t', '1,2', {IN=> " \t\t   \n"}, {OUT=>"\t\t\t   \n"}],
      ['c-1', '-t', '1,2', {IN=> "x\t\t .\n"}, {OUT=>"x\t\t .\n"}],
 
@@ -55,6 +51,21 @@ my @Tests =
      # Feature addition (--first-only) prompted by a report from Jie Xu.
      ['tabs-1', qw(-t 3),              {IN=> "   a  b\n"}, {OUT=>"\ta\tb\n"}],
      ['tabs-2', qw(-t 3 --first-only), {IN=> "   a  b\n"}, {OUT=>"\ta  b\n"}],
+
+     # blanks
+     ['blanks-1', qw(-t 1), {IN=> " b  c   d\n"}, {OUT=> "\tb\t\tc\t\t\td\n"}],
+     ['blanks-2', qw(-t 1), {IN=> "a \n"}, {OUT=> "a \n"}],
+     ['blanks-3', qw(-t 1), {IN=> "a  \n"}, {OUT=> "a\t\t\n"}],
+     ['blanks-4', qw(-t 1), {IN=> "a   \n"}, {OUT=> "a\t\t\t\n"}],
+     ['blanks-5', qw(-t 1), {IN=> "a "}, {OUT=> "a "}],
+     ['blanks-6', qw(-t 1), {IN=> "a  "}, {OUT=> "a\t\t"}],
+     ['blanks-7', qw(-t 1), {IN=> "a   "}, {OUT=> "a\t\t\t"}],
+     ['blanks-8', qw(-t 1), {IN=> " a a  a\n"}, {OUT=> "\ta a\t\ta\n"}],
+     ['blanks-9', qw(-t 2), {IN=> "   a  a  a\n"}, {OUT=> "\t a\ta  a\n"}],
+     ['blanks-10', '-t', '3,4', {IN=> "0 2 4 6\t8\n"}, {OUT=> "0 2 4 6\t8\n"}],
+     ['blanks-11', '-t', '3,4', {IN=> "    4\n"}, {OUT=> "\t\t4\n"}],
+     ['blanks-12', '-t', '3,4', {IN=> "01  4\n"}, {OUT=> "01\t\t4\n"}],
+     ['blanks-13', '-t', '3,4', {IN=> "0   4\n"}, {OUT=> "0\t\t4\n"}],
     );
 
 my $save_temps = $ENV{DEBUG};
Index: doc/coreutils.texi
===================================================================
RCS file: /home/eggert/coreutils/cu/doc/coreutils.texi,v
retrieving revision 1.201
diff -p -u -r1.201 coreutils.texi
--- doc/coreutils.texi  19 Aug 2004 20:05:52 -0000      1.201
+++ doc/coreutils.texi  24 Aug 2004 06:56:26 -0000
@@ -5090,15 +5090,15 @@ The program accepts the following option
 @itemx address@hidden,@address@hidden
 @opindex -t
 @opindex --tabs
address@hidden tabstops, setting
address@hidden tab stops, setting
 If only one tab stop is given, set the tabs @var{tab1} spaces apart
 (default is 8).  Otherwise, set the tabs at columns @var{tab1},
 @var{tab2}, @dots{} (numbered from 0), and replace any tabs beyond the
-last tabstop given with single spaces.  Tabstops can be separated by
+last tab stop given with single spaces.  Tab stops can be separated by
 blanks as well as by commas.
 
 On older systems, @command{expand} supports an obsolete option
address@hidden@var{tab1}[,@address@hidden, where tabstops must be
address@hidden@var{tab1}[,@address@hidden, where tab stops must be
 separated by commas.  @acronym{POSIX} 1003.1-2001 (@pxref{Standards
 conformance}) does not allow this; use @option{-t
 @var{tab1}[,@address@hidden instead.
@@ -5123,16 +5123,17 @@ characters) on each line to spaces.
 
 @command{unexpand} writes the contents of each given @var{file}, or
 standard input if none are given or for a @var{file} of @samp{-}, to
-standard output, with strings of two or more space or tab characters
-converted to as many tabs as possible followed by as many spaces as are
-needed.  Synopsis:
+standard output, converting blanks at the beginning of each line into
+as many tab characters as needed.  In the default @acronym{POSIX}
+locale, a @dfn{blank} is a space or a tab; other locales may specify
+additional blank characters.  Synopsis:
 
 @example
 unexpand address@hidden@dots{} address@hidden@dots{}
 @end example
 
-By default, @command{unexpand} converts only initial spaces and tabs (those
-that precede all non space or tab characters) on each line.  It
+By default, @command{unexpand} converts only initial blanks (those
+that precede all non-blank characters) on each line.  It
 preserves backspace characters in the output; they decrement the column
 count for tab calculations.  By default, tabs are set at every 8th
 column.
@@ -5145,14 +5146,14 @@ The program accepts the following option
 @itemx address@hidden,@address@hidden
 @opindex -t
 @opindex --tabs
-If only one tab stop is given, set the tabs @var{tab1} spaces apart
+If only one tab stop is given, set the tabs @var{tab1} columns apart
 instead of the default 8.  Otherwise, set the tabs at columns
address@hidden, @var{tab2}, @dots{} (numbered from 0), and leave spaces and
-tabs beyond the tabstops given unchanged.  Tabstops can be separated by
address@hidden, @var{tab2}, @dots{} (numbered from 0), and leave blanks
+beyond the tab stops given unchanged.  Tab stops can be separated by
 blanks as well as by commas.  This option implies the @option{-a} option.
 
 On older systems, @command{unexpand} supports an obsolete option
address@hidden@var{tab1}[,@address@hidden, where tabstops must be
address@hidden@var{tab1}[,@address@hidden, where tab stops must be
 separated by commas.  (Unlike @option{-t}, this obsolete option does
 not imply @option{-a}.)  @acronym{POSIX} 1003.1-2001 (@pxref{Standards
 conformance}) does not allow this; use @option{--first-only -t
@@ -5162,8 +5163,8 @@ conformance}) does not allow this; use @
 @itemx --all
 @opindex -a
 @opindex --all
-Convert all strings of two or more spaces or tabs, not just initial
-ones, to tabs.
+Also convert all sequences of two or more blanks just before a tab stop,
+even if they occur after non-blank characters in a line.
 
 @end table
 
@@ -5832,7 +5833,7 @@ List the files in columns, sorted horizo
 @itemx address@hidden
 @opindex -T
 @opindex --tabsize
-Assume that each tabstop is @var{cols} columns wide.  The default is 8.
+Assume that each tab stop is @var{cols} columns wide.  The default is 8.
 @command{ls} uses tabs where possible in the output, for efficiency.  If
 @var{cols} is zero, do not use tabs at all.
 
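
For readers who want the revamped tab-stop lookup from the expand.c hunk
in one place, here is a rough standalone sketch.  The function name
next_tab_stop and its parameter list are my own, chosen for
illustration; the patch keeps this logic inline and uses file-scope
variables.  The sketch also omits the patch's "input line is too long"
overflow check.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of the patch).  Return the column of
   the next tab stop after COLUMN.  If TAB_SIZE is nonzero, tab stops
   are TAB_SIZE columns apart.  Otherwise scan the ascending list
   TAB_LIST of N_TABS entries, starting at *TAB_INDEX and advancing it.
   Once the list is exhausted, return COLUMN + 1, which corresponds to
   expand replacing the tab with a single space.  */
static uintmax_t
next_tab_stop (uintmax_t column, uintmax_t tab_size,
               uintmax_t const *tab_list, size_t n_tabs,
               size_t *tab_index)
{
  if (tab_size)
    return column + (tab_size - column % tab_size);

  while (*tab_index < n_tabs)
    {
      uintmax_t tab = tab_list[(*tab_index)++];
      if (column < tab)
        return tab;
    }

  return column + 1;
}
```

With the default tab size of 8, column 3 maps to column 8.  With the
explicit list 3,4 and no fixed size, a tab at column 4 is past the last
stop and so advances by just one column.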



