bug-coreutils
Re: have cut -c print output-delimiter


From: Jim Meyering
Subject: Re: have cut -c print output-delimiter
Date: Thu, 09 Jan 2003 21:29:20 +0100

Jan Nieuwenhuizen <address@hidden> wrote:
> Find a patch below that makes cut print output-delimiter when working
> on byte ranges, but only if output-delimiter was explicitly specified
> on the command line.
>
> From cut --help, one would already expect cut to handle this case,
> which is very useful to (pre)process fixed-width text tables.
>
> Greetings,
> Jan.
>
> PS: I checked subversions.gnu.org:cvsroot/textutils and
>     :cvsroot/coreutils first and found both modules to exist, but
>     empty.  A README redirecting the user to alpha.gnu.org would be
>     most friendly.
>
> Example:
>     ls -l | cut --output-delimiter=, -c1,2-4,5-7,8-10,57- > foo
>     mysql -e 'create table foo (d char(1),u char(3), g varchar (3), o \
>         varchar (3), n text)' test
>     mysqlimport --fields-terminated-by=, test foo

Thanks a lot for the suggestion and patch.
I've made some changes and added some tests.

        When selecting ranges of byte offsets (as opposed to ranges of fields)
        and when --output-delimiter=STRING is specified, output STRING between
        ranges of selected bytes.
        * src/cut.c (RANGE_START_SENTINEL): Define.
        (output_delimiter_specified): New global.
        (print_kth): Add parameter.  Adjust all callers.
        (set_fields): Mark each range-start index with RANGE_START_SENTINEL.
        (cut_bytes): When requested, output STRING between ranges of
        selected bytes.
        (main): Make a diagnostic a little clearer.
        Based on a patch from Jan Nieuwenhuizen.

        * tests/cut/Test.pm: New tests for the above.

        * src/cut.c (set_fields): Make code agree with comment:
        Don't merge abutting ranges like 4- and 2-3.  This makes no
        difference currently, but is required to support an upcoming change.

There were two minor problems.  First, IMHO, the following should
(and now does) output a `:' between the abutting ranges 2-3 and 4-:

  pi$ echo abcdefghi|./cut -c4-,2-3 --output-d=:
  bc:defghi

Before the additional little change (patch included below),
it did this:

  pi$ echo abcdefghi|./cut -c4-,2-3 --output-d=:
  bcdefghi

Second, before these changes, overlapping byte ranges could result in
questionable output.
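
To make the range-start marking concrete, here is a minimal standalone
sketch of the idea, not the code from cut.c.  RANGE_START_SENTINEL and
print_delimiter mirror names in the patch; struct range, MAX_POS,
mark_ranges and cut_line are hypothetical names invented for this
illustration, and open-ended ranges such as `4-' are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_POS 64              /* hypothetical cap on byte positions */
#define RANGE_START_SENTINEL 2  /* same value as in the patch */

/* Hypothetical, simplified stand-in for cut.c's range pairs:
   a closed, 1-based interval LO..HI.  */
struct range { unsigned int lo, hi; };

/* Fill PRINTABLE (MAX_POS + 1 entries): 0 = not selected, 1 = selected,
   RANGE_START_SENTINEL = selected and first byte of a range.  */
static void
mark_ranges (unsigned char *printable, const struct range *rp, size_t n_rp)
{
  memset (printable, 0, MAX_POS + 1);
  for (size_t i = 0; i < n_rp; i++)
    {
      unsigned int j = rp[i].lo;
      assert (1 <= rp[i].lo && rp[i].hi <= MAX_POS);
      /* Mark a range start only if no earlier range already covers it.  */
      if (j <= rp[i].hi && !printable[j])
        printable[j] = RANGE_START_SENTINEL;
      for (++j; j <= rp[i].hi; j++)
        printable[j] = 1;
    }
}

/* Copy the selected bytes of LINE (one line, no trailing newline) into
   OUT, writing DELIM before every range start except the first one
   printed.  Output is truncated if OUT is too small.  */
static void
cut_line (const unsigned char *printable, const char *line,
          const char *delim, char *out, size_t out_size)
{
  size_t n = 0;
  size_t delim_len = strlen (delim);
  int print_delimiter = 0;

  assert (out_size > 0);
  for (unsigned int k = 1; k <= MAX_POS && line[k - 1] != '\0'; k++)
    {
      if (!printable[k])
        continue;
      if (printable[k] == RANGE_START_SENTINEL && print_delimiter)
        {
          if (out_size - n <= delim_len)
            break;
          memcpy (out + n, delim, delim_len);
          n += delim_len;
        }
      print_delimiter = 1;
      if (out_size - n <= 1)
        break;
      out[n++] = line[k - 1];
    }
  out[n] = '\0';
}
```

With the ranges 1-3, 2-4 and 6 applied to "abcdefg" and ":" as the
delimiter, this sketch yields "abcd:f": position 2 is already covered by
1-3, so it is not treated as a range start.  That matches the
out-delim3 test in the Test.pm patch.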

The only part remaining is to update the texinfo documentation
with a description of the new feature.  Would you like to do that,
including the nice example you gave above?

Thanks again,
Jim

Index: cut.c
===================================================================
RCS file: /fetish/cu/src/cut.c,v
retrieving revision 1.85
retrieving revision 1.86
diff -u -p -u -r1.85 -r1.86
--- cut.c       9 Jan 2003 19:30:22 -0000       1.85
+++ cut.c       9 Jan 2003 20:16:58 -0000       1.86
@@ -93,6 +93,10 @@ static unsigned int max_range_endpoint;
    to end of line. */
 static unsigned int eol_range_start;
 
+/* A nonzero, non-1 value with which to distinguish the index
+   corresponding to the lower bound of a range.  */
+#define RANGE_START_SENTINEL 2
+
 /* In byte mode, which bytes to output.
    In field mode, which DELIM-separated fields to output.
    Both bytes and fields are numbered starting with 1,
@@ -126,6 +130,9 @@ static int suppress_non_delimited;
 /* The delimeter character for field mode. */
 static int delim;
 
+/* Nonzero if the --output-delimiter=STRING option was specified.  */
+static int output_delimiter_specified;
+
 /* The length of output_delimiter_string.  */
 static size_t output_delimiter_length;
 
@@ -210,11 +217,29 @@ With no FILE, or when FILE is -, read st
   exit (status == 0 ? EXIT_SUCCESS : EXIT_FAILURE);
 }
 
+/* Return nonzero if the K'th field or byte is printable.
+   When returning nonzero, if RANGE_START is non-NULL,
+   set *RANGE_START to nonzero if K is the beginning of a range, and
+   set *RANGE_START to zero if K is not the beginning of a range.  */
+
 static int
-print_kth (unsigned int k)
+print_kth (unsigned int k, int *range_start)
 {
-  return ((0 < eol_range_start && eol_range_start <= k)
-         || (k <= max_range_endpoint && printable_field[k]));
+  if (0 < eol_range_start && eol_range_start <= k)
+    {
+      if (range_start)
+       *range_start = (k == eol_range_start);
+      return 1;
+    }
+
+  if (k <= max_range_endpoint && printable_field[k])
+    {
+      if (range_start)
+       *range_start = (printable_field[k] == RANGE_START_SENTINEL);
+      return 1;
+    }
+
+  return 0;
 }
 
 /* Given the list of field or byte range specifications FIELDSTR, set
@@ -371,8 +396,13 @@ set_fields (const char *fieldstr)
   /* Set the array entries corresponding to integers in the ranges of RP.  */
   for (i = 0; i < n_rp; i++)
     {
-      unsigned int j;
-      for (j = rp[i].lo; j <= rp[i].hi; j++)
+      unsigned int j = rp[i].lo;
+
+      /* Mark the first position of field or range with a sentinel,
+        but not if it's already part of another range.  */
+      if (j <= rp[i].hi && ! printable_field[j])
+       printable_field[j] = RANGE_START_SENTINEL;
+      for (++j; j <= rp[i].hi; j++)
        {
          printable_field[j] = 1;
        }
@@ -388,9 +418,13 @@ set_fields (const char *fieldstr)
 static void
 cut_bytes (FILE *stream)
 {
-  unsigned int byte_idx;       /* Number of chars in the line so far. */
+  unsigned int byte_idx;       /* Number of bytes in the line so far. */
+  /* Whether to begin printing delimiters between ranges for the current line.
+     Set after we've begun printing data corresponding to the first range.  */
+  int print_delimiter;
 
   byte_idx = 0;
+  print_delimiter = 0;
   while (1)
     {
       register int c;          /* Each character from the file. */
@@ -401,6 +435,7 @@ cut_bytes (FILE *stream)
        {
          putchar ('\n');
          byte_idx = 0;
+         print_delimiter = 0;
        }
       else if (c == EOF)
        {
@@ -410,9 +445,15 @@ cut_bytes (FILE *stream)
        }
       else
        {
-         ++byte_idx;
-         if (print_kth (byte_idx))
+         int range_start;
+         if (print_kth (++byte_idx, &range_start))
            {
+             if (range_start && print_delimiter && output_delimiter_specified)
+               {
+                 fwrite (output_delimiter_string, sizeof (char),
+                         output_delimiter_length, stdout);
+               }
+             print_delimiter = 1;
              putchar (c);
            }
        }
@@ -444,7 +485,7 @@ cut_fields (FILE *stream)
      and the first field has been selected, or if non-delimited lines
      must be suppressed and the first field has *not* been selected.
      That is because a non-delimited line has exactly one field.  */
-  buffer_first_field = (suppress_non_delimited ^ !print_kth (1));
+  buffer_first_field = (suppress_non_delimited ^ !print_kth (1, NULL));
 
   while (1)
     {
@@ -483,7 +524,7 @@ cut_fields (FILE *stream)
                }
              continue;
            }
-         if (print_kth (1))
+         if (print_kth (1, NULL))
            {
              /* Print the field, but not the trailing delimiter.  */
              fwrite (field_1_buffer, sizeof (char), n_bytes - 1, stdout);
@@ -494,7 +535,7 @@ cut_fields (FILE *stream)
 
       if (c != EOF)
        {
-         if (print_kth (field_idx))
+         if (print_kth (field_idx, NULL))
            {
              if (found_any_selected_field)
                {
@@ -648,6 +689,7 @@ main (int argc, char **argv)
          break;
 
        case OUTPUT_DELIMITER_OPTION:
+         output_delimiter_specified = 1;
          /* Interpret --output-delimiter='' to mean
             `use the NUL byte as the delimiter.'  */
          output_delimiter_length = (optarg[0] == '\0'
@@ -675,7 +717,8 @@ main (int argc, char **argv)
     FATAL_ERROR (_("you must specify a list of bytes, characters, or fields"));
 
   if (delim != '\0' && operating_mode != field_mode)
-    FATAL_ERROR (_("a delimiter may be specified only when operating on fields"));
+    FATAL_ERROR (_("an input delimiter may be specified only\
+ when operating on fields"));
 
   if (suppress_non_delimited && operating_mode != field_mode)
     FATAL_ERROR (_("suppressing non-delimited lines makes sense\n\

================================
Here's the preceding change:

date: 2003/01/09 19:30:22;  author: meyering;  state: Exp;  lines: +1 -1
(set_fields): Make code agree with comment:
Don't merge abutting ranges like 4- and 2-3.  This makes no
difference currently, but is required to support an upcoming change.
=============================================================================
Index: cut.c
===================================================================
RCS file: /fetish/cu/src/cut.c,v
retrieving revision 1.84
retrieving revision 1.85
diff -u -p -u -r1.84 -r1.85
--- cut.c       7 Jan 2003 17:12:11 -0000       1.84
+++ cut.c       9 Jan 2003 19:30:22 -0000       1.85
@@ -304,7 +304,7 @@ set_fields (const char *fieldstr)
                          /* No, the new sequence starts before the
                             old.  Does the old range going to end of line
                             extend into the new range?  */
-                         if (value + 1 >= eol_range_start)
+                         if (eol_range_start <= value)
                            {
                              /* Yes.  Simply move the end of line marker. */
                              eol_range_start = initial;

===========================
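
As a small illustration of why the one-line change above matters for
-c4-,2-3, the two conditions can be compared side by side.  This is
only a sketch: the function names are hypothetical, and it assumes (as
the surrounding comment in the diff suggests) that `value' is the upper
bound of the new range and that a nonzero eol_range_start is the start
of the existing to-end-of-line range.

```c
#include <assert.h>
#include <stdbool.h>

/* Old test (revision 1.84): also true when the new range merely abuts
   the to-end-of-line range, so 2-3 was folded into 4-.  */
static bool
merges_before_fix (unsigned int value, unsigned int eol_range_start)
{
  assert (0 < eol_range_start);
  return value + 1 >= eol_range_start;
}

/* New test (revision 1.85): true only when the ranges really overlap.  */
static bool
merges_after_fix (unsigned int value, unsigned int eol_range_start)
{
  assert (0 < eol_range_start);
  return eol_range_start <= value;
}
```

For -c4-,2-3 we have value == 3 and eol_range_start == 4: the old test
holds and the ranges collapse into a single 2-, leaving no boundary at
which to print a delimiter, while the new test fails and the two ranges
stay separate.  For a genuinely overlapping pair such as 4- and 2-5,
both tests hold.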

Index: Test.pm
===================================================================
RCS file: /fetish/cu/tests/cut/Test.pm,v
retrieving revision 1.10
diff -u -p -u -p -r1.10 Test.pm
--- Test.pm     7 Mar 1999 05:10:32 -0000       1.10
+++ Test.pm     9 Jan 2003 18:00:43 -0000
@@ -72,6 +72,17 @@ my @tv = (
 # Prior to 1.22i, you couldn't use a delimiter that would sign-extend.
 ['8bit-delim', "'-d\255' -f2,3 --out=_", "a\255b\255c\n", "b_c\n",     0],
 
+# New functionality:
+['out-delim1', '-c1-3,5- --output-d=:', "abcdefg\n", "abc:efg\n",      0],
+# A totally overlapped field shouldn't change anything:
+['out-delim2', '-c1-3,2,5- --output-d=:', "abcdefg\n", "abc:efg\n",    0],
+# Partial overlap: index `2' is not at the start of a range.
+['out-delim3', '-c1-3,2-4,6 --output-d=:', "abcdefg\n", "abcd:f\n",    0],
+# Ensure that the following two commands produce the same output.
+# Before an off-by-one fix, the output from the former would not contain a `:'.
+['out-delim4', '-c4-,2-3 --output-d=:', "abcdefg\n", "bc:defg\n",      0],
+['out-delim5', '-c2-3,4- --output-d=:', "abcdefg\n", "bc:defg\n",      0],
+
 );
 
 # Don't use a pipe for failing tests.  Otherwise, sometimes they



