bug-coreutils

bug#12966: cut: Problems with overlapping, open-ended ranges


From: Jim Meyering
Subject: bug#12966: cut: Problems with overlapping, open-ended ranges
Date: Sat, 24 Nov 2012 20:40:21 +0100

Marcel Böhme wrote:
>    I found two (semantically related) bugs. One seems to originate in the
>    first version. For research purposes, I would appreciate it if you could
>    confirm that the second was introduced with Coreutils 5.3.0.
>    1) The following bug seems to exist "since the beginning".

As you saw, I've posted a patch for that separately.

>    2) Can you kindly confirm that the following bug has been introduced
>    with Coreutils 5.3.0, particularly commit
>    http://git.savannah.gnu.org/gitweb/?p=coreutils.git;a=commit;h=7380cf792aa35b9328519c5f374036d5260704cb ?
>    $ echo 1234567890 | ./cut -b 2-,3,4-4,5 --output-delimiter="."
>    2.34.567890
>    $ echo 1234567890 | ./cut -b 2-10,3,4-4,5 --output-delimiter="."
>    234567890

Yes, that's another bug.
Thanks again.  Here's a proposed patch:

From 7fce8c5bd9a916b65895b159caa6632b076fc634 Mon Sep 17 00:00:00 2001
From: Jim Meyering <address@hidden>
Date: Sat, 24 Nov 2012 11:36:15 -0800
Subject: [PATCH] cut: do not print extraneous delimiters in some unusual cases
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

When printing output delimiters, and when a to-EOL range subsumes
at least one other range, cut would mistakenly print delimiters for
the subsumed range.  This bug was probably introduced via commit
v5.2.1-639-g847e066.
* src/cut.c (set_fields): Ignore any range that is subsumed by a
to-EOL range.  Also, move two declarations down.
* tests/misc/cut.pl: Add test to exercise this.
* NEWS (Bug fixes): Mention it.
Reported by Marcel Böhme in http://bugs.gnu.org/12966
---
 NEWS              | 4 ++++
 src/cut.c         | 9 +++++----
 tests/misc/cut.pl | 9 +++++++++
 3 files changed, 18 insertions(+), 4 deletions(-)

diff --git a/NEWS b/NEWS
index c46be61..9d0f2cf 100644
--- a/NEWS
+++ b/NEWS
@@ -18,6 +18,10 @@ GNU coreutils NEWS                                    -*- outline -*-
   it would interpret "-b2-,3-" like "-b3-".  Now it's treated like "-b2-".
   [This bug was present in "the beginning".]

+  cut no longer prints extraneous delimiters when a to-EOL range subsumes
+  another range.  Before, "echo 123|cut --output-delim=: -b2-,3" would print
+  "2:3".  Now it prints "23".  [bug introduced in 5.3.0]
+
   install -m M SOURCE DEST no longer has a race condition where DEST's
   permissions are temporarily derived from SOURCE instead of from M.

diff --git a/src/cut.c b/src/cut.c
index b464840..4219d24 100644
--- a/src/cut.c
+++ b/src/cut.c
@@ -514,17 +514,18 @@ set_fields (const char *fieldstr)
   /* Set the array entries corresponding to integers in the ranges of RP.  */
   for (i = 0; i < n_rp; i++)
     {
-      size_t j;
-      size_t rsi_candidate;
+      /* Ignore any range that is subsumed by the to-EOL range.  */
+      if (eol_range_start && eol_range_start <= rp[i].lo)
+        continue;

       /* Record the range-start indices, i.e., record each start
          index that is not part of any other (lo..hi] range.  */
-      rsi_candidate = complement ? rp[i].hi + 1 : rp[i].lo;
+      size_t rsi_candidate = complement ? rp[i].hi + 1 : rp[i].lo;
       if (output_delimiter_specified
           && !is_printable_field (rsi_candidate))
         mark_range_start (rsi_candidate);

-      for (j = rp[i].lo; j <= rp[i].hi; j++)
+      for (size_t j = rp[i].lo; j <= rp[i].hi; j++)
         mark_printable_field (j);
     }

diff --git a/tests/misc/cut.pl b/tests/misc/cut.pl
index cb4781a..27768ff 100755
--- a/tests/misc/cut.pl
+++ b/tests/misc/cut.pl
@@ -166,6 +166,15 @@ my @Tests =

   ['overlapping-unbounded-1', '-b3-,2-', {IN=>"1234\n"}, {OUT=>"234\n"}],
   ['overlapping-unbounded-2', '-b2-,3-', {IN=>"1234\n"}, {OUT=>"234\n"}],
+
+  # When printing output delimiters, and with one or more ranges subsumed
+  # by a to-EOL range, cut 8.20 and earlier would print extraneous delimiters.
+  ['EOL-subsumed-1', '--output-d=: -b2-,3,4-4,5',
+                                         {IN=>"123456\n"}, {OUT=>"23456\n"}],
+  ['EOL-subsumed-2', '--output-d=: -b3,4-4,5,2-',
+                                         {IN=>"123456\n"}, {OUT=>"23456\n"}],
+  ['EOL-subsumed-3', '--complement -b3,4-4,5,2-',
+                                         {IN=>"123456\n"}, {OUT=>"1\n"}],
  );

 if ($mb_locale ne 'C')
--
1.8.0.273.g2d242fb