Re: have cut -c print output-delimiter
From: Jim Meyering
Subject: Re: have cut -c print output-delimiter
Date: Thu, 09 Jan 2003 21:29:20 +0100
Jan Nieuwenhuizen <address@hidden> wrote:
> Find a patch below that makes cut print output-delimiter when working
> on byte ranges, but only if output-delimiter was explicitly specified
> on the command line.
>
> From cut --help, one would already expect cut to handle this case,
> which is very useful to (pre)process fixed-width text tables.
>
> Greetings,
> Jan.
>
> PS: I checked subversions.gnu.org:cvsroot/textutils and
> :cvsroot/coreutils first and found both modules to exist, but
> empty. A README redirecting the user to alpha.gnu.org would be
> most friendly.
>
> Example:
> ls -l | cut --output-delimiter=, -c1,2-4,5-7,8-10,57- > foo
> mysql -e 'create table foo (d char(1),u char(3), g varchar (3), o \
> varchar (3), n text)' test
> mysqlimport --fields-terminated-by=, test foo
Thanks a lot for the suggestion and patch.
I've made some changes and added some tests.
When selecting ranges of byte offsets (as opposed to ranges of fields)
and when --output-delimiter=STRING is specified, output STRING between
ranges of selected bytes.
* src/cut.c (RANGE_START_SENTINEL): Define.
(output_delimiter_specified): New global.
(print_kth): Add parameter. Adjust all callers.
(set_fields): Mark each range-start index with RANGE_START_SENTINEL.
(cut_bytes): When requested, output STRING between ranges of
selected bytes.
(main): Make a diagnostic a little clearer.
Based on a patch from Jan Nieuwenhuizen.
* tests/cut/Test.pm: New tests for the above.
* src/cut.c (set_fields): Make code agree with comment:
Don't merge abutting ranges like 4- and 2-3. This makes no
difference currently, but is required to support an upcoming change.
There were two minor problems. First, IMHO, the following should
(and now does) output a `:':
pi$ echo abcdefghi|./cut -c4-,2-3 --output-d=:
bc:defghi
Before the additional little change (patch included below),
it did this:
pi$ echo abcdefghi|./cut -c4-,2-3 --output-d=:
bcdefghi
Also, before this change, overlapping byte ranges could produce questionable output.
The only part remaining is to update the texinfo documentation
with a description of the new feature. Would you like to do that,
including the nice example you gave above?
Thanks again,
Jim
Index: cut.c
===================================================================
RCS file: /fetish/cu/src/cut.c,v
retrieving revision 1.85
retrieving revision 1.86
diff -u -p -u -r1.85 -r1.86
--- cut.c 9 Jan 2003 19:30:22 -0000 1.85
+++ cut.c 9 Jan 2003 20:16:58 -0000 1.86
@@ -93,6 +93,10 @@ static unsigned int max_range_endpoint;
to end of line. */
static unsigned int eol_range_start;
+/* A nonzero, non-1 value with which to distinguish the index
+ corresponding to the lower bound of a range. */
+#define RANGE_START_SENTINEL 2
+
/* In byte mode, which bytes to output.
In field mode, which DELIM-separated fields to output.
Both bytes and fields are numbered starting with 1,
@@ -126,6 +130,9 @@ static int suppress_non_delimited;
/* The delimeter character for field mode. */
static int delim;
+/* Nonzero if the --output-delimiter=STRING option was specified. */
+static int output_delimiter_specified;
+
/* The length of output_delimiter_string. */
static size_t output_delimiter_length;
@@ -210,11 +217,29 @@ With no FILE, or when FILE is -, read st
exit (status == 0 ? EXIT_SUCCESS : EXIT_FAILURE);
}
+/* Return nonzero if the K'th field or byte is printable.
+ When returning nonzero, if RANGE_START is non-NULL,
+ set *RANGE_START to nonzero if K is the beginning of a range, and
+ set *RANGE_START to zero if K is not the beginning of a range. */
+
static int
-print_kth (unsigned int k)
+print_kth (unsigned int k, int *range_start)
{
- return ((0 < eol_range_start && eol_range_start <= k)
- || (k <= max_range_endpoint && printable_field[k]));
+ if (0 < eol_range_start && eol_range_start <= k)
+ {
+ if (range_start)
+ *range_start = (k == eol_range_start);
+ return 1;
+ }
+
+ if (k <= max_range_endpoint && printable_field[k])
+ {
+ if (range_start)
+ *range_start = (printable_field[k] == RANGE_START_SENTINEL);
+ return 1;
+ }
+
+ return 0;
}
/* Given the list of field or byte range specifications FIELDSTR, set
@@ -371,8 +396,13 @@ set_fields (const char *fieldstr)
/* Set the array entries corresponding to integers in the ranges of RP. */
for (i = 0; i < n_rp; i++)
{
- unsigned int j;
- for (j = rp[i].lo; j <= rp[i].hi; j++)
+ unsigned int j = rp[i].lo;
+
+ /* Mark the first position of field or range with a sentinel,
+ but not if it's already part of another range. */
+ if (j <= rp[i].hi && ! printable_field[j])
+ printable_field[j] = RANGE_START_SENTINEL;
+ for (++j; j <= rp[i].hi; j++)
{
printable_field[j] = 1;
}
@@ -388,9 +418,13 @@ set_fields (const char *fieldstr)
static void
cut_bytes (FILE *stream)
{
- unsigned int byte_idx; /* Number of chars in the line so far. */
+ unsigned int byte_idx; /* Number of bytes in the line so far. */
+ /* Whether to begin printing delimiters between ranges for the current line.
+ Set after we've begun printing data corresponding to the first range. */
+ int print_delimiter;
byte_idx = 0;
+ print_delimiter = 0;
while (1)
{
register int c; /* Each character from the file. */
@@ -401,6 +435,7 @@ cut_bytes (FILE *stream)
{
putchar ('\n');
byte_idx = 0;
+ print_delimiter = 0;
}
else if (c == EOF)
{
@@ -410,9 +445,15 @@ cut_bytes (FILE *stream)
}
else
{
- ++byte_idx;
- if (print_kth (byte_idx))
+ int range_start;
+ if (print_kth (++byte_idx, &range_start))
{
+ if (range_start && print_delimiter && output_delimiter_specified)
+ {
+ fwrite (output_delimiter_string, sizeof (char),
+ output_delimiter_length, stdout);
+ }
+ print_delimiter = 1;
putchar (c);
}
}
@@ -444,7 +485,7 @@ cut_fields (FILE *stream)
and the first field has been selected, or if non-delimited lines
must be suppressed and the first field has *not* been selected.
That is because a non-delimited line has exactly one field. */
- buffer_first_field = (suppress_non_delimited ^ !print_kth (1));
+ buffer_first_field = (suppress_non_delimited ^ !print_kth (1, NULL));
while (1)
{
@@ -483,7 +524,7 @@ cut_fields (FILE *stream)
}
continue;
}
- if (print_kth (1))
+ if (print_kth (1, NULL))
{
/* Print the field, but not the trailing delimiter. */
fwrite (field_1_buffer, sizeof (char), n_bytes - 1, stdout);
@@ -494,7 +535,7 @@ cut_fields (FILE *stream)
if (c != EOF)
{
- if (print_kth (field_idx))
+ if (print_kth (field_idx, NULL))
{
if (found_any_selected_field)
{
@@ -648,6 +689,7 @@ main (int argc, char **argv)
break;
case OUTPUT_DELIMITER_OPTION:
+ output_delimiter_specified = 1;
/* Interpret --output-delimiter='' to mean
`use the NUL byte as the delimiter.' */
output_delimiter_length = (optarg[0] == '\0'
@@ -675,7 +717,8 @@ main (int argc, char **argv)
FATAL_ERROR (_("you must specify a list of bytes, characters, or fields"));
if (delim != '\0' && operating_mode != field_mode)
- FATAL_ERROR (_("a delimiter may be specified only when operating on fields"));
+ FATAL_ERROR (_("an input delimiter may be specified only\
+ when operating on fields"));
if (suppress_non_delimited && operating_mode != field_mode)
FATAL_ERROR (_("suppressing non-delimited lines makes sense\n\
================================
Here's the preceding change:
date: 2003/01/09 19:30:22; author: meyering; state: Exp; lines: +1 -1
(set_fields): Make code agree with comment:
Don't merge abutting ranges like 4- and 2-3. This makes no
difference currently, but is required to support an upcoming change.
=============================================================================
Index: cut.c
===================================================================
RCS file: /fetish/cu/src/cut.c,v
retrieving revision 1.84
retrieving revision 1.85
diff -u -p -u -r1.84 -r1.85
--- cut.c 7 Jan 2003 17:12:11 -0000 1.84
+++ cut.c 9 Jan 2003 19:30:22 -0000 1.85
@@ -304,7 +304,7 @@ set_fields (const char *fieldstr)
/* No, the new sequence starts before the
old. Does the old range going to end of line
extend into the new range? */
- if (value + 1 >= eol_range_start)
+ if (eol_range_start <= value)
{
/* Yes. Simply move the end of line marker. */
eol_range_start = initial;
===========================
Index: Test.pm
===================================================================
RCS file: /fetish/cu/tests/cut/Test.pm,v
retrieving revision 1.10
diff -u -p -u -p -r1.10 Test.pm
--- Test.pm 7 Mar 1999 05:10:32 -0000 1.10
+++ Test.pm 9 Jan 2003 18:00:43 -0000
@@ -72,6 +72,17 @@ my @tv = (
# Prior to 1.22i, you couldn't use a delimiter that would sign-extend.
['8bit-delim', "'-d\255' -f2,3 --out=_", "a\255b\255c\n", "b_c\n", 0],
+# New functionality:
+['out-delim1', '-c1-3,5- --output-d=:', "abcdefg\n", "abc:efg\n", 0],
+# A totally overlapped field shouldn't change anything:
+['out-delim2', '-c1-3,2,5- --output-d=:', "abcdefg\n", "abc:efg\n", 0],
+# Partial overlap: index `2' is not at the start of a range.
+['out-delim3', '-c1-3,2-4,6 --output-d=:', "abcdefg\n", "abcd:f\n", 0],
+# Ensure that the following two commands produce the same output.
+# Before an off-by-one fix, the output from the former would not contain a `:'.
+['out-delim4', '-c4-,2-3 --output-d=:', "abcdefg\n", "bc:defg\n", 0],
+['out-delim5', '-c2-3,4- --output-d=:', "abcdefg\n", "bc:defg\n", 0],
+
);
# Don't use a pipe for failing tests. Otherwise, sometimes they