bug-coreutils
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH] Implement join --check-order and --nocheck-order.


From: Jim Meyering
Subject: Re: [PATCH] Implement join --check-order and --nocheck-order.
Date: Sun, 23 Mar 2008 12:39:45 +0100

"Dmitry V. Levin" <address@hidden> wrote:
> On Tue, Feb 19, 2008 at 03:44:00PM +0100, Jim Meyering wrote:
>> James Youngman <address@hidden> wrote:
>> > This is a consolidated patch including all the previous changes,
>> > implementing order checking in join by default.  "make distcheck"
>> > works (if I move distcheck-hook from Makefile.am to GNUmakefile).
>> >
>> > 2008-02-16  James Youngman  <address@hidden>
>> >
>> >    * src/join.c: Support --check-order and --nocheck-order.
>> >    New variables check_input_order, seen_unpairable and
>> >    issued_disorder_warning[]. For --check-order, verify that the
>> ...
>>
>> Thanks again.
>> I've just pushed that.
>
> Looks like this commit v6.10-70-ga1e7156 introduces a regression:
>
> $ printf '%s\n%s\n' '1 b' '2 a' | sort -k2,2 | join -12 -21 - /dev/null
> join: File 1 is not in sorted order
> $ echo $?
> 1
> $ join --version |grep join
> join (GNU coreutils) 6.10.133-677610

Thank you for catching that.
The new check_order function uses keycmp in a new way: with both
lines coming from the same file, so that that function's hard-coded
use of join_field_1 and join_field_2 globals was not appropriate.

Here's a bare, knee-jerk patch: no NEWS,
no comment adjustments, no new tests.  Yet.

Now, I'm hoping someone will find time to do a little testing to see
whether the new feature (esp. with this additional change or something
equivalent) impacts performance.

diff --git a/src/join.c b/src/join.c
index 8f0d436..aaf66b0 100644
--- a/src/join.c
+++ b/src/join.c
@@ -300,7 +300,8 @@ freeline (struct line *line)
    Report an error and exit if the comparison fails.  */

 static int
-keycmp (struct line const *line1, struct line const *line2)
+keycmp (struct line const *line1, struct line const *line2,
+       size_t jf_1, size_t jf_2)
 {
   /* Start of field to compare in each file.  */
   char *beg1;
@@ -310,10 +311,10 @@ keycmp (struct line const *line1, struct line const 
*line2)
   size_t len2;         /* Length of fields to compare.  */
   int diff;

-  if (join_field_1 < line1->nfields)
+  if (jf_1 < line1->nfields)
     {
-      beg1 = line1->fields[join_field_1].beg;
-      len1 = line1->fields[join_field_1].len;
+      beg1 = line1->fields[jf_1].beg;
+      len1 = line1->fields[jf_1].len;
     }
   else
     {
@@ -321,10 +322,10 @@ keycmp (struct line const *line1, struct line const 
*line2)
       len1 = 0;
     }

-  if (join_field_2 < line2->nfields)
+  if (jf_2 < line2->nfields)
     {
-      beg2 = line2->fields[join_field_2].beg;
-      len2 = line2->fields[join_field_2].len;
+      beg2 = line2->fields[jf_2].beg;
+      len2 = line2->fields[jf_2].len;
     }
   else
     {
@@ -376,7 +377,8 @@ check_order (const struct line *prev,
     {
       if (!issued_disorder_warning[whatfile-1])
        {
-         if (keycmp (prev, current) > 0)
+         size_t join_field = whatfile == 1 ? join_field_1 : join_field_2;
+         if (keycmp (prev, current, join_field, join_field) > 0)
            {
              error ((check_input_order == CHECK_ORDER_ENABLED
                      ? EXIT_FAILURE : 0),
@@ -404,6 +406,7 @@ get_line (FILE *fp, struct line *line, int which)
        error (EXIT_FAILURE, errno, _("read error"));
       free (line->buf.buffer);
       line->buf.buffer = NULL;
+      // prevline[which - 1] = NULL;
       return false;
     }

@@ -605,7 +608,8 @@ join (FILE *fp1, FILE *fp2)
   while (seq1.count && seq2.count)
     {
       size_t i;
-      diff = keycmp (&seq1.lines[0], &seq2.lines[0]);
+      diff = keycmp (&seq1.lines[0], &seq2.lines[0],
+                    join_field_1, join_field_2);
       if (diff < 0)
        {
          if (print_unpairables_1)
@@ -633,7 +637,8 @@ join (FILE *fp1, FILE *fp2)
            ++seq1.count;
            break;
          }
-      while (!keycmp (&seq1.lines[seq1.count - 1], &seq2.lines[0]));
+      while (!keycmp (&seq1.lines[seq1.count - 1], &seq2.lines[0],
+                     join_field_1, join_field_2));

       /* Keep reading lines from file2 as long as they continue to
          match the current line from file1.  */
@@ -645,7 +650,8 @@ join (FILE *fp1, FILE *fp2)
            ++seq2.count;
            break;
          }
-      while (!keycmp (&seq1.lines[0], &seq2.lines[seq2.count - 1]));
+      while (!keycmp (&seq1.lines[0], &seq2.lines[seq2.count - 1],
+                     join_field_1, join_field_2));

       if (print_pairables)
        {
--
1.5.5.rc0.21.g740fd




reply via email to

[Prev in Thread] Current Thread [Next in Thread]