[Top][All Lists]
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [PATCH] Implement join --check-order and --nocheck-order.
From: |
Jim Meyering |
Subject: |
Re: [PATCH] Implement join --check-order and --nocheck-order. |
Date: |
Sun, 23 Mar 2008 12:39:45 +0100 |
"Dmitry V. Levin" <address@hidden> wrote:
> On Tue, Feb 19, 2008 at 03:44:00PM +0100, Jim Meyering wrote:
>> James Youngman <address@hidden> wrote:
>> > This is a consolidated patch including all the previous changes,
>> > implementing order checking in join by default. "make distcheck"
>> > works (if I move distcheck-hook from Makefile.am to GNUmakefile).
>> >
>> > 2008-02-16 James Youngman <address@hidden>
>> >
>> > * src/join.c: Support --check-order and --nocheck-order.
>> > New variables check_input_order, seen_unpairable and
>> > issued_disorder_warning[]. For --check-order, verify that the
>> ...
>>
>> Thanks again.
>> I've just pushed that.
>
> Looks like this commit v6.10-70-ga1e7156 introduces a regression:
>
> $ printf '%s\n%s\n' '1 b' '2 a' | sort -k2,2 | join -12 -21 - /dev/null
> join: File 1 is not in sorted order
> $ echo $?
> 1
> $ join --version |grep join
> join (GNU coreutils) 6.10.133-677610
Thank you for catching that.
The new check_order function uses keycmp in a new way: with both
lines coming from the same file, so that that function's hard-coded
use of join_field_1 and join_field_2 globals was not appropriate.
Here's a bare, knee-jerk patch: no NEWS,
no comment adjustments, no new tests. Yet.
Now, I'm hoping someone will find time to do a little testing to see
whether the new feature (esp. with this additional change or something
equivalent) impacts performance.
diff --git a/src/join.c b/src/join.c
index 8f0d436..aaf66b0 100644
--- a/src/join.c
+++ b/src/join.c
@@ -300,7 +300,8 @@ freeline (struct line *line)
Report an error and exit if the comparison fails. */
static int
-keycmp (struct line const *line1, struct line const *line2)
+keycmp (struct line const *line1, struct line const *line2,
+ size_t jf_1, size_t jf_2)
{
/* Start of field to compare in each file. */
char *beg1;
@@ -310,10 +311,10 @@ keycmp (struct line const *line1, struct line const
*line2)
size_t len2; /* Length of fields to compare. */
int diff;
- if (join_field_1 < line1->nfields)
+ if (jf_1 < line1->nfields)
{
- beg1 = line1->fields[join_field_1].beg;
- len1 = line1->fields[join_field_1].len;
+ beg1 = line1->fields[jf_1].beg;
+ len1 = line1->fields[jf_1].len;
}
else
{
@@ -321,10 +322,10 @@ keycmp (struct line const *line1, struct line const
*line2)
len1 = 0;
}
- if (join_field_2 < line2->nfields)
+ if (jf_2 < line2->nfields)
{
- beg2 = line2->fields[join_field_2].beg;
- len2 = line2->fields[join_field_2].len;
+ beg2 = line2->fields[jf_2].beg;
+ len2 = line2->fields[jf_2].len;
}
else
{
@@ -376,7 +377,8 @@ check_order (const struct line *prev,
{
if (!issued_disorder_warning[whatfile-1])
{
- if (keycmp (prev, current) > 0)
+ size_t join_field = whatfile == 1 ? join_field_1 : join_field_2;
+ if (keycmp (prev, current, join_field, join_field) > 0)
{
error ((check_input_order == CHECK_ORDER_ENABLED
? EXIT_FAILURE : 0),
@@ -404,6 +406,7 @@ get_line (FILE *fp, struct line *line, int which)
error (EXIT_FAILURE, errno, _("read error"));
free (line->buf.buffer);
line->buf.buffer = NULL;
+ // prevline[which - 1] = NULL;
return false;
}
@@ -605,7 +608,8 @@ join (FILE *fp1, FILE *fp2)
while (seq1.count && seq2.count)
{
size_t i;
- diff = keycmp (&seq1.lines[0], &seq2.lines[0]);
+ diff = keycmp (&seq1.lines[0], &seq2.lines[0],
+ join_field_1, join_field_2);
if (diff < 0)
{
if (print_unpairables_1)
@@ -633,7 +637,8 @@ join (FILE *fp1, FILE *fp2)
++seq1.count;
break;
}
- while (!keycmp (&seq1.lines[seq1.count - 1], &seq2.lines[0]));
+ while (!keycmp (&seq1.lines[seq1.count - 1], &seq2.lines[0],
+ join_field_1, join_field_2));
/* Keep reading lines from file2 as long as they continue to
match the current line from file1. */
@@ -645,7 +650,8 @@ join (FILE *fp1, FILE *fp2)
++seq2.count;
break;
}
- while (!keycmp (&seq1.lines[0], &seq2.lines[seq2.count - 1]));
+ while (!keycmp (&seq1.lines[0], &seq2.lines[seq2.count - 1],
+ join_field_1, join_field_2));
if (print_pairables)
{
--
1.5.5.rc0.21.g740fd