bug-coreutils

bug#8040: join 5.97 bug


From: Batson, Brannon
Subject: bug#8040: join 5.97 bug
Date: Mon, 14 Feb 2011 19:22:13 -0500

Sorry, when I said 'join -v 1 d c', I meant 'join -v 1 b a'. The only files 
involved are a and b, whose contents I sent.

The fundamental problem is that join and sort disagree on what 'sorted' means 
in this case.

Brannon

________________________________________
From: Eric Blake address@hidden
Sent: Monday, February 14, 2011 7:18 PM
To: Batson, Brannon
Cc: address@hidden
Subject: Re: bug#8040: join 5.97 bug

On 02/14/2011 04:45 PM, Batson, Brannon wrote:
>
> File a:
> 10 A
> 1  B
>
> File b:
> 1
>
> $ join b a
> <nada>
>
> $ join -v 1 d c
> 1

You didn't provide a file d or c to compare against.

>
> files a & b are both sorted lexicographically  (according to 'sort', anyway). 
> The problem is that the join lexicographic '<' operator disagrees with sort's.

Thanks for the report.  I can't help but wonder if you've stumbled into
this:

http://www.gnu.org/software/coreutils/faq/#join-requires-sorted-input-files

At any rate, the only bug here is in your input files.

>
> Sorry if this bug has been found like a thousand times before, couldn't find 
> it via 30s of googling.

Coreutils 5.97 is OLD.  The latest stable release is 8.10, and it has
improved diagnostics for helping you discover sorting problems with join:

join --help reminds you that:

Important: FILE1 and FILE2 must be sorted on the join fields.
E.g., use ` sort -k 1b,1 ' if `join' has no options,
or use ` join -t '' ' if `sort' has no options.
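As a rough sketch of that advice applied to the two files from the report: the temporary directory and the a.sorted / b.sorted names below are only illustrative, and LC_ALL=C is an assumption made here so that sort and join compare bytes the same way on any machine.

```shell
# Sketch only: recreate files a and b from the report in a scratch directory.
export LC_ALL=C                      # assumption: force bytewise collation for both tools
tmp=$(mktemp -d)
printf '10 A\n1  B\n' > "$tmp/a"
printf '1\n' > "$tmp/b"

# Sort each file on the join field alone, ignoring leading blanks,
# as the join --help text suggests.
sort -k 1b,1 "$tmp/a" > "$tmp/a.sorted"
sort -k 1b,1 "$tmp/b" > "$tmp/b.sorted"

# Lines that pair up on field 1, and lines of b with no partner in a.
joined=$(join "$tmp/b.sorted" "$tmp/a.sorted")
unpaired=$(join -v 1 "$tmp/b.sorted" "$tmp/a.sorted")

echo "joined:   $joined"
echo "unpaired: $unpaired"
rm -rf "$tmp"
```

With both inputs sorted on the join field, the line "1" in b should pair with "1  B" in a, and join -v 1 should report nothing unpaired.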

And trying your example with LC_ALL=en_US.UTF-8 gives:

$ join b a
join: file 2 is not in sorted order

Sure enough, sort --debug shows the culprit (a was not sorted
according to -k 1b,1):

$ sort --debug a
sort: using `en_US.UTF-8' sorting rules
10 A
____
1  B
____
$ sort --debug -k 1b,1 a
sort: using `en_US.UTF-8' sorting rules
1  B
_
____
10 A
__
____
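If --debug is not available (older sort versions lack it), a similar check can be sketched with sort -c, which only tests whether a file is already in order for the given key. The file names are again illustrative, and LC_ALL=C is an assumption chosen to make the result reproducible.

```shell
# Sketch only: test whether file a is ordered on the key that join uses.
export LC_ALL=C                      # assumption: bytewise collation
tmp=$(mktemp -d)
printf '10 A\n1  B\n' > "$tmp/a"

# sort -c exits non-zero (and prints a diagnostic) on the first out-of-order line.
if sort -c -k 1b,1 "$tmp/a" 2>/dev/null; then before=sorted; else before=unsorted; fi

# Re-sort on the join field and check again.
sort -k 1b,1 "$tmp/a" > "$tmp/a.sorted"
if sort -c -k 1b,1 "$tmp/a.sorted" 2>/dev/null; then after=sorted; else after=unsorted; fi

echo "before: $before, after: $after"
rm -rf "$tmp"
```

Running such a check on each input before calling join is one way to catch the mismatch early.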


--
Eric Blake   address@hidden    +1-801-349-2682
Libvirt virtualization library http://libvirt.org





