bug-coreutils

problems with 'join' command


From: Samir Wadhawan
Subject: problems with 'join' command
Date: Thu, 31 Jan 2008 16:41:11 -0500

Dear Mike Haertel,

We would like to call your attention to what we find to be unusual behaviour
of the join command on Ubuntu distributions (Dapper and above). Attached to
this mail are two sample files (file1 and file2) which produce incorrect
output when joining column 5 of file1 with column 1 of file2. This is the
command we used to produce the join:

join -a1 -15 -21 file1.srt file2.srt

As indicated in join's manpage, we ensured that both files were sorted on
the join columns before the join was conducted, using these commands:

sort -k 5 file1 > file1.srt  
sort -k 1 file2 > file2.srt  

Surprisingly, we notice that join proceeds WITHOUT errors when we use this
variant of sort:

sort -k 5,5 file1 > file1.srt
sort -k 1,1 file2 > file2.srt

Clearly, the only difference between the above two variants of the sort
command is the additional sorting on the columns following the one on which
the join is performed. This behaviour puzzles us, as join seems to be
producing different (inconsistent) outputs and appears to be sensitive to
the sorting order of other columns in the file.
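
For illustration, the second (working) sequence can be sketched on small
made-up data as follows. The file names demo1 and demo2, their contents, and
the LC_ALL=C setting are assumptions for this sketch and are not taken from
the attached files; the join options are written with a space between the
option and the field number.

```shell
# Work in a scratch directory so no existing files are overwritten.
cd "$(mktemp -d)"

# Assumption: one fixed collation order shared by sort and join.
export LC_ALL=C

# Hypothetical sample data (NOT the attached file1/file2).
printf 'r1 x x x gene2\nr2 x x x gene1\nr3 x x x gene3\n' > demo1
printf 'gene3 C\ngene1 A\n' > demo2

# Restrict each sort key to exactly the join field.
sort -k 5,5 demo1 > demo1.srt
sort -k 1,1 demo2 > demo2.srt

# Join field 5 of demo1 with field 1 of demo2; -a1 keeps unpaired demo1 lines.
join -a1 -1 5 -2 1 demo1.srt demo2.srt > demo.out
cat demo.out
```

With this sample data the output lists the join field first, then the
remaining fields of each file, and the unpaired gene2 line is kept by -a1.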

We tried to reproduce this behaviour on an AIX machine, but found that
both variants of the sorted files produce consistent join results.

Please let us know if we are missing something.

Best Regards,
Samir Wadhawan.

************************************************************
Samir Wadhawan

PhD. Candidate 
Dept. of Biochemistry, Microbiology and Molecular Biology

Centre for Comparative Genomics and Bioinformatics
505 Wartik Lab
The Pennsylvania State University
E-mail: address@hidden; address@hidden
Ph#:(814)865-4754

************************************************************


Attachment: file1
Description: Binary data

Attachment: file2
Description: Binary data

