bug-coreutils

bug#26029: Problems with join


From: Assaf Gordon
Subject: bug#26029: Problems with join
Date: Thu, 9 Mar 2017 17:20:43 +0000
User-agent: Mutt/1.5.23 (2014-03-12)

Hello Reuti and all,

Reuti wrote:
[…] The strange thing seems to be, that "-j1 2" is handled like "-1
2".

My investigations revealed: on a Mac the man page of `join` explains
the behavior. The options -j, -j1 and -j2 are listed with the BSD
version of `join` as being there for compatibility. This leads to the
assumption, that nowadays -1 and -2 should better be used.

Thanks for investigating and pointing this out!

Join's manual section was recently expanded, and I wish I had been aware of this nuance before I wrote the patch. I will send a patch with improved documentation.

On Thu, Mar 09, 2017 at 05:29:13PM +0100, Reuti wrote:

Reuti wrote:
On 09.03.2017 at 16:32, Peter Kluge <address@hidden> wrote:

I prefer the "POSIX"-Standard teaching to my participants.

Aha, I didn't check this. Then the "-j" option should be moved to a new section 
"Deprecated" in the man/info page of the coreutils version too. (And mention the special 
handling of -j1 resp. -j2, while -j3 … works as one expects.)

I would humbly suggest other wording: I'm not sure '-j' is deprecated.
It is useful, and does work as expected in most cases.

But, it should be better documented to warn against this edge-case.

Reuti wrote:
-j FIELD equivalent to '-1 FIELD -2 FIELD'

does not work in all cases essentially.

It 'just works' in most cases, but indeed we should improve the documentation about edge cases.

First,
this is the relevant section that handles the '-j' parameter:
https://git.savannah.gnu.org/cgit/coreutils.git/tree/src/join.c#n1079


Second,
let's verify that '-jN' works in the common cases,
when it is *not* followed by a number:

Two input files:

   $ cat a.txt
   1 2 3 aaa
   2 3 4 bbb

   $ cat b.txt
   1 2 3 XXX
   2 3 4 YYY

'-j1' alone is equivalent to '-1 1 -2 1':

   $ join -1 1 -2 1 a.txt b.txt
   1 2 3 aaa 2 3 XXX
   2 3 4 bbb 3 4 YYY

   $ join -j1 a.txt b.txt
   1 2 3 aaa 2 3 XXX
   2 3 4 bbb 3 4 YYY

'-j2' alone is equivalent to '-1 2 -2 2':

   $ join -1 2 -2 2 a.txt b.txt
   2 1 3 aaa 1 3 XXX
   3 2 4 bbb 2 4 YYY

   $ join -j2 a.txt b.txt
   2 1 3 aaa 1 3 XXX
   3 2 4 bbb 2 4 YYY

'-j3' alone is equivalent to '-1 3 -2 3':

   $ join -1 3 -2 3 a.txt b.txt
   3 1 2 aaa 1 2 XXX
   4 2 3 bbb 2 3 YYY

   $ join -j3 a.txt b.txt
   3 1 2 aaa 1 2 XXX
   4 2 3 bbb 2 3 YYY

So, in the most common cases, '-jN' works for all Ns
(for "all" being 1,2,3 but really, who needs more than 3 numbers? :) ).
This is perhaps not like BSD's join.
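
For anyone who wants to re-check this locally, here is a minimal sketch that compares '-jN' against '-1 N -2 N' for N = 1, 2, 3. It assumes GNU coreutils join; the temporary directory and the variable names (tmpdir, mismatches, short, long) are only illustrative and are not taken from the thread.

```shell
#!/bin/sh
# Sketch: check that '-jN' gives the same output as '-1 N -2 N'.
# Assumption: 'join' is GNU coreutils join (BSD join may behave differently).
export LC_ALL=C                      # predictable collation for join

tmpdir=$(mktemp -d)                  # illustrative scratch directory
printf '1 2 3 aaa\n2 3 4 bbb\n' > "$tmpdir/a.txt"
printf '1 2 3 XXX\n2 3 4 YYY\n' > "$tmpdir/b.txt"

mismatches=0
for n in 1 2 3; do
    # '-jN' is followed directly by a file name here, not by a number.
    short=$(join -j"$n" "$tmpdir/a.txt" "$tmpdir/b.txt")
    long=$(join -1 "$n" -2 "$n" "$tmpdir/a.txt" "$tmpdir/b.txt")
    if [ "$short" = "$long" ]; then
        echo "-j$n: same output as -1 $n -2 $n"
    else
        echo "-j$n: output differs from -1 $n -2 $n"
        mismatches=$((mismatches + 1))
    fi
done

rm -rf "$tmpdir"
```

With the two input files above, the loop should report no differences; a non-zero 'mismatches' would point at a join that parses '-jN' differently.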


Now comes the tricky part:
If the '-j1' or '-j2' is followed by another parameter,
and that parameter turns out *not* to be a valid field number,
then '-jN' is treated like '-j N' (or '-1 N -2 N'), and join just "does the right thing":

   $ join -j2 -i a.txt b.txt
   2 1 3 aaa 1 3 XXX
   3 2 4 bbb 2 4 YYY

This is implemented here:
https://git.savannah.gnu.org/cgit/coreutils.git/tree/src/join.c#n1171
And the result is that most of the time, join "just works" (IMHO, but
other opinions welcomed).


If the '-j1' or '-j2' is followed by a number, this is where the
unexpected behaviour occurs: it sets the key field for that file alone.
E.g. '-j1 2' is equivalent to '-1 2' (and the key for the second
file is not set, thus defaults to 1):

   $ join -j1 2 a.txt b.txt
   2 1 3 aaa 3 4 YYY

   $ join -1 2 a.txt b.txt
   2 1 3 aaa 3 4 YYY
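
To make the difference concrete, the sketch below runs the ambiguous form next to two unambiguous spellings ('-j 2' with a space, and '-1 2 -2 2'). It again assumes GNU coreutils join; the variable names (a, b, ambiguous, file1_only, both_spaced, both_explicit) are made up for illustration.

```shell
#!/bin/sh
# Sketch: '-j1 2' versus spellings that set the key field for both files.
# Assumption: 'join' is GNU coreutils join.
export LC_ALL=C

tmpdir=$(mktemp -d)                  # illustrative scratch directory
a="$tmpdir/a.txt"
b="$tmpdir/b.txt"
printf '1 2 3 aaa\n2 3 4 bbb\n' > "$a"
printf '1 2 3 XXX\n2 3 4 YYY\n' > "$b"

# Edge case: '-j1' followed by a number sets the key of file 1 only.
ambiguous=$(join -j1 2 "$a" "$b")
file1_only=$(join -1 2 "$a" "$b")

# Unambiguous ways to join both files on field 2.
both_spaced=$(join -j 2 "$a" "$b")
both_explicit=$(join -1 2 -2 2 "$a" "$b")

printf '%s\n' "-j1 2      : $ambiguous"
printf '%s\n' "-1 2       : $file1_only"
printf '%s\n' "-j 2       :" "$both_spaced"
printf '%s\n' "-1 2 -2 2  :" "$both_explicit"

rm -rf "$tmpdir"
```

In scripts, spelling out '-1 N -2 N' (or writing '-j N' with a space) sidesteps the special handling of '-j1' and '-j2' altogether.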


Is the above a satisfactory explanation?
If so, it'll be more-or-less what I'll add to the manual.

I see that this was implemented back in 2005, here:
https://git.savannah.gnu.org/cgit/coreutils.git/commit/src/join.c?id=f9118c1c2e35b
with the comment:
 "Parse obsolete options -j1 and -j2
  so that it is a pure extension to POSIX 1003.1-2001."

I can perhaps guesstimate that since this usage is never
mentioned anywhere, it is considered undocumented and discouraged usage
(and indeed, I don't think I've ever encountered it, or previously
saw a bug-report or question about it - so it's rather rare).

We could add a warning to the man page - what do others think?

regards,
- assaf