bug#6903: join: improve paralleles to sort?

From: Bernhard Schiffner
Subject: bug#6903: join: improve paralleles to sort?
Date: Wed, 25 Aug 2010 08:57:21 +0200
User-agent: KMail/1.13.5 (Linux/; KDE/4.4.4; i686; ; )

On Tuesday, 24 August 2010, at 23:23:55, Paul Eggert wrote:
> On 08/24/2010 12:39 PM, Bernhard Schiffner wrote:
> > Because join uses strtoul() before doing the comparison, it is
> > understandable. ("unpairable" is the result.)
> No, join doesn't use strtoul. 
I was wrong. (strtoul() is applied to the number of the field to join on, not to the field contents.)

> It compares the numbers as strings.
> So if you use plain "sort" on the numbers, join will work, unless the
> numbers are numerically equal but textually different (e.g., 0 versus -0).
Not a problem for me.
> You can then sort the output of join with "sort -n", if you wish.
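
The workflow Paul describes can be sketched roughly as follows. This is only an illustration under stated assumptions: the scratch directory, the file names and the three sample lines are hypothetical (they merely mimic the shape of the keys in file a), and setting LC_ALL=C is an assumption made here so that sort and join agree on plain byte order.

```shell
# Assumption: force the C locale so sort and join both compare keys byte by byte.
export LC_ALL=C

# Hypothetical scratch data; the keys have the same shape as those in file a.
tmp=$(mktemp -d)
printf '%s\n' '21469 x1' '214602 x2' '214618118 x3' > "$tmp/a"
printf '%s\n' '21469 y1' '214602 y2' '214618118 y3' > "$tmp/b"

# Plain sort orders the lines as strings, which is the order join expects.
sort "$tmp/a" > "$tmp/a.sorted"
sort "$tmp/b" > "$tmp/b.sorted"

# Join on the first field, then put the result back into numeric order.
result=$(join "$tmp/a.sorted" "$tmp/b.sorted" | sort -n)
printf '%s\n' "$result"

rm -rf "$tmp"
```

Note that in string order 214602 and 214618118 come before 21469, so a file that is already in numeric order is not necessarily in the order join expects.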

A small test case is included below. Run
join a b
and try to understand why the lines with
214618118       /temp/marketing_ms/emails.dat
214618118       /temp/bs/marketing_ms/emails.dat
are not in the result.
Do you see any reason?

Perhaps I'm misusing join a little bit here, but so far I don't
understand why it should be wrong.
Before I blame someone else, I'll try to dig a little deeper.


File a:
21460   /ElsevierDocuments/EWX0886A/09218181/00220001/99000417/main.raw
21460   /ElsevierDocuments/EWX0889A/00319201/01200001/00001461/main.raw
21464   /apache/xerces/dom/DeferredAttrNSImpl.html
21466   /spam/address@hidden
21467   /MINING/MIN0002A/03605442/00230009/98000218/main.raw
21468   /___MRA/___sophos_autoupdate1.dir/1207625107/encloa-b.ide
21468   /___MRA/___sophos_autoupdate1.dir/1208238697/encloa-b.ide
21468   /___MRA/___sophos_autoupdate1.dir/1208834890/encloa-b.ide
21468   /___MRA/___sophos_autoupdate1.dir/1209153877/encloa-b.ide
21468   /___MRA/___sophos_autoupdate1.dir/1209404409/encloa-b.ide
21468   /___MRA/___sophos_autoupdate1.dir/1209710971/encloa-b.ide
21468   /___MRA/___sophos_autoupdate1.dir/1209737271/encloa-b.ide
21468   /___MRA/___sophos_autoupdate1.dir/1214978929/encloa-b.ide
21469   /ElsevierDocuments/EWX0886A/09218181/00370003/02001996/main.raw
21469   /ElsevierDocuments/EWX0890A/00335894/00660002/06000846/main.xml
21469   /ElsevierDocuments/MINING/MIN0001A/01968904/00420007/00000911/main.raw
214602  /ElsevierDocuments/EWX0876A/00370738/01710001/04002477/main.xml
214604  /ElsevierDocuments/EWX0881A/00128252/00700001/04001333/main.xml
214614  /ElsevierDocuments/EWX0887A/02773791/00240020/05000223/main.xml
214666  /ElsevierDocuments/EWX0886A/09218181/00600003/07000240/main.xml
214682  /ElsevierDocuments/EWX0879A/0012821X/02430003/06000367/main.xml
2146369 /marketing/diffferent_Berichtsband_Online_Crossmedia_Kampagnen.pdf
2146427 /LBAtoJM/ROOT/WEB-INF/lib/hibernate-3.2.0.cr3.jar
214618118       /temp/marketing_ms/emails.dat
214618118       /temp/bs/marketing_ms/emails.dat
214618120       /temp/marketing_js/emails.dat

File b:
