bug#6903: join: improve paralleles to sort?
From: Bernhard Schiffner
Subject: bug#6903: join: improve paralleles to sort?
Date: Wed, 25 Aug 2010 08:57:21 +0200
User-agent: KMail/1.13.5 (Linux/2.6.33.4-0.1-desktop; KDE/4.4.4; i686; ; )
On Tuesday, 24 August 2010, 23:23:55, Paul Eggert wrote:
> On 08/24/2010 12:39 PM, Bernhard Schiffner wrote:
> > Because join uses strtoul() before doing the comparison, it is
> > understandable. ("unpairable" is the result.)
>
> No, join doesn't use strtoul.
I was wrong. (strtoul() is applied to the number of the field to join on.)
> It compares the numbers as strings.
> So if you use plain "sort" on the numbers, join will work, unless the
> numbers are numerically equal but textually different (e.g., 0 versus -0).
Not a problem for me.
> You can then sort the output of join with "sort -n", if you wish.
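To illustrate the point about string comparison, here is a minimal Python sketch. It is not how join is implemented; it only assumes plain byte-wise string ordering (as in the C locale) and uses three keys taken from the test case further down.

```python
# Three join keys from the test case, compared two ways.
keys = ["21469", "214602", "214618118"]

# Numeric order, roughly what "sort -n" would produce.
numeric_order = sorted(keys, key=int)

# String order (assumed byte-wise), which is what a string comparison sees.
string_order = sorted(keys)

print(numeric_order)  # ['21469', '214602', '214618118']
print(string_order)   # ['214602', '214618118', '21469']

# Numerically equal but textually different keys would never pair up.
print(int("0") == int("-0"), "0" == "-0")  # True False
```

The two orders disagree as soon as keys have different lengths, which is the situation in the files below.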
A small test case is included here.
Run
  join a b
and try to understand why the lines with
214618118 /temp/marketing_ms/emails.dat
214618118 /temp/bs/marketing_ms/emails.dat
are not in the result.
Do you see any reason?
Perhaps I am misusing join a little bit here, but so far I don't
understand why it should be wrong.
Before I blame someone else, I'll try to dig a little deeper myself,
too.
TIA!
Bernhard
File a:
21460 /ElsevierDocuments/EWX0886A/09218181/00220001/99000417/main.raw
21460 /ElsevierDocuments/EWX0889A/00319201/01200001/00001461/main.raw
21464 /apache/xerces/dom/DeferredAttrNSImpl.html
21466 /spam/address@hidden
21467 /MINING/MIN0002A/03605442/00230009/98000218/main.raw
21468 /___MRA/___sophos_autoupdate1.dir/1207625107/encloa-b.ide
21468 /___MRA/___sophos_autoupdate1.dir/1208238697/encloa-b.ide
21468 /___MRA/___sophos_autoupdate1.dir/1208834890/encloa-b.ide
21468 /___MRA/___sophos_autoupdate1.dir/1209153877/encloa-b.ide
21468 /___MRA/___sophos_autoupdate1.dir/1209404409/encloa-b.ide
21468 /___MRA/___sophos_autoupdate1.dir/1209710971/encloa-b.ide
21468 /___MRA/___sophos_autoupdate1.dir/1209737271/encloa-b.ide
21468 /___MRA/___sophos_autoupdate1.dir/1214978929/encloa-b.ide
21469 /ElsevierDocuments/EWX0886A/09218181/00370003/02001996/main.raw
21469 /ElsevierDocuments/EWX0890A/00335894/00660002/06000846/main.xml
21469 /ElsevierDocuments/MINING/MIN0001A/01968904/00420007/00000911/main.raw
214602 /ElsevierDocuments/EWX0876A/00370738/01710001/04002477/main.xml
214604 /ElsevierDocuments/EWX0881A/00128252/00700001/04001333/main.xml
214614 /ElsevierDocuments/EWX0887A/02773791/00240020/05000223/main.xml
214666 /ElsevierDocuments/EWX0886A/09218181/00600003/07000240/main.xml
214682 /ElsevierDocuments/EWX0879A/0012821X/02430003/06000367/main.xml
2146369 /marketing/diffferent_Berichtsband_Online_Crossmedia_Kampagnen.pdf
2146427 /LBAtoJM/ROOT/WEB-INF/lib/hibernate-3.2.0.cr3.jar
214618118 /temp/marketing_ms/emails.dat
214618118 /temp/bs/marketing_ms/emails.dat
214618120 /temp/marketing_js/emails.dat
File b:
21460
21468
21469
214618118
215777777
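
One possible reading of the test case, given the statement above that join compares keys as strings: both files are in numeric order, not string order, so a merge over them can step past 214618118. The Python sketch below is a simplified model and not join's actual code. The function name merge_join is hypothetical, the keys of file a are listed with most duplicates collapsed, and byte-wise string comparison (C locale) is assumed.

```python
def merge_join(left, right):
    """Hypothetical merge of two key lists, assuming both are sorted as strings.

    Keys in `right` are assumed to be unique.
    """
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] < right[j]:
            i += 1              # left key is "smaller": skip it
        elif left[i] > right[j]:
            j += 1              # right key is "smaller": skip it
        else:
            out.append(left[i])
            i += 1              # stay on right[j] to catch duplicates on the left
    return out

# Keys of file a in the order given (duplicates collapsed, except 214618118).
a_keys = ["21460", "21464", "21466", "21467", "21468", "21469",
          "214602", "214604", "214614", "214666", "214682",
          "2146369", "2146427", "214618118", "214618118", "214618120"]
# Keys of file b in the order given.
b_keys = ["21460", "21468", "21469", "214618118", "215777777"]

# As given (numeric order): the 214618118 lines are never paired.
unsorted_result = merge_join(a_keys, b_keys)
print(unsorted_result)  # ['21460', '21468', '21469']

# After sorting both inputs as strings, the lines are paired.
sorted_result = merge_join(sorted(a_keys), sorted(b_keys))
print(sorted_result)  # ['21460', '214618118', '214618118', '21468', '21469']

# Numeric order can be restored afterwards, like "sort -n" on join's output.
print(sorted(sorted_result, key=int))
```

In shell terms this corresponds to the advice quoted above: run plain "sort" on both files before join, then "sort -n" on the result if numeric order is wanted.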