Re: [bug-gnubatch] Host interconnection doesn't work anymore


From: John Collins (Xi Software Ltd)
Subject: Re: [bug-gnubatch] Host interconnection doesn't work anymore
Date: Wed, 18 Jul 2012 16:03:55 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:13.0) Gecko/20120615 Thunderbird/13.0.1

On 10/07/12 18:01, John Collins (Xi Software Ltd) wrote:
On 10/07/12 16:22, Ralf Kraudelt wrote:
Hi,
I've been playing around with GNUbatch. Version 1.4 works like a charm, mostly: two interconnected hosts, exported variables and jobs all work fine. It's a plain installation; only --prefix is changed and a gnubatch.hosts file is configured.

Now I've tried version 1.5. Same situation, plain installation. After the first start, everything worked fine: two connected hosts, exported jobs and variables. But after a restart of one and later both servers, they can't see each other. I ran gbch-conn, and it returns without error, but gbch-rr doesn't exit, gbch-xq doesn't see exported jobs and variables from the other machine, and so on.

The notes for version 1.5 say "Most of the network handling has been rewritten to not require very much in the hosts file for connections between servers", so I guess that my problem arises from these changes. Is there anything I can do to help?

Documentation problem:
gbch-q -? says: "-n display network wide jobs". The correct option is -r.

Regards
Ralf


Thanks, I'll try and fix that.

Could the lack of interconnection be because the system thinks the old TCP connection isn't closed?

I have had a few problems with that in the past. Even SO_REUSEADDR doesn't let it clear properly sometimes.
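
For reference, a minimal sketch of how SO_REUSEADDR is normally applied to a listening socket, so that a restarted server can bind again while an old connection lingers in TIME_WAIT. The helper name make_listener and the backlog of 5 are made up for illustration; this is not taken from the GNUbatch source.

```c
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper (not GNUbatch code): open a TCP listening socket
   on the given port (0 lets the kernel pick one) with SO_REUSEADDR set.
   Returns the descriptor, or -1 on failure. */
int make_listener(uint16_t port)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    /* The option must be set before bind() to have any effect. */
    int on = 1;
    if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &on, sizeof on) < 0) {
        close(fd);
        return -1;
    }

    struct sockaddr_in sa;
    memset(&sa, 0, sizeof sa);
    sa.sin_family = AF_INET;
    sa.sin_addr.s_addr = htonl(INADDR_ANY);
    sa.sin_port = htons(port);

    if (bind(fd, (struct sockaddr *)&sa, sizeof sa) < 0 || listen(fd, 5) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

This only helps with rebinding the listening port; it does nothing about a peer that still believes an old connection is open.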

Could I just check that you've got the (sysconfdir)/gnubatch.hosts set up OK?

If necessary, you might want to put "localaddress <my IP address>" in there.

On some systems, gethostbyname applied to the result of gethostname gives 127.0.1.1 or similar rather than the "external" IP address, in which case you can set it there. A loopback address like that will cause the network code to get confused about which host jobs belong to, which I suspect is what's happening.
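
One way to check for this on a given machine is sketched below. It resolves the result of gethostname() and reports whether the first IPv4 address is in 127.0.0.0/8. It uses getaddrinfo() as a modern stand-in for the gethostby* calls, and both function names are invented for this example; GNUbatch itself may do the lookup differently.

```c
#include <arpa/inet.h>
#include <netdb.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Returns 1 if the dotted-quad string is in 127.0.0.0/8, 0 if it is
   some other IPv4 address, -1 if it does not parse. */
int is_loopback_ipv4(const char *dotted)
{
    struct in_addr a;
    if (inet_pton(AF_INET, dotted, &a) != 1)
        return -1;
    return (ntohl(a.s_addr) >> 24) == 127;
}

/* Hypothetical check: does this machine's own host name resolve to a
   loopback address? Returns 1 for yes, 0 for no, -1 if the lookup fails. */
int hostname_resolves_to_loopback(void)
{
    char name[256];
    if (gethostname(name, sizeof name) != 0)
        return -1;
    name[sizeof name - 1] = '\0';   /* gethostname may not terminate */

    struct addrinfo hints, *res;
    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_INET;      /* IPv4 only in this sketch */
    if (getaddrinfo(name, NULL, &hints, &res) != 0)
        return -1;

    /* Only the first returned address is examined. */
    const struct sockaddr_in *sin = (const struct sockaddr_in *)res->ai_addr;
    int result = (ntohl(sin->sin_addr.s_addr) >> 24) == 127;
    freeaddrinfo(res);
    return result;
}
```

If hostname_resolves_to_loopback() returns 1, that is a sign the "localaddress" line is needed in gnubatch.hosts.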

I think I'm going to have to forget gethostname and kick off by connecting to something and fishing the IP address out of getsockname(). I'll still have to let localaddress override the result, as some machines have more than one IP. That won't fix DHCP machines, though, so perhaps I'll add an alternative syntax giving a hostname/IP and port number to connect to, and getsockname-ify from there.
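
A rough sketch of the getsockname() idea, under some assumptions: it uses a UDP socket, because connect() on a datagram socket only picks a route and sends nothing, whereas a TCP connect would need a reachable listener. The function name local_address_towards and its parameters are invented for the example, and it handles IPv4 only.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: find which local IPv4 address the kernel would
   use to reach peer_ip, and write it to buf in dotted-quad form.
   Returns 0 on success, -1 on failure. */
int local_address_towards(const char *peer_ip, uint16_t port,
                          char *buf, size_t buflen)
{
    struct sockaddr_in peer, self;
    socklen_t len = sizeof self;

    memset(&peer, 0, sizeof peer);
    peer.sin_family = AF_INET;
    peer.sin_port = htons(port);
    if (inet_pton(AF_INET, peer_ip, &peer.sin_addr) != 1)
        return -1;

    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0)
        return -1;

    int rc = -1;
    /* connect() on UDP just fixes the route; getsockname() then
       reports the source address chosen for that route. */
    if (connect(fd, (struct sockaddr *)&peer, sizeof peer) == 0 &&
        getsockname(fd, (struct sockaddr *)&self, &len) == 0 &&
        inet_ntop(AF_INET, &self.sin_addr, buf, (socklen_t)buflen) != NULL)
        rc = 0;

    close(fd);
    return rc;
}
```

Called with the address of the other server, this yields the address that server would see the connection coming from, assuming no address translation in between. Called with a loopback peer it just returns a loopback address, so the peer has to be a real remote host for the result to be useful.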

And I'm going to have to worry about IPv6 too. Please submit your bright ideas here.

--
John Collins address@hidden Xi Software Ltd www.xisl.com

Phone: +44 (0)1707 886110 Home Phone: +44 (0)1707 883174
Mobile: +44 (0)7958 387247 (address@hidden)

Trading Address 3 Mandeville Rise, Welwyn Garden City, Herts, AL8 7JT, UK

Registered in England Company Number 01977148 VAT GB 403 9239 64 R/O: 2 Mill Road, Haverhill, Suffolk, CB9 8BD
