Subject: Re: [certi-dev] Handling crash of federate
Date: Thu, 3 Jul 2014 13:38:24 +0200
I have been looking into how to handle the situation where a federate
program crashes in a Windows environment. Currently rtia.exe can't
notice it, since Windows does not inform child processes when their
parent crashes, and the TCP socket between parent and child is not torn
down until a long timeout expires. This is a problem when trying to
make the system as robust as possible and to recover from some crashes
(for example, a 3D visual that is merely listening to HLA and would
therefore be easy to rejoin to the system).
On Linux this is probably not a problem, as I think Unix sockets are
closed when the parent dies, so the child gets notified correctly.
Unfortunately, this is surprisingly difficult to solve on Windows.
Windows offers the Job Object system, which could terminate rtia.exe
when the parent dies, but as far as I understand it only offers a
violent termination (like kill -9). That is not good, because
rtia.exe's sockets to rtig.exe would then have to die by timeout.
One way to notice a parent crash would be to open a pipe between them.
This additional pipe could be watched if the select function were
replaced with the WaitForMultipleObjects WINAPI crap, but that forcibly
switches the sockets to non-blocking mode, which does not fit the
current CERTI logic well. By adding checks for a few recv errors I
managed to make this work, but it was CPU-heavy because it caused a
busy loop.
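The pipe idea rests on one property: once every copy of a pipe's write
end has been closed (which the OS does when the process holding it
exits or crashes), a read on the other end returns 0, i.e. EOF. Below is
a minimal POSIX sketch of that check, not Windows code; the helper name
parent_is_gone is hypothetical and not part of CERTI. It polls with a
timeout so the caller never blocks indefinitely.

```cpp
#include <cassert>
#include <poll.h>
#include <unistd.h>

// Hypothetical helper (not part of CERTI). The parent keeps the write end of a
// "lifeline" pipe open and never writes to it; the child keeps only the read
// end. Returns true once read() reports EOF, i.e. no process holds the write
// end any more (the kernel closes it when the parent exits or crashes).
bool parent_is_gone(int lifeline_read_fd, int timeout_ms) {
    assert(lifeline_read_fd >= 0);
    pollfd pfd{};
    pfd.fd = lifeline_read_fd;
    pfd.events = POLLIN;
    if (poll(&pfd, 1, timeout_ms) <= 0) {
        return false;  // timeout or error (e.g. EINTR): assume still alive
    }
    // POLLIN or POLLHUP is set; a zero-byte read is the EOF we are waiting for.
    char buf;
    return read(lifeline_read_fd, &buf, 1) == 0;
}
```

One detail matters: the child must close any inherited copy of the
write end, otherwise it keeps the pipe alive itself and EOF never
arrives. On Windows the corresponding signal for an anonymous pipe is
ReadFile failing with ERROR_BROKEN_PIPE; that path, and how it would
fit into CERTI's select() loop, is not shown here.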
Another way would be to change the infinite selects into timed ones
internally and check the pipe every now and then. Or a separate thread
could watch the pipe and, when the parent dies, use an additional TCP
socket to wake up the select.
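The thread-based variant can be sketched as follows, again as a POSIX
approximation rather than the Windows implementation: a watcher thread
blocks on the lifeline pipe, and on EOF writes one byte to a wake-up
socket that the main loop includes in its otherwise infinite select().
On Windows select() only accepts sockets, which is why a TCP socket
would be needed there; the sketch uses socketpair() as a stand-in. All
function names (watch_lifeline, start_watcher, wait_for_event) are
hypothetical.

```cpp
#include <cassert>
#include <cerrno>
#include <sys/select.h>
#include <sys/socket.h>
#include <thread>
#include <unistd.h>

// Hypothetical watcher (not CERTI code). Blocks on the lifeline pipe until it
// reports EOF (every writer, i.e. the parent, has gone away), then writes one
// byte to wake_fd so that an otherwise infinite select() returns.
void watch_lifeline(int lifeline_read_fd, int wake_fd) {
    char buf;
    ssize_t r;
    do {
        r = read(lifeline_read_fd, &buf, 1);
    } while (r > 0 || (r < 0 && errno == EINTR));  // skip stray bytes and signals
    // r == 0 is EOF; any other read error is treated as "parent gone" too.
    const char note = 'x';
    ssize_t w = write(wake_fd, &note, 1);
    (void)w;  // nothing sensible to do if the wake-up write fails
}

std::thread start_watcher(int lifeline_read_fd, int wake_fd) {
    return std::thread(watch_lifeline, lifeline_read_fd, wake_fd);
}

// Blocks in select() with no timeout on the normal socket plus the wake-up
// socket. Returns true if the watcher fired (checked first, so shutdown wins
// when both are ready), false if only data_fd became readable or select failed.
bool wait_for_event(int data_fd, int wake_fd) {
    assert(data_fd < FD_SETSIZE && wake_fd < FD_SETSIZE);  // select() limit
    const int maxfd = data_fd > wake_fd ? data_fd : wake_fd;
    fd_set readable;
    int n;
    do {
        FD_ZERO(&readable);  // rebuild: sets are unspecified after EINTR
        FD_SET(data_fd, &readable);
        FD_SET(wake_fd, &readable);
        n = select(maxfd + 1, &readable, nullptr, nullptr, nullptr);
    } while (n < 0 && errno == EINTR);
    return n > 0 && FD_ISSET(wake_fd, &readable);
}
```

Compared with timed selects, this keeps the main loop blocking with no
timeout, so there is neither polling latency nor periodic CPU wake-ups;
the cost is one mostly idle thread and one extra socket.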
Now I wonder whether a somewhat more complex patch would be accepted
into the base CERTI code at all. I sure hope so, as I find this an
important issue and I am ready to work on it.