From: Elias Mårtenson
Subject: Re: [Bug-apl] Remaining APserver issues
Date: Thu, 31 Jul 2014 18:25:24 +0800

I did another test and added a two-second sleep between forking the APserver and connecting to it, and that removed the problem. Thus, I conclude that the APserver doesn't have time to initialise before the parent tries to connect.

I'd like to propose that APserver send a message to the parent instead of relying on an arbitrary sleep (right now it's 20 ms, I believe). There are a few different ways of doing this, for example:
- The parent redirects APserver's stdout and waits for a message from APserver.
- The same as above, but the apl session opens a named pipe and passes the name of the pipe to APserver, which sends the message over that channel instead.
- APserver detaches and forks itself into the background once all initialisation has been performed. The parent apl session waits for the "parent" APserver to exit before attempting to connect.
- The apl session attempts multiple retries over a few seconds before giving up.
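A minimal sketch of the readiness-notification idea from the first two options, assuming a Unix-like system. It is written in Python, not taken from the GNU APL sources: the forked child stands in for APserver and the parent for the apl session, and the names start_server_with_ready_pipe, _serve_one and read_greeting are hypothetical. It swaps in an anonymous pipe created before fork() instead of a named pipe or redirected stdout, and the child lets the kernel pick a free port instead of using 16366. The parent blocks on the pipe until the child has called listen(), so no sleep is needed.

```python
from __future__ import annotations

import os
import socket


def _serve_one(write_fd: int, host: str) -> None:
    """Child side (stand-in for APserver): initialise, signal, serve once."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind((host, 0))  # port 0: let the kernel choose a free port
    srv.listen(1)
    port = srv.getsockname()[1]
    # Initialisation is complete: report the listening port over the pipe.
    os.write(write_fd, f"{port}\n".encode())
    os.close(write_fd)
    conn, _ = srv.accept()
    with conn:
        conn.sendall(b"hello from server\n")
    srv.close()


def start_server_with_ready_pipe(host: str = "127.0.0.1") -> tuple[int, int]:
    """Fork the toy server and block until it is listening.

    Returns (child_pid, port).
    """
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        os.close(read_fd)
        code = 1
        try:
            _serve_one(write_fd, host)
            code = 0
        finally:
            os._exit(code)  # never fall back into the parent's code path
    # Parent side (stand-in for the apl session).
    os.close(write_fd)
    with os.fdopen(read_fd, "rb") as ready:
        line = ready.readline()  # returns once the child has called listen()
    if not line:
        os.waitpid(pid, 0)
        raise RuntimeError("server exited before signalling readiness")
    return pid, int(line.decode().strip())


def read_greeting(port: int, host: str = "127.0.0.1") -> bytes:
    """Connect once and read until the server closes the connection."""
    chunks = []
    with socket.create_connection((host, port)) as s:
        while True:
            data = s.recv(64)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks)


if __name__ == "__main__":
    pid, port = start_server_with_ready_pipe()
    print(port, read_greeting(port))
    os.waitpid(pid, 0)
```

If the child fails before signalling, its end of the pipe is closed when it exits, readline() returns an empty result, and the parent raises an error instead of hanging or connecting too early.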
I'm sure there are other ways to handle it as well. At least we know what the problem is now. :-)

Regards,
Elias

On 31 July 2014 10:43, Elias Mårtenson <address@hidden> wrote:

I've checked, and here are the results. I noticed that sometimes the APserver gets killed when I )OFF the interpreter, and sometimes it doesn't.

$ dist/bin/apl --silent -l 37
sizeof(Svar_record) is 328
sizeof(Svar_partner) is 28
initializing paths from argv[0] = dist/bin/apl
initializing paths from $PWD = /home/emartenson/src/apl
APL_bin_path is: ./dist/bin
APL_bin_name is: apl
Reading config file /home/emartenson/src/apl/dist/etc/gnu-apl.d/preferences ...
config file /home/emartenson/.config/gnu-apl/preferences is not present/readable
0 input files:
Using TCP socket towards APserver...
connecting to 127.0.0.1 TCP port 16366
(this is expected to fail, unless APserver was started manually)
forking new APserver listening on 127.0.0.1 TCP port 16366
connecting to 127.0.0.1 TCP port 16366
(this is supposed to succeed.)
::connect() to existing APserver failed: Connection refused
PID is 22704
argc: 4
argv[0]: 'dist/bin/apl'
argv[1]: '--silent'
argv[2]: '-l'
argv[3]: '37'
uprefs.user_do_svars: 1
uprefs.system_do_svars: 1
uprefs.requested_id: 0
uprefs.requested_par: 0
Svar_DB not connected in Svar_DB::is_registered_id()
id.proc: 1001 at ProcessorID.cc:77
Processor ID was completely initialized: 1001:0:0
system_do_svars is: 1

Then, I check for listeners from another terminal:

$ netstat -an | grep 16366
tcp 0 0 127.0.0.1:16366 0.0.0.0:* LISTEN
$ ps -ef | grep AP
emarten+ 22712 1 0 10:34 pts/3 00:00:00 ./dist/bin/APserver --port 16366
emarten+ 22733 28324 0 10:36 pts/1 00:00:00 grep AP

I then quit the APL session:

)off

And then check connections again:

$ ps -ef | grep AP
emarten+ 22712 1 0 10:34 pts/3 00:00:00 ./dist/bin/APserver --port 16366
emarten+ 22750 28324 0 10:38 pts/1 00:00:00 grep AP
em-desktop$ netstat -an | grep 16366
tcp 0 0 127.0.0.1:16366 0.0.0.0:* LISTEN

As we can see, the APserver is still listening.

I now try to start the APL interpreter again, and it properly connects to the old APserver:

$ dist/bin/apl --silent -l 37
sizeof(Svar_record) is 328
sizeof(Svar_partner) is 28
initializing paths from argv[0] = dist/bin/apl
initializing paths from $PWD = /home/emartenson/src/apl
APL_bin_path is: ./dist/bin
APL_bin_name is: apl
Reading config file /home/emartenson/src/apl/dist/etc/gnu-apl.d/preferences ...
config file /home/emartenson/.config/gnu-apl/preferences is not present/readable
0 input files:
Using TCP socket towards APserver...
connected to APserver, socket is 3
using Svar_DB on APserver!
PID is 22768
argc: 4
argv[0]: 'dist/bin/apl'
argv[1]: '--silent'
argv[2]: '-l'
argv[3]: '37'
uprefs.user_do_svars: 1
uprefs.system_do_svars: 1
uprefs.requested_id: 0
uprefs.requested_par: 0
id.proc: 1001 at ProcessorID.cc:77
Processor ID was completely initialized: 1001:0:0
system_do_svars is: 1

We can see that it's actually connected by checking the APserver status again:

$ netstat -an | grep 16366
tcp 0 0 127.0.0.1:16366 0.0.0.0:* LISTEN
tcp 0 0 127.0.0.1:44102 127.0.0.1:16366 ESTABLISHED
tcp 0 0 127.0.0.1:16366 127.0.0.1:44102 ESTABLISHED
em-desktop$ ps -ef | grep AP
emarten+ 22712 1 0 10:34 pts/3 00:00:00 ./dist/bin/APserver --port 16366
emarten+ 22782 28324 0 10:40 pts/1 00:00:00 grep AP

Now, let's )OFF the interpreter, which promptly kills the APserver that was originally started in the first invocation of apl:

$ netstat -an | grep 16366
tcp 0 0 127.0.0.1:44102 127.0.0.1:16366 TIME_WAIT
em-desktop$ ps -ef | grep AP
emarten+ 22790 28324 0 10:41 pts/1 00:00:00 grep AP

Regards,
Elias

On 29 July 2014 21:17, Elias Mårtenson <address@hidden> wrote:
I will definitely check this when I get back to the office tomorrow. I'll keep you posted.
Thanks and regards,
Elias

On 29 July 2014 21:13, Juergen Sauermann <address@hidden> wrote:

Hi,
that makes me think that APserver is listening on a different socket type than the one apl is using.
Therefore, please run netstat -l -p to see where APserver listens, and apl -l 37 to see where apl tries to connect.
/// Jürgen
On 07/29/2014 03:07 PM, Elias Mårtenson wrote:
I don't think so. The APserver is definitely started. Also, if I start another apl, it is able to connect to the APserver started by the previous one.
My theory is the same as before: I think apl attempts to connect to APserver before APserver is ready to accept connections.
Also, since apl never connects to APserver, it is not surprising that APserver is not killed when apl exits.
In the case where I start a second apl that connects to the first APserver, it does get killed properly.
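The theory above, and the retry option proposed later in the thread, can be illustrated with a small Python sketch; it is not taken from the GNU APL sources. On Linux, connecting to a port whose socket is bound but not yet listening fails with ECONNREFUSED, the same "Connection refused" seen in the log. The helpers late_listener() and connect_with_retry() are hypothetical: the first reserves a port and only calls listen() after a delay (a stand-in for a slow-starting APserver), the second retries with capped exponential back-off until a deadline instead of sleeping for one fixed interval.

```python
from __future__ import annotations

import socket
import threading
import time


def late_listener(delay: float, host: str = "127.0.0.1") -> tuple[socket.socket, int]:
    """Hypothetical stand-in for a slow-starting server.

    The port is bound at once, but listen() only runs after `delay` seconds,
    so connect() calls made before then are refused.
    """
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind((host, 0))
    port = srv.getsockname()[1]
    threading.Timer(delay, srv.listen, args=(1,)).start()
    return srv, port


def connect_with_retry(host: str, port: int, timeout: float = 3.0,
                       delay: float = 0.02) -> socket.socket:
    """Retry connect() with capped exponential back-off until `timeout`."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            return socket.create_connection((host, port))
        except ConnectionRefusedError:
            if time.monotonic() >= deadline:
                raise  # give up: the server never started listening
            time.sleep(delay)
            delay = min(delay * 2, 0.5)  # back off, but never above 0.5 s


if __name__ == "__main__":
    srv, port = late_listener(0.3)
    client = connect_with_retry("127.0.0.1", port)
    conn, _ = srv.accept()
    print("connected after retries:", client.getpeername())
    for s in (conn, client, srv):
        s.close()
```

Retrying tolerates a slow start without guessing one sleep length, but unlike an explicit readiness message it cannot tell "still starting" apart from "failed to start" until the deadline expires.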
Regards,
Elias

On 29 Jul 2014 21:02, "Juergen Sauermann" <address@hidden> wrote:
Hi Elias,
looks like either no APserver is running or the APserver listens on another socket.
Check with netstat -l -p. That should show a line like:
tcp 0 0 localhost:16366 *:* LISTEN 2631/APserver
If the APserver does not get killed then this is the problem I had earlier but could not reproduce.
If you can reproduce it, please uncomment the #define USE_POLL at the beginning of APserver.cc
and reinstall. That will tell us if poll() works better than select(). If not, we could try tcp_keepalive to
see if that works better.
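For context on how a server notices that its client has gone: a peer that exits normally closes its socket, the server's socket becomes readable, and recv() then returns zero bytes. The Python sketch below is a hypothetical stand-in, not APserver.cc and not its USE_POLL code path. serve_until_last_client_leaves() waits with select.poll(), returns once the last connected client has disconnected, and enables SO_KEEPALIVE on accepted sockets. It assumes a Unix-like system where select.poll() is available.

```python
from __future__ import annotations

import select
import socket
import threading


def serve_until_last_client_leaves(srv: socket.socket) -> int:
    """Accept clients on a listening socket; return when the last one hangs up.

    Returns how many clients were served. Hypothetical sketch, not APserver.cc.
    """
    poller = select.poll()
    poller.register(srv.fileno(), select.POLLIN)
    clients: dict[int, socket.socket] = {}
    served = 0
    while True:
        for fd, events in poller.poll():
            if fd == srv.fileno():
                conn, _ = srv.accept()
                # Optional: have the kernel probe idle peers (TCP keepalive).
                conn.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
                poller.register(conn.fileno(), select.POLLIN)
                clients[conn.fileno()] = conn
                served += 1
                continue
            conn = clients[fd]
            try:
                data = conn.recv(4096) if events & select.POLLIN else b""
            except ConnectionResetError:
                data = b""
            if data and not events & (select.POLLHUP | select.POLLERR):
                continue  # a real server would process the request here
            # recv() returned b"" (orderly close), the peer reset the
            # connection, or poll() reported a hang-up/error: client is gone.
            poller.unregister(fd)
            del clients[fd]
            conn.close()
            if not clients:
                return served


if __name__ == "__main__":
    srv = socket.create_server(("127.0.0.1", 0))
    port = srv.getsockname()[1]
    worker = threading.Thread(
        target=lambda: print("served", serve_until_last_client_leaves(srv)))
    worker.start()
    with socket.create_connection(("127.0.0.1", port)) as c:
        c.sendall(b"hi")
    worker.join()
    srv.close()
```

Keepalive mainly matters for peers that vanish without closing the connection; with Linux defaults the first probe is only sent after two hours of idleness (net.ipv4.tcp_keepalive_time = 7200 seconds), so a clean exit is normally detected through the zero-byte recv() path instead.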
/// Jürgen
On 07/29/2014 05:27 AM, Elias Mårtenson wrote:
The following happens on my Arch Linux system. When I start the apl binary (without Emacs), I get a "connection refused" error. The log with -l 37 is reproduced below.
The APserver is properly started (I can see it in the process listing), but after I call )OFF, it doesn't get killed.
Note that if I start APserver separately, I do not get any errors, and everything seems to work correctly.
Here's the output from -l 37 (errors highlighted in red):
$ dist/bin/apl -l 37 --silent
sizeof(Svar_record) is 328
sizeof(Svar_partner) is 28
initializing paths from argv[0] = dist/bin/apl
initializing paths from $PWD = /home/emartenson/src/apl
APL_bin_path is: ./dist/bin
APL_bin_name is: apl
Reading config file /home/emartenson/src/apl/dist/etc/gnu-apl.d/preferences ...
config file /home/emartenson/.config/gnu-apl/preferences is not present/readable
0 input files:
Using TCP socket towards APserver...
connecting to 127.0.0.1 TCP port 16366
(this is expected to fail, unless APserver was started manually)
forking new APserver listening on 127.0.0.1 TCP port 16366
connecting to 127.0.0.1 TCP port 16366
(this is supposed to succeed.)
::connect() to existing APserver failed: Connection refused
PID is 24054
argc: 4
argv[0]: 'dist/bin/apl'
argv[1]: '-l'
argv[2]: '37'
argv[3]: '--silent'
uprefs.user_do_svars: 1
uprefs.system_do_svars: 1
uprefs.requested_id: 0
uprefs.requested_par: 0
Svar_DB not connected in Svar_DB::is_registered_id()
id.proc: 1001 at ProcessorID.cc:77
Processor ID was completely initialized: 1001:0:0
system_do_svars is: 1
⎕SVQ⍳0
Svar_DB not connected in Svar_DB::get_offering_processors()
100 210
Regards,
Elias