AIX shutdown bug
Fri, 17 Oct 2003 22:41:24 GMT
Ok, now you need to give the following stuff to IBM:
1. cvs 1.11.9 unmodified
2. The patches I provided to server.c and buffer.c (below)
3. Instructions on how to apply the patches, recompile CVS,
run CVS with the server and client on the same machine, do
the "cvs login", and what output to expect.
4. Tell IBM that there appears to be a problem with the fstat()
library call returning -1 in a situation where it shouldn't. A full
description is provided in the comments of the patch below.
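It may also help IBM to have something that does not need CVS at all. The following is only a rough sketch of the five-step sequence described in the buffer.c comment, not part of the patch and not tested on AIX. The function name repro_fstat_after_shutdown is made up for illustration, and the use of a TCP loopback connection is an assumption about how the client and server were talking to each other. It returns the result of fstat() on the dup'ed descriptor: 0 is what one would expect, -1 would match the behaviour seen on AIX, and -2 means the setup itself failed.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical standalone reproduction of the sequence in the patch
   comment.  Returns the fstat() result on the dup'ed descriptor
   (0 expected, -1 would match the AIX report), or -2 on setup failure. */
int repro_fstat_after_shutdown(void)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof addr;
    struct stat sb;
    int lsock, fd, fd2, rc;
    pid_t pid;

    /* Listening socket on the loopback interface, kernel-chosen port. */
    lsock = socket(AF_INET, SOCK_STREAM, 0);
    if (lsock < 0)
        return -2;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;
    if (bind(lsock, (struct sockaddr *)&addr, sizeof addr) < 0
        || listen(lsock, 1) < 0
        || getsockname(lsock, (struct sockaddr *)&addr, &len) < 0) {
        close(lsock);
        return -2;
    }

    pid = fork();
    if (pid < 0) {
        close(lsock);
        return -2;
    }
    if (pid == 0) {
        /* The "other process": accept the connection, then exit. */
        (void)accept(lsock, NULL, NULL);
        _exit(0);
    }
    close(lsock);

    /* 1. Socket to the other process on the same system. */
    fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -2;
    if (connect(fd, (struct sockaddr *)&addr, sizeof addr) < 0) {
        close(fd);
        return -2;
    }

    /* 2. Duplicate the descriptor. */
    fd2 = dup(fd);
    if (fd2 < 0) {
        close(fd);
        return -2;
    }

    /* 3. Wait until the other process has terminated. */
    waitpid(pid, NULL, 0);

    /* 4. Shut down the write side of the original descriptor,
          as buffer.c does with shutdown(fileno, 1). */
    shutdown(fd, SHUT_WR);

    /* 5. fstat() on the duplicate should still succeed. */
    rc = fstat(fd2, &sb);

    close(fd);
    close(fd2);
    return rc;
}
```

Calling this from a small main() and printing the return value on both AIX and Linux would show whether the problem can be seen outside CVS. If it returns 0 on AIX as well, the bug may depend on something CVS-specific that this sketch does not capture.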
RCS file: /cvs/ccvs/src/buffer.c,v
retrieving revision 220.127.116.11
diff -c -r18.104.22.168 buffer.c
*** buffer.c 17 Feb 2003 21:19:12 -0000 22.214.171.124
--- buffer.c 17 Oct 2003 22:24:25 -0000
*** 1450,1458 ****
--- 1450,1508 ----
# ifndef NO_SOCKET_TO_FD
+ /* FIXME: It appears that there is an AIX
+ bug. In the following sequence:
+ 1. socket to other process on same system opened as fileno 3
+ 2. dup() to create fileno 4
+ 3. other process terminates
+ 4. shutdown done on fileno 3
+ 5. fileno 4 is corrupted in that it can not do fstat anymore.
+ There is no reason for fileno 4 to be corrupted,
+ and on a reference system (Linux) this problem
+ does not occur.
+ To reproduce this problem, the below sleep
+ command should be activated. Then run a client
+ and server on the same system. Then run
+ "cvs login".
+ The actual bug that is causing fileno 4 to be corrupted
+ could be:
+ 1. A bug in the shutdown function.
+ 2. A bug in the AIX operating system in response to
+ a valid shutdown call.
+ 3. A bug in the fstat function, returning -1 just
+ because the OTHER end of the socket is no longer
+ 4. A bug in the AIX operating system in response to
+ a valid system call in fstat().
+ Further resolution to this problem must come from
+ IBM. A workaround to the problem is to add
+ appropriate delays in the other process, such that
+ this process manages to shut down both its filenos
+ Activating the delay in this process will produce
+ consistent failure. Activating the delay in the
+ other process will produce consistent success.
+ Activating neither delay will cause the problem
+ to be intermittent.
+ Activating both delays will cause consistent failure
+ if this delay is longer than the other process's delay.
/* shutdown() sockets */
+ printf("client sleeping 5 seconds\n");
+ printf("client has woken up\n");
shutdown ( fileno (bc->fp), 1);
/* I'm not sure I like this empty block, but the alternative
RCS file: /cvs/ccvs/src/server.c,v
retrieving revision 1.284.2.9
diff -c -r1.284.2.9 server.c
*** server.c 3 Oct 2003 19:15:32 -0000 1.284.2.9
--- server.c 17 Oct 2003 22:24:37 -0000
*** 5719,5724 ****
--- 5719,5731 ----
printf ("I LOVE YOU\n");
+ /* FIXME: On AIX, in some circumstances,
+ there is a race condition in CVS which can be
+ circumvented with a pause here. The race is
+ whether shutdown(,1) in buffer.c is executed
+ before or after the exit(0) call here. */
/* Hook for OS-specific behavior, for example socket subsystems on