Re: assertion fault. on loggin

From: Paul Edwards
Subject: Re: assertion fault. on loggin
Date: Sat, 13 Sep 2003 00:14:50 GMT

"Pierre" <address@hidden> wrote in message news:address@hidden
> >I was under the impression that this problem was intermittent?
> >If it is intermittent, then I need you to keep running it until it
> >coredumps.
> I ran it more than twenty times before saying the bug was gone.
> I also inserted the code from my previous message and tried with the
> password in CVSROOT or the password at the prompt (usually that changes
> the frequency of the bug).
> Eventually, if there is an fstat(4,&s) in the first call, I can't
> reproduce the bug.

Ok, well that's the problem with intermittent bugs: you don't know
what it is that has *temporarily* suppressed the bug.  Maybe if you
try it with a different userid, etc., it will come back to bite you.

Basically, if you use that as the workaround, expect to find the
problem again some other day.  If you use the return (0), then
the problem won't actually "come back": it will either work, or it won't.
The extra fstat, although it is actually a better fix in the short
term, has a good chance of causing you problems in the long
term.
But it is up to you.  I have included a "neat" fix, so if you want
to use the extra fstat as a short-term fix, this should do the
job.
If you instead want to delve further, to (hopefully) find the root cause,
then you will need to continue doing this.  It's not that difficult.
What I need you to do is first of all verify that the code "works"
(i.e. no assertion failure). Assuming that it does, I then need
this bit of code to be moved down until you find the lowest spot
where it still avoids the assertion error.

+     if (ugly_hack != NULL)
+     {
+         printf("dummy fno %d prior to %d\n",
+                fileno(ugly_hack->fp),
+                fileno(bc->fp));
+         assert (fstat ( fileno (ugly_hack->fp), &s ) != -1);
+     }

If you don't know what I mean, don't worry about it.  Just apply
this patch and report the results, and I'll give you a new patch for
the next test.
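
One caveat with the probe above: assert() expands to nothing when NDEBUG
is defined, so an fstat call placed inside it disappears from such a
build. A minimal sketch of a probe that does not depend on assert
follows. The name fd_is_open is a hypothetical helper for illustration,
not something in the CVS source.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Hypothetical helper (not part of CVS): returns 1 if fd refers to an
   open descriptor, 0 if fstat fails.  The reason goes to stderr. */
static int
fd_is_open (int fd, const char *label)
{
    struct stat s;

    if (fstat (fd, &s) == -1)
    {
        fprintf (stderr, "%s: fstat(%d) failed: %s\n",
                 label, fd, strerror (errno));
        return 0;
    }
    return 1;
}
```

Calling it with fileno (ugly_hack->fp) at successive points would show the
first place where the descriptor has gone bad, along with the errno text.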

One thing I would like you to do is set up a script that executes
the cvs command 200 times, so that you can be "sure" that the
problem has "definitely gone".  I don't know how practical that
is in your environment.  If it's not practical, don't worry, just
do whatever is normal, 5 or 10 or whatever.  But if it is
practical, I would like to see the CVS command hammered
for as long as you can hammer it.  What I'm worried about
is that the slight rearrangement of code has simply made the
timing different, and turned a 1 in 2 error into a 1 in 2000
error.  There's NO difference (from a debugging perspective)
between 1 in 2 and 1 in 2000.  There is a HUGE difference
between 1 in 2000 and 0 in infinity.

Sort of, anyway.  (e.g. a line of code that changes 1 in 2 to 1 in
2000 may give a hint as to what is required to convert it into
0 in infinity).
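
As a rough sketch of the "hammer it" idea, the two functions below run a
command repeatedly and estimate how much a clean streak actually proves.
Both names (count_failures, chance_all_clean) are hypothetical, the
command string is whatever cvs command line you normally use, and the
estimate assumes each run fails independently with the same probability.

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>

/* Hypothetical harness: run cmd through the shell `runs` times and
   return how many runs did not finish with exit status 0.  A command
   that dies from a failed assert shows up either as a signal or as a
   nonzero shell status, and both are counted. */
static int
count_failures (const char *cmd, int runs)
{
    int i, failures = 0;

    for (i = 0; i < runs; i++)
    {
        int status = system (cmd);

        if (status == -1 || !WIFEXITED (status) || WEXITSTATUS (status) != 0)
        {
            failures++;
            fprintf (stderr, "run %d failed (raw status %d)\n", i + 1, status);
        }
    }
    return failures;
}

/* Chance of `runs` clean runs in a row when each run fails
   independently with probability p, i.e. (1 - p)^runs. */
static double
chance_all_clean (double p, int runs)
{
    double chance = 1.0;
    int i;

    for (i = 0; i < runs; i++)
        chance *= 1.0 - p;
    return chance;
}
```

Under that independence assumption, chance_all_clean (1.0 / 2000, 200) comes
out at roughly 0.90, so 200 clean runs would still be the expected
outcome about nine times in ten with a 1 in 2000 bug.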

> I was thinking this behaviour confirmed your assumption that it
> was an AIX bug.

At this stage it could be a CVS wild pointer, a bug in your compiler,
a bug in your operating system, or a CVS design error.

My theory at the moment is that it is a CVS design error.

> So, I've asked my administrator to upgrade the OS from maintenance level
> 9 to 12, and asked our AIX support to open an incident with IBM.

It's probably premature at this stage.  My theory is that on the first
close, the other end decides to shut down too, and shuts down both
streams: a timing issue.  Once you provide more results, a
"sleep(5)" will eliminate the timing sensitivity.

> But I'm rather pessimistic about the result (the bug cannot be reproduced
> in a simple way: CVS must be installed, a server running..., and it's
> still in AIX 5.1)


> If you want to do other tests, to be sure of the cause of the bug,
> there is no problem. The more cleanly the bug is handled, the
> better.

I think after another 10 messages I'll have the confidence to tell you
whether it is a CVS problem or a non-CVS (OS/compiler) problem.

BFN.  Paul.

These patches are against cvs1-11-6.  Make sure you go back to the
original cvs1-11-6 code before applying the patch!

Index: client.c
RCS file: /cvs/ccvs/src/client.c,v
retrieving revision 1.318.4.6
diff -c -r1.318.4.6 client.c
*** client.c 19 May 2003 02:00:30 -0000 1.318.4.6
--- client.c 12 Sep 2003 23:48:37 -0000
*** 3678,3683 ****
--- 3678,3684 ----

     If we fail to connect or if access is denied, then die with fatal
     error.  */
+ struct stdio_buffer_closure *ugly_hack = NULL;
  connect_to_pserver (root, to_server_p, from_server_p, verify_only, do_gssapi)
      cvsroot_t *root;
*** 3720,3726 ****
--- 3721,3729 ----
   int status;

+         ugly_hack = (struct stdio_buffer_closure *)from_server->closure;
   status = buf_shutdown (to_server);
+         ugly_hack = NULL;
   if (status != 0)
       error (0, status, "shutting down buffer to server");
   buf_free (to_server);

Index: buffer.c
RCS file: /cvs/ccvs/src/buffer.c,v
retrieving revision 1.21.4.1
diff -c -r1.21.4.1 buffer.c
*** buffer.c 17 Feb 2003 21:19:12 -0000 1.21.4.1
--- buffer.c 12 Sep 2003 23:51:16 -0000
*** 1371,1376 ****
--- 1371,1377 ----

+ extern struct stdio_buffer_closure *ugly_hack;

  static int
  stdio_buffer_shutdown (buf)
*** 1381,1386 ****
--- 1382,1394 ----
      int closefp = 1;

      /* Must be a pipe or a socket.  What could go wrong? */
+     if (ugly_hack != NULL)
+     {
+         printf("dummy fno %d prior to %d\n",
+                fileno(ugly_hack->fp),
+                fileno(bc->fp));
+         assert (fstat ( fileno (ugly_hack->fp), &s ) != -1);
+     }
      assert (fstat ( fileno (bc->fp), &s ) != -1);

      /* Flush the buffer if we can */
