Re: Reboots?
From: Marcus Brinkmann
Subject: Re: Reboots?
Date: Thu, 29 Mar 2001 02:03:41 +0200
User-agent: Mutt/1.3.15i
On Wed, Mar 28, 2001 at 04:45:50PM -0500, Roland McGrath wrote:
> > I wouldn't know how to get it, so I don't know if I can. What do I need for
> > this?
>
> Does ddb work these days? Last time I did kernel hacking it was
> oskit-mach, and that dumps a stack trace when it panics.
I don't know, I never used ddb.
> > If it isn't "wire" I am looking for, I don't know what I am looking for (a
> > grep showed nothing in proc/).
>
> You are right. proc used to wire itself (wire_task_self), but it doesn't
> now (init does). So this kernel bug is of more concern than I thought.
I should mention something. I attached two gdbs, and exited the first one
before the second. (I didn't clear the suspend count when starting the
first, and it didn't ask me for the suspend count when exiting, as it would
in another session I tried). So this might be related to gdb mayhem. I don't
know if running two gdbs is fine (it shouldn't crash the kernel, but...).
Anyway, I stuck with one gdb only this time and it didn't crash. The
subhurd reported that it can't emulate the crash and would reboot the Hurd
now, after exiting gdb. So the kernel panic in thread_invoke is either a
random crash or a side effect of the two gdbs (it would need more testing to
find out; reproducing the crash takes about an hour, so I'd like to avoid
that).
> > Sometimes I wonder if the kernel ring buffer proposed by RMS wouldn't be
> > helpful in situations like this.
>
> Well, maybe. But it is a lot of overhead. I'd be more inclined to work
> on a way to make it possible to trace a sub-hurd using rpctrace on
> the parent hurd.
Ok, sounds fine, too.
I have reproduced exactly the crash Jeff reported, and I have collected the
data. I used a ring buffer of 16 entries (which can be increased if needed),
and the full gdb log is attached. Here are the three ports on which RPCs
were logged immediately before the crash (in interleaved order, see the left
column). If a field is blank, it is the same as the previous one in the same
column:
port 218:
real-order  bits  size  seqno  id
1. 2147488018 32 1246 24021 dostop
2. 1247 24031 task2proc
3. 1248 24031
5. 1249 24018 get_arg_locations
7. 1250 24030 task2pid
8. 1251 24012 child
port 229:
order bits size seqno id
4. 2147488018 32 0 24013 setmsgport
6. 4370 40 1 24017 set_arg_locations
9. 24 2 24016 getpids
10. 2147488018 120 3 24022 handle_exceptions
11. 32 4 24021 dostop
12. 5 24031 task2proc
13. 6 24031
15. 4370 24 7 24018 get_arg_locations
port 279:
order bits size seqno id
14. 2147488018 32 0 24013 setmsgport
16. 4370 40 1 24017 set_arg_locations
*** crash ***
Of course, one data point is not very much. I can run this a few more times,
and we can see if a pattern emerges. We can insert assertions etc.
We can probably also log whole messages.
Can we run proc single-threaded, so that we know exactly where it crashed?
Thanks,
Marcus
--
`Rhubarb is no Egyptian god.' Debian http://www.debian.org brinkmd@debian.org
Marcus Brinkmann GNU http://www.gnu.org marcus@gnu.org
Marcus.Brinkmann@ruhr-uni-bochum.de
http://www.marcus-brinkmann.de
Attachment: typescript (text document)