Re: bug in task server startup code


From: Bas Wijnen
Subject: Re: bug in task server startup code
Date: Thu, 07 Oct 2004 21:49:46 +0200
User-agent: Mozilla Thunderbird 0.8 (X11/20040926)

Now that I read this again, I remember why the error needed to be handled. In my previous mail I had forgotten about the map items, and thought the error could just as well be ignored.

Marcus Brinkmann wrote:
> At Tue, 17 Aug 2004 13:11:30 +0200,
> Bas Wijnen <address@hidden> wrote:

>> I did some more testing, added output code to wortel/startup.c (which is
>> the startup code of all tasks started by wortel except physmem, so only
>> the task server at the moment) and tried to see if the mappings it
>> requests from physmem (its startup and memory container) arrive with
>> their correct data (I added some print statements to physmem as well.)
>>
>> Well, they don't.  startup.c pagefaulted on the check, so I checked the
>> result of the IPC which received the mapping.  It failed with error code
>> 9, meaning "message overflow in the receive phase.  A message overflow
>> can occur [string related stuff] and if a map/grant of an fpage fails
>> because the system has not enough page-table space available."
>> (L4 Reference Manual X.2 page 62)


> Well, first of all, you should check if the mappings are at least
> correct (or reasonable).  That's an important sanity check because
> there might just be a bug in determining the fpages and their load
> addresses.
>
> However, an error 9 is interesting indeed.

It was some time ago, but I remember all kinds of weird things happening, such as different behaviour when debugging output was added to code paths that were never called. So I'm not really sure what it was actually doing, and perhaps the error code was changed somewhere between occurring and being reported.
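
For what it's worth, the 9 decodes neatly: bit 0 of the ErrorCode TCR gives the phase that failed (1 = receive) and bits 1-3 give the error class (4 = message overflow), so 9 = (4 << 1) | 1 is exactly a receive-phase message overflow. A minimal sketch of the check, assuming Pistachio's libl4 convenience interface (the server thread id is just a placeholder):

    #include <l4/ipc.h>

    /* Return nonzero if a call to SERVER failed with a receive-phase
       message overflow (ErrorCode 9).  */
    static int
    receive_overflowed (L4_ThreadId_t server)
    {
      L4_MsgTag_t tag = L4_Call (server);
      if (L4_IpcSucceeded (tag))
        return 0;
      L4_Word_t err = L4_ErrorCode ();
      /* Bit 0: failing phase (1 = receive).  Bits 1-3: error class
         (4 = message overflow).  */
      return (err & 1) && ((err >> 1) & 0x7) == 4;
    }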

> The memory is mapped in fpages, maybe mapping one fpage, the one with
> the addresses you accessed, worked, and another one failed?  Still,
> you would not expect to see wrong data, and in your other mail you say
> the offset was actually 0x1000 off or so, which would indicate to me a
> bug in the ELF loader/startup mapping stuff.

I thought of that, too. The code is at its correct position in wortel, but there seems to be something wrong with the mapping. However, I have also seen it work as expected, which to me points to a buffer overflow (which is what brought me to valgrind ;-) ).
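
One thing I keep in mind while checking this: the receiver only gets what fits into the acceptor's receive window, so a too-small or misaligned window makes the kernel clip the mapping, which could also show up as data at the wrong offset. A rough sketch of the receiving side, again assuming Pistachio's libl4 (physmem, WINDOW_BASE and WINDOW_SIZE are placeholders):

    #include <l4/ipc.h>
    #include <l4/message.h>

    /* Accept map/grant items into a fixed window, make the call, and
       pull the first typed item out of the reply.  */
    L4_Accept (L4_MapGrantItems (L4_Fpage (WINDOW_BASE, WINDOW_SIZE)));
    L4_MsgTag_t tag = L4_Call (physmem);
    if (L4_IpcSucceeded (tag))
      {
        L4_Msg_t msg;
        L4_MapItem_t map;
        L4_MsgStore (tag, &msg);
        L4_MsgGetMapItem (&msg, 0, &map);
      }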

> The best way to track such things down is to track them down, line by
> line, instruction by instruction.  It's slow work, but you can learn
> something about the kernel debugger along the way :)


>> Failing IPCs may corrupt the database of a capability server.  In this
>> case, physmem thinks task has received the pages, because it is not
>> notified of the failed IPC.


> This needs some consideration.  I think you have found one of the few
> cases (maybe even the only one) where sending an IPC can actually
> fail without either the sender or receiver being at fault (in a broad
> sense).  OTOH, if there is no room for page tables anymore, you are in
> deep shit.  Might as well panic and reboot at that point.
>
> Mapping memory is to be considered a restricted operation.  We can
> enforce that by using redirectors.

Eh, I don't see the problem. What's wrong with threads mapping their memory to other threads? If they want other threads to be able to access their memory, they can simply map it for them. When the pages are unmapped, they are unmapped recursively, so a thread cannot keep a mapping by giving it away. Also, if physmem keeps quotas, mapping memory to other threads (or even granting it) yourself does not give you the right to an extra page, so that is not a problem either.
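
To illustrate the recursive unmap: flushing the fpage at physmem revokes it from every address space it was mapped or re-mapped into further down the chain. A one-line sketch, assuming Pistachio's libl4 (addr and size are placeholders):

    #include <l4/space.h>

    /* Revoke all rights; the unmap recurses through every space the
       page was handed on to.  */
    L4_Flush (L4_FpageAddRights (L4_Fpage (addr, size),
                                 L4_FullyAccessible));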

> For all other IPC failures I can think of, the story is actually quite
> simple: It's either a programming bug in the server (fix the bug then)
> or the fault of the client.

Agreed.

> If it is not clear to you why it is always the client's fault, I can
> explain further.  Let me know which case you are interested in (simple
> IPC, string items, map items).

One thing about string items: I recall you saying (in an e-mail or in comments in the code, I don't remember which) that they may be supported at some point. That surprised me, because the reference manual specifically states that page faults during a string IPC can lock up both sender and receiver (with a malicious pager on either side). I expected that to be enough reason not to use them in the Hurd, because there is no mutual trust (other servers can use them of course, just not Hurd servers).
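
As far as I can see, the only defence the API offers there are the xfer timeouts: with a zero timeout, any page fault during the string copy aborts the IPC instead of blocking on an untrusted pager. A sketch, assuming Pistachio's libl4:

    #include <l4/ipc.h>

    /* Abort the string copy on any page fault rather than wait for a
       (possibly malicious) pager on either side.  */
    L4_Set_XferTimeouts (L4_Timeouts (L4_ZeroTime, L4_ZeroTime));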

Thanks,
Bas
