bug-hurd
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: kernel panic in gsync_wait


From: Brent W. Baccala
Subject: Re: kernel panic in gsync_wait
Date: Thu, 3 Nov 2016 14:12:41 -1000

Hi -

I've been trying to figure what to do with the gsync code.  It causes undefined behavior and occasional kernel panics when rpctrace is used on gsync_wait and gsync_wake calls.  I suspect that this ultimately affects just about every program on a Hurd system.  Even if a program doesn't call locking primitives directly, they're used by the C library, right?

On Sun, Oct 30, 2016 at 10:16 PM, Kalle Olavi Niemitalo <kon@iki.fi> wrote:
"Brent W. Baccala" <cosine@freesoft.org> writes:

> Even if I'm right about the nature of this bug, I don't understand gnumach
> well enough to know how a task should access another task's memory.

vm_copy apparently supports such access; code from there could be
reused.  But if rpctrace uses gsync_wait on the address space of
another task and the page has been paged out, then the call could
end up blocking for the pager, and I don't think you want that.

I studied vm_copy and came up with some code, but gsync_wake needs to actually modify the memory location, so setting up a copy doesn't work.  Plus, vm_copy makes a mapping that is visible to user space and leaves the user space code responsible for deallocating it.

As for blocking for the pager, I think the code does that already.  As innocuous as this line of code looks:

     *(unsigned int *)addr = val;

it can trigger a page fault and block the RPC, right?

Here's what I think gsync_wait and gsync_wake need to do if their task argument isn't the current task:

1. lookup the address in the other task's pagemap, flag the map entry 'shared', and insert a copy of it into the kernel pagemap,

2. access the memory location, triggering a page fault which actually creates the physical mapping, since Mach doesn't create physical maps until required, then

3a. either remove the pagemap entry from the kernel map when we're done, or
3b. leave it for future use.

If we choose option (3a), then every wrapped call to gsync_{wait,wake} will trigger a page fault.  If we choose option (3b), then step 1 needs to be modified to search the kernel pagemap to see if it already has a copy of the map entry in question.  I have to think some more to figure how that mapping ultimately gets removed, and of course there's a memory exhaustion issue if we add lots of new entries to the kernel map on a 32-bit system.

Another possibility would be to reject the RPC back to user space, and put the old locking code back into glibc as a fallback.

All this brings us back to...

On Sun, Oct 30, 2016 at 10:16 PM, Kalle Olavi Niemitalo <kon@iki.fi> wrote:

Could gsync_wait be removed from gnumach.defs and replaced with
only a trap that does not take a task_t parameter and cannot be
intercepted by rpctrace?

I don't think we have anything else quite like gsync - kernel code that directly accesses user space memory in a different task from the one that trapped.  Our choices seem to be:

1. add a lot of new complexity to the gsync kernel routines
2. add fallback code to glibc
3. add a new system call as Kalle proposed

Comments?

    agape
    brent



reply via email to

[Prev in Thread] Current Thread [Next in Thread]