rpctrace / libpager / signal preemptor

Hi -

Just a status update on what I'm working on. My primary goal right now is to get ext2fs to behave correctly when a ramdisk fills up. At the moment, dd hangs and/or crashes ext2fs instead of cleanly erroring out.

First, there's a problem in libpager:

--- a/libpager/data-unlock.c
+++ b/libpager/data-unlock.c
@@ -66,16 +66,16 @@ _pager_S_memory_object_data_unlock (struct pager *p,

   if (!err)
     /* We can go ahead and release the lock. */
     _pager_lock_object (p, offset, length, MEMORY_OBJECT_RETURN_NONE, 0,
                        VM_PROT_NONE, 0);
   else
     {
       /* Flush the page, and set a bit so that m_o_data_request knows
         to issue an error. */
       _pager_lock_object (p, offset, length, MEMORY_OBJECT_RETURN_NONE, 1,
-                     VM_PROT_WRITE, 1);
+                   VM_PROT_WRITE, 0);
       _pager_mark_next_request_error (p, offset, length, err);
     }
out:
   return 0;

The final argument to _pager_lock_object is the 'synchronous' flag. The call needs to be asynchronous because libpager is single-threaded, at least in the sense that an individual memory object only processes one request at a time. In this case, we're in the middle of processing a data_unlock request, and a synchronous lock_object would not return until a lock_completed message had been handled, which can't happen while we're still inside the data_unlock handler.

Next, there's a problem with the rpctrace code that I recently modified, specifically the part that synchronizes messages by processing them in order of their 'seqno'. Kernel messages seem to have a 'seqno' of zero, so they don't fit into that ordering. In particular, exceptions get blocked indefinitely. I'm currently using the following patch (not a complete fix):

--- a/utils/rpctrace.c
+++ b/utils/rpctrace.c
@@ -1232,7 +1232,7 @@ trace_and_forward (mach_msg_header_t *inp, mach_msg_header_t *outp)

   msgid = msgid_info (inp->msgh_id);

- while (inp->msgh_seqno != TRACED_INFO (info)->seqno)
+ while (inp->msgh_seqno > TRACED_INFO (info)->seqno)
     {
       pthread_cond_wait (& TRACED_INFO (info)->sequencer, &tracelock);
     }

Maybe I shouldn't use seqno at all, and should instead sequence messages based on their arrival order.
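To make the arrival-order idea concrete, here is a minimal sketch of a ticket-style sequencer built on pthreads. It is not code from rpctrace.c: the struct and the sequencer_* function names are hypothetical, and it assumes that whoever receives a message can take a ticket at the moment of receipt, which is what would define "arrival order".

```c
#include <pthread.h>

/* Hypothetical per-port sequencer; none of these names exist in rpctrace.c. */
struct arrival_sequencer
{
  pthread_mutex_t lock;
  pthread_cond_t turn;
  unsigned long next_ticket;   /* handed to the next arriving message */
  unsigned long now_serving;   /* ticket currently allowed to proceed */
};

#define ARRIVAL_SEQUENCER_INITIALIZER \
  { PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, 0, 0 }

/* Call as soon as a message is received; records its arrival position. */
unsigned long
sequencer_arrive (struct arrival_sequencer *s)
{
  unsigned long ticket;

  pthread_mutex_lock (&s->lock);
  ticket = s->next_ticket++;
  pthread_mutex_unlock (&s->lock);
  return ticket;
}

/* Block until every message that arrived earlier has been finished. */
void
sequencer_wait (struct arrival_sequencer *s, unsigned long ticket)
{
  pthread_mutex_lock (&s->lock);
  while (s->now_serving != ticket)
    pthread_cond_wait (&s->turn, &s->lock);
  pthread_mutex_unlock (&s->lock);
}

/* Call after the message has been traced and forwarded. */
void
sequencer_done (struct arrival_sequencer *s)
{
  pthread_mutex_lock (&s->lock);
  s->now_serving++;
  pthread_cond_broadcast (&s->turn);
  pthread_mutex_unlock (&s->lock);
}
```

A thread would call sequencer_arrive() on receipt, sequencer_wait() before tracing, and sequencer_done() afterwards. Since the tickets are generated locally, a message whose 'seqno' is zero is ordered like any other.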

Once that's resolved, we're back to the problem with signal preemptors! libpager/pager-memcpy.c includes the following code:

void fault (int signo, long int sigcode, struct sigcontext *scp)
    {
      assert (scp->sc_error == EKERN_MEMORY_ERROR);
      err = pager_get_error (pager, sigcode - window + offset);
      n -= sigcode - window;
      vm_deallocate (mach_task_self (), window, window_size);
      longjmp (buf, 1);
    }

Since sigcode no longer contains the faulting address (it's in the subcode, remember?), this code calls pager_get_error with a negative second argument and segfaults in the handler, killing ext2fs.
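The arithmetic behind that can be sketched in isolation. The helper below just mirrors the `sigcode - window + offset` expression from the handler above; the function name and the use of plain signed longs are assumptions made for illustration, not the types pager-memcpy.c actually uses.

```c
/* Hypothetical helper mirroring the offset computation in fault ().
   Signed longs are used here only to make the sign of the result visible. */
long
fault_offset (long sigcode, long window, long offset)
{
  return sigcode - window + offset;
}
```

With made-up numbers: if the window is mapped at 0x20000000 and offset is 0x4000, a sigcode holding a faulting address of 0x20001000 gives 0x5000, as intended. If sigcode instead holds some small code value, the subtraction of window dominates and the result is negative.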

What's supposed to happen (I think) is that the io_write handler in ext2fs, attempting to write data into the mapped file, triggers a data unlock / data lock / data request / data error sequence. That raises an exception on the memcpy, which gets caught, and we get an error return. Or maybe not. Maybe diskfs_grow() should return an error before we attempt the memcpy. I don't understand the ext2fs code well enough to know.

I'm starting to go through the signal preemptor code, trying to figure out a way to handle that problem.

agape

brent

From:	Brent W. Baccala
Subject:	rpctrace / libpager / signal preemptor
Date:	Tue, 8 Nov 2016 20:43:29 -1000