
Re: IDL issue - struct return vs. cap return


From: Jonathan S. Shapiro
Subject: Re: IDL issue - struct return vs. cap return
Date: Tue, 10 Jul 2007 10:51:02 -0400

As promised, I am replying to Neal's example in greater detail.

To enhance clarity, I have renamed the "foo" instance in main() to
"mfoo". This will make it easier to describe what is happening, but it
doesn't change the program behavior at all.

On Mon, 2007-07-09 at 23:24 +0200, Neal H. Walfield wrote:
> Consider:
> 
>   #include <string.h>
>   #include <stdio.h>
>   
>   struct foo
>   {
>     char *p;
>     char s[100000];
>   };
>   
>   struct foo bar (void)
>   {
>     struct foo foo;
>     foo.p = foo.s;
>     return foo;
>   }
>   
>   int
>   main ()
>   {
>     struct foo mfoo;
>     mfoo.p = mfoo.s;
>   
>     printf ("before: %x\n", mfoo.p);
>     mfoo = bar ();
>     printf ("after: %x\n", mfoo.p);
>   }
> 
>   $ ./foo
>   before: 7339c5c8
>   after: 7336b848

Let's walk through it. Prior to the first call to printf(), it is clear
that mfoo.p and mfoo.s are the same location.

Following the return from bar(), mfoo.p contains the location of foo.s.
That is, it points to a garbage location inside the bar() stack frame,
which has exited. This is true because foo is copied **by value** on
return from bar().
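
A minimal sketch of what "copied by value" means for a self-referential pointer. To keep the example well-defined it copies between two live objects rather than out of an exited frame; the names `struct small` and `copy_keeps_source_pointer` are made up for illustration and are not part of Neal's program.

```c
#include <assert.h>
#include <string.h>

struct small
{
  char *p;
  char s[16];
};

/* Returns 1 if a by-value copy leaves the pointer aimed at the
   source object's array rather than at the destination's. */
int copy_keeps_source_pointer(void)
{
  struct small a;
  memset(a.s, 0, sizeof a.s);
  a.p = a.s;            /* a.p points into a itself */
  assert(a.p == a.s);

  struct small b = a;   /* copy by value: member values are copied */

  /* b.p holds the address of a.s; nothing re-targets it to b.s. */
  return b.p == a.s && b.p != b.s;
}
```

The same thing happens on return from bar(), except that there the source object no longer exists once the copy is complete.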

In classical compilers, there were actually TWO copies on struct return:

  The first occurred on return from bar(), where foo was copied
  by value to a temporary slot on the stack.

  The second then copied the bits from the temporary slot into mfoo.

Now, the compiler optimization we were considering is that the compiler
might pass a pointer to the final result location explicitly. This
allows it to avoid using a redundant slot on the stack. Let me do this
by hand so that you can see how it works:

  void barOpt(struct foo *_retVal)
  {
    struct foo foo;
    foo.p = foo.s;
    memcpy(_retVal, &foo, sizeof (*_retVal));
    // Note that memcpy() is implementing "copy by value" here.
  }

and the call site in main gets rewritten from:

  mfoo = bar();

to

  barOpt(&mfoo);

On return, mfoo.p will still contain the location of foo.s. There is
absolutely no change in semantics or results here. The nature of the
optimization is merely the elimination of a *redundant* temporary
location on the stack.

I believe that Neal inferred more than I intended to say. He may have
imagined that bar() would be rewritten *incorrectly* as follows:

  void barOptWrong(struct foo *_retVal)
  {
    // struct foo wrongly fused with _retVal and therefore
    // dropped
    _retVal->p = _retVal->s;  // wrong
    // memcpy not needed because of dropped struct.
  }

This is, of course, wrong because mfoo.p ends up pointing at mfoo.s
rather than at the location of foo.s, which is what copy by value yields.
There are cases where this fusion is legal and cases where it is not.
The fusion is *not* legal if:

  1. An address is taken of the local copy (in this case 'foo').
  2. Control flow may leave the procedure without returning normally
     (for example via longjmp()) following a partial or complete
     mutation of *_retVal.
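
Rule (2) can be sketched as follows, reading "may not return" as a non-local exit such as longjmp(). Everything here is hypothetical illustration: `struct pair`, `fill_unfused`, `fill_fused`, `observed_a` and the `bail` flag do not appear in the thread. The destination is given static storage so that its value is well-defined after the longjmp().

```c
#include <setjmp.h>

struct pair
{
  int a;
  int b;
};

static jmp_buf env;

/* Unfused: builds the result in a local and copies it at the end. */
void fill_unfused(struct pair *_retVal, int bail)
{
  struct pair tmp;
  tmp.a = 1;
  if (bail)
    longjmp(env, 1);    /* exits before *_retVal is touched */
  tmp.b = 2;
  *_retVal = tmp;
}

/* Fused: writes straight into the caller's object. */
void fill_fused(struct pair *_retVal, int bail)
{
  _retVal->a = 1;       /* caller can observe this partial write */
  if (bail)
    longjmp(env, 1);
  _retVal->b = 2;
}

/* Runs fill with an early exit and reports what the caller sees. */
int observed_a(void (*fill)(struct pair *, int))
{
  static struct pair dest;
  dest.a = 0;
  dest.b = 0;
  if (setjmp(env) == 0)
    fill(&dest, 1);
  return dest.a;
}
```

With the early exit, the unfused version leaves the destination at 0 while the fused version leaves it at 1, so the two are distinguishable and the fusion changes behavior.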

Neal's example violates rule (1), so the local cannot be eliminated --
or at least, it cannot be eliminated without considering more aggressive
optimization.
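
The difference between the correct and the incorrect rewrite can be checked with a sketch like the one below. One loud assumption: the local is replaced by a static object (`local_foo`, a made-up name) so that its address stays valid after the call and the pointers can be compared legally. In the real program the local lives in the exited stack frame. The array is also shrunk to 64 bytes for convenience.

```c
#include <assert.h>
#include <string.h>

struct foo
{
  char *p;
  char s[64];
};

/* ASSUMPTION: static stand-in for bar()'s local, so that the
   pointer stored in the result remains valid to compare. */
static struct foo local_foo;

/* Correct rewrite: keep the local, copy it by value on return. */
void barOpt(struct foo *_retVal)
{
  local_foo.p = local_foo.s;
  memcpy(_retVal, &local_foo, sizeof (*_retVal));
}

/* Incorrect rewrite: local fused with the result location. */
void barOptWrong(struct foo *_retVal)
{
  _retVal->p = _retVal->s;
}

/* Returns 1 if the two rewrites leave different pointers behind. */
int rewrites_differ(void)
{
  struct foo a, b;
  barOpt(&a);
  barOptWrong(&b);
  assert(a.p == local_foo.s);  /* points at the callee's array */
  assert(b.p == b.s);          /* points at the caller's array */
  return a.p != a.s && b.p == b.s;
}
```

Because the address of the local escapes into the result, the caller can tell the two versions apart, which is exactly what rule (1) guards against.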

Hopefully this explains Neal's confusion about return value rewrites.


Now let me twist your brains a bit.

In this particular case, the local actually *can* be eliminated. Because
foo.p is not sourced within bar(), the compiler is free to notice that
the sequence:

   foo.p = &foo.s;      // capture address of s
   _retVal->p = foo.p; // during return

can be rewritten to:

  foo.p = &foo.s;
  _retVal->p = &foo.s;

This reduction is legal because foo.p is not mutated within bar() after
the initial assignment. foo.p can be replaced by &foo.s by forward
substitution (copy propagation), a close relative of common
subexpression elimination.

After doing this substitution, the compiler is then free to notice that
foo.p is unused and eliminate that assignment, leaving:

  _retVal->p = &foo.s;
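
The three stages of this reduction can be written out as separate C functions and compared. This is only a sketch of the pointer field: it again assumes a static stand-in for the local (`local`, a hypothetical name) so the result can be inspected, and `step0` through `step2` are made-up names. With a real automatic local the store to foo.p is dead. With a static object a compiler could not assume that, so the functions below illustrate the reasoning and nothing more.

```c
#include <assert.h>

struct foo
{
  char *p;
  char s[64];
};

/* ASSUMPTION: static stand-in for the local inside bar(). */
static struct foo local;

/* Original sequence: capture the address, then copy it out. */
void step0(struct foo *_retVal)
{
  local.p = local.s;
  _retVal->p = local.p;
}

/* After substitution: the use of local.p is replaced by its value. */
void step1(struct foo *_retVal)
{
  local.p = local.s;
  _retVal->p = local.s;
}

/* After removing the now-unused store to local.p. */
void step2(struct foo *_retVal)
{
  _retVal->p = local.s;
}

/* Returns 1 if every stage stores the same pointer in the result. */
int all_steps_agree(void)
{
  struct foo r0, r1, r2;
  step0(&r0);
  step1(&r1);
  step2(&r2);
  assert(r0.p == local.s);
  return r0.p == r1.p && r1.p == r2.p;
}
```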

But this is really something of the form %ESP+<offsetOf(s)>. No value
within foo is actually being referenced at all. At this point a really
good optimizer will apply commutativity of addition and subtraction on
ESP, drop the local structure foo entirely, and rewrite the procedure
as:

  _retVal->p = %ESP - <sizeof(foo) - offsetOf(s)>

At this point a good compiler will notice that there are no local
variables and drop the frame pointer as well.

In the end, what should be left is just the following:

    _retVal->p <- %esp - const;
    ret


shap
    




