On Thu, Aug 2, 2012 at 8:36 AM, Michael Goffioul
Actually, that makes sense. In order to use the SSE instructions, we
really want the stack to be 16-byte aligned, I think. Can you try changing
the stack alignment to 16 bytes instead of 4?
No luck. I've modified your patch to read:
opts.StackAlignmentOverride = 16;
For your information, I've attached the generated assembly for the 4-byte and 16-byte cases. The code still crashes, but at an earlier location: it now crashes at the MOVAPD instruction (address 02D300BC). In the 4-byte case the code uses MOVUPD there instead, which does not require an aligned memory operand, so it doesn't crash. Also, if you compare the two files, you can see that in the 16-byte case all stack offsets are multiples of 16 bytes, but I don't see any code that realigns the stack on a 16-byte boundary.
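A small model of why 16-byte offsets alone don't help: MOVAPD faults unless its memory operand is 16-byte aligned (MOVUPD has no such requirement), and the address it touches is the stack pointer on function entry plus a frame offset. This is only a sketch; the helper names and the example addresses are hypothetical, not taken from the attached assembly.

```cpp
#include <cstdint>

// MOVAPD requires a 16-byte aligned memory operand; MOVUPD does not.
constexpr std::uintptr_t kSseAlign = 16;

// True if an aligned SSE access (MOVAPD) at `addr` would be legal.
constexpr bool movapd_ok(std::uintptr_t addr) {
    return addr % kSseAlign == 0;
}

// Hypothetical model of a stack slot: entry stack pointer plus a frame
// offset chosen by the code generator (here always a multiple of 16).
constexpr std::uintptr_t slot_address(std::uintptr_t entry_sp,
                                      std::uintptr_t offset) {
    return entry_sp + offset;
}

// Same offset, two entry stack pointers: only the 16-byte aligned one
// yields a legal MOVAPD address; the 4-byte aligned one is off by 4.
static_assert(movapd_ok(slot_address(0x0012FF40, 32)), "aligned entry");
static_assert(!movapd_ok(slot_address(0x0012FF44, 32)), "misaligned entry");
```

Because every offset is a multiple of 16, `slot_address(sp, off) % 16 == sp % 16`: the offsets preserve whatever misalignment the caller handed in.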
The bottom line is: within the generated code, the stack is kept 16-byte aligned, but since there is no forced realignment, whether it is actually aligned depends entirely on the stack alignment on function entry.
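Forced realignment would remove that dependence on the caller. The sketch below shows only the arithmetic such a prologue performs (the `and esp, -16` idiom); `realign_down` is a hypothetical helper, and whether the JIT can be told to emit this is not settled here.

```cpp
#include <cstdint>

// Round a stack pointer down to a multiple of `align` (a power of two).
// The stack grows downward, so rounding down only reserves a few extra
// bytes of free space and never overlaps the caller's frame.
constexpr std::uintptr_t realign_down(std::uintptr_t sp, std::uintptr_t align) {
    return sp & ~(align - 1);
}

// After realignment, 16-byte multiple offsets give aligned addresses
// regardless of the entry stack pointer.
static_assert(realign_down(0x0012FF44, 16) == 0x0012FF40, "rounded down");
static_assert((realign_down(0x0012FF4C, 16) + 32) % 16 == 0, "slot aligned");
```

A real prologue also has to keep the original stack pointer (typically in a frame pointer register such as EBP) so the epilogue can restore it before returning.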
Michael.