Sure, of course. Actually, I've just
had another look and I've realised I did screw up the ARM stuff -
I was trying to maintain the existing behaviour since I didn't
have a test system, but it looks like I messed it up. I think it's
just a matter of deleting the first #ifdef TCC_ARM_EABI block from
gfunc_call though. I'll try and get a look at it later but I can't
promise much since I have zero experience of ARM development.
Anyway, onto the stuff I fixed:
Calling convention stuff:
- Various x86 and x86-64 calling conventions pack structure
return values into registers when they are small enough. I
added gfunc_sret which determines whether that is the case and
prevents an extra pointer parameter being added to receive the
return value. It also returns the type used to pass the return
value, so that tccgen.c can save it to the stack. Perhaps in
retrospect it would have been better to move return value
handling into target specific code generators, but anyway, it
- x86-64: rules are rather complicated, see
classify_x86_64_* functions and the SysV ABI. I had to add a
register mode RC_QRET, analogous to RC_IRET, since a pair of
doubles is returned in XMM0:XMM1. This in turn also means I
have added support for XMM1-5 as general registers, since
it was no more work than XMM1 alone. XMM6-7 aren't
caller-saved on Win64 so I didn't make them available for
calculation, although I did add them to the enumeration.
Some cases also return 16-byte structures in RAX:RDX.
- Win32: structures of 8 bytes or less are returned in EAX
- Win64: Structures of 8 bytes or less are returned in RAX.
- Similarly, function arguments may be passed in registers
rather than on the stack.
- The SysV x86-64 ABI has rather complicated rules which
I've implemented in classify_x86_64_* functions.
- Win64 rules are somewhat simpler but (as far as I can
tell, because MS's documentation isn't up to much) basically
decide what to do based on whether the argument is larger or
smaller than 8 bytes. Each argument gets 8 bytes of space in
registers or on the stack; if the argument itself is 8 bytes
or less it is passed in that space, otherwise it is passed
by reference.
- Win32 rules are the same as Linux-x86 except that small
structures are returned in EAX:EDX.
- x86-64 long double handling: added extra padding so that
long doubles are aligned on 16-byte boundaries. There was
already code to align the stack before the function call, but
this actually has to be done each time a 16-byte aligned
argument is encountered as well.
- x86-64 varargs: I modified __builtin_va_arg_types to use the
classify_x86_64_* functions, and added an alignment parameter
to __va_arg so that 16-byte aligned long doubles can be
- Win64 varargs: I added __builtin_va_start on this platform
since I couldn't see a way around it. If the last parameter
(the second argument to va_start) on Win64 is larger than 8
bytes, it will be passed by reference, and va_start needs to
get the address of the reference, which would require some
sort of &(&x) type expression, which is obviously
invalid C. I also redefined va_args.
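For anyone who wants to poke at the cases above, here is a rough sketch of the sort of checks that exercise them: small structures returned by value, 16-byte aligned long doubles passed through varargs, and va_start with a last named parameter larger than 8 bytes. All type and function names here are made up for illustration and are not taken from the patch or the TCC test suite.

```c
#include <assert.h>
#include <stdarg.h>

/* Hypothetical test types; none of these names come from the patch. */
struct pair_d { double a, b; };     /* 16 bytes, two doubles */
struct pair_l { long long a, b; };  /* 16 bytes, two integers */
struct small  { int a, b; };        /* 8 bytes */

static struct pair_d make_pair_d(double a, double b)
{
    struct pair_d r;
    r.a = a; r.b = b;
    return r;                       /* returned by value */
}

static struct pair_l make_pair_l(long long a, long long b)
{
    struct pair_l r;
    r.a = a; r.b = b;
    return r;
}

static struct small make_small(int a, int b)
{
    struct small r;
    r.a = a; r.b = b;
    return r;
}

/* Sums `count` long double variadic arguments. */
static long double sum_ld(int count, ...)
{
    va_list ap;
    long double total = 0;
    int i;
    va_start(ap, count);
    for (i = 0; i < count; i++)
        total += va_arg(ap, long double);
    va_end(ap);
    return total;
}

/* The last named parameter is larger than 8 bytes.
   last.a is the number of long long varargs, last.b a starting value. */
static long long sum_after_struct(struct pair_l last, ...)
{
    va_list ap;
    long long total = last.b;
    long long i;
    va_start(ap, last);
    for (i = 0; i < last.a; i++)
        total += va_arg(ap, long long);
    va_end(ap);
    return total;
}

/* Runs every check; aborts via assert on the first failure, returns 0 otherwise. */
int run_abi_checks(void)
{
    struct pair_d pd = make_pair_d(1.5, 2.5);
    struct pair_l pl = make_pair_l(3, 4);
    struct small s = make_small(5, 6);

    assert(pd.a == 1.5 && pd.b == 2.5);
    assert(pl.a == 3 && pl.b == 4);
    assert(s.a == 5 && s.b == 6);
    assert(sum_ld(3, 1.0L, 2.0L, 0.5L) == 3.5L);
    assert(sum_after_struct(make_pair_l(2, 10), 20LL, 30LL) == 60);
    return 0;
}
```

Calling run_abi_checks() from main in a file built with tcc, and linking that against callers or callees built with gcc, would be one way to check that the two compilers agree on the conventions.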
CMake build system: I added this primarily to make Win64 builds a
lot easier since they then don't need a custom MSYS setup, just
64-bit gcc and mingw32-make which are available together. It
should also work on other platforms where CMake is available. I
had to shift tcclib.h out of the include/ directory to get some of
the tests to work because there isn't a way in CMake to copy
tcclib.h into the test directory, and other headers in include/
interfere with GCC compilation.
Out-of-tree builds: there were a lot of small issues using the
Makefiles for out of tree builds. They should now be self-updating
(modifying the makefiles updates the out-of-tree copy). I've added
$(top_srcdir)/... a lot to get file references right, and updated
include paths where necessary.
Variable length array stuff: VLAs were implemented using
alloca() but the memory wasn't freed until the end of the
function. This prevents VLAs from being used in a loop, for
instance. This is pretty straightforward to fix when goto and
labels are not in use: just track whether the stack pointer has
been modified and if so reset it at the end of each block. Goto
handling is tricky because in a normal compiler we'd just work
out what the stack pointer should be at the destination and set
it before jumping. TCC can't do that because it generates code
in a single pass, so what I did instead is that a goto with a
VLA in scope saves the stack pointer and then resets it to its
value when the outermost VLA was created. A label with a VLA in
scope then reloads the appropriate stack pointer. Test cases are
This does mean that in certain cases memory allocated by alloca
will be freed when not strictly necessary, i.e. in:
char *p = alloca(n);
At "label" p will have been freed. But otherwise VLAs and alloca
shouldn't interfere since the VLA code doesn't do anything unless
a VLA is in scope.
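To illustrate the two VLA situations described above, here is a minimal sketch: one function that creates a VLA on every loop iteration, and one that leaves a VLA's scope with a backwards goto. The function names are hypothetical and this is not one of the test cases from the patch.

```c
#include <assert.h>
#include <string.h>

/* Creates a fresh VLA on each iteration. If the stack pointer is not
   reset at the end of the block, stack use grows with `iterations`. */
long vla_in_loop(int iterations, int n)
{
    long total = 0;
    int i, j;
    for (i = 0; i < iterations; i++) {
        char buf[n];                /* VLA allocated here, n must be > 0 */
        memset(buf, 1, n);
        for (j = 0; j < n; j++)
            total += buf[j];
    }                               /* VLA goes out of scope here */
    return total;
}

/* Jumps backwards out of a block that has a VLA in scope.
   Jumping out of a VLA's scope is legal C; jumping into one is not. */
long vla_with_goto(int rounds, int n)
{
    long total = 0;
    int round = 0;
again:
    if (round < rounds) {
        int buf[n];                 /* VLA in scope at the goto below */
        int j;
        for (j = 0; j < n; j++)
            buf[j] = j;
        total += buf[n - 1];
        round++;
        goto again;                 /* leaves the VLA's scope */
    }
    return total;
}
```

With a large iteration count (large enough that iterations * n exceeds the stack limit), vla_in_loop is the kind of function that would only work if the memory is released at the end of each block.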
On 29/04/13 22:12, grischka wrote: