tinycc-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Tinycc-devel] [PATCH] arm: Handle __builtin_frame_address(1) correc


From: Kirill Smelkov
Subject: Re: [Tinycc-devel] [PATCH] arm: Handle __builtin_frame_address(1) correctly
Date: Sat, 5 Jan 2013 19:35:44 +0400
User-agent: Mutt/1.5.21 (2010-09-15)

On Sat, Jan 05, 2013 at 03:36:49PM +0100, Thomas Preud'homme wrote:
> Le jeudi 13 décembre 2012 17:55:41, vous avez écrit :
> 
> [SNIP]
> 
> > 
> > from arch/arm/kernel/stacktrace.c:
> > 
> >     * With framepointer enabled, a simple function prologue looks like
> > this: *     mov     ip, sp
> >     *       stmdb   sp!, {fp, ip, lr, pc}
> >     *       sub     fp, ip, #4
> >     *
> >     * A simple function epilogue looks like this:
> >     *       ldm     sp, {fp, sp, pc}
> > 
> > so since fp is initialized from ip-4 (old_sp-4) it points to (near ?)
> > fp, not pc, right?
> 
> stmdb means sp is decremented first (sp = sp - 4 * nbregs) and then register 
> are saved in ascending order from this new location. So, pc will end up at 
> initial_sp - 4 :) Hence ip points at pc.
> 
> > 
> > also linux kernel is compiled with
> > 
> >     -mapcs -mno-sched-prolog -mabi=apcs-gnu
> > 
> > and this changes frame layout compared to what gcc produces by default.
> > 
> > (sorry for that near - i don't remember stack and stmdb arm details
> >  good enough)
> 
> stmdb stands for STore Multiple Decrement Before
> 
> > 
> > > >        8:   e92d0870        push    {r4, r5, r6, fp}        ; NOTE
> > > >        r4... go before fp
> > > >       
> > > >       14:   e28db00c        add     fp, sp, #12
> > > >       
> > > >       5c:   e59b0000        ldr     r0, [fp]                ; BUG here!
> > > 
> > > Why is this a bug in GCC? It will load the old value of fp to r0.
> > 
> > Hmm, after push here is how stack looks like:
> > 
> >         r4  <- old_sp
> >         r5  <- sp + 12
> >         r6
> >         fp
> >             <- sp
> > 
> > (again, off-by-one error probably, but anywayy that sp+12 can't point to
> > old fp).
> 
> Nope. push store register in ascending order from the lowest address to the 
> highest address. push {regs} is actually the same as stmdb sp! {regs}. So, 
> after that initial push, you'd have:
> 
> x  <- old_sp
> fp <- sp + 12
> r6
> r5
> r4 <- new_sp
> 
> so in fp is stored sp+12 that is the address where is stored the old fp :)
> 
> Hence the ldr does load the address of the previous frame in the return 
> register (r0).
> 
> > 
> > ~~~~
> > 
> > Anyway, I've prepared more simple complete example to demonstrate the
> > bug:
> > 
> >     ---- 8< ---- fp.c
> >     #include <stdio.h>
> > 
> >     void *func(int x)
> >     {
> >         char *fp1 = __builtin_frame_address(1);
> > 
> >         printf("Hello World! %i\n", x);
> > 
> >         fp1 += x;
> >         return fp1;
> >     }
> > 
> >     int main()
> >     {
> >         void *fp0 = __builtin_frame_address(0);
> >         void *fp1 = func(0);
> > 
> >         printf("fp0: %p  fp1: %p\n", fp0, fp1);
> >         return 0;
> >     }
> > 
> > The code should output
> > 
> >     Hello World! 0
> >     fp0: <addr>  fp1: <the-same-addr!>
> > 
> > right? on i386:
> > 
> >     $ gcc -o fp_i386 -O2 fp.c
> >     $ ./fp_i386
> >     Hello World! 0
> >     fp0: 0xafeda538  fp1: 0xafeda538
> > 
> > 
> > but it does not on arm, look:
> > 
> >     $ arm-linux-gnueabi-gcc-4.4 -marm -o fp_arm -O2 fp.c
> >     $ qemu-arm -L /usr/arm-linux-gnueabi ./fp_arm
> >     Hello World! 0
> >     fp0: 0x4080010c  fp1: 0x8414
> > 
> > and the problem here is that gcc, as I think, incorrectly setups fp:
> > 
> >     $ arm-linux-gnueabi-gcc-4.4 -marm -o fp.s -S -O2 fp.c
> >     $ cat fp.s
> >             .cpu arm9tdmi
> >             .fpu softvfp
> >             .eabi_attribute 20, 1
> >             .eabi_attribute 21, 1
> >             .eabi_attribute 23, 3
> >             .eabi_attribute 24, 1
> >             .eabi_attribute 25, 1
> >             .eabi_attribute 26, 2
> >             .eabi_attribute 30, 2
> >             .eabi_attribute 18, 4
> >             .file   "fp.c"
> >             .text
> >             .align  2
> >             .global func
> >             .type   func, %function
> >     func:
> >             @ Function supports interworking.
> >             @ args = 0, pretend = 0, frame = 0
> >             @ frame_needed = 1, uses_anonymous_args = 0
> >             stmfd   sp!, {r4, r5, fp, lr}
> >             add     fp, sp, #12
> >             mov     r4, r0
> >             ldr     r5, [fp, #0]        ; <- bug here, should be [fp, #-4]
> >             mov     r1, r4
> >             ldr     r0, .L3
> >             bl      printf
> >             add     r0, r5, r4
> >             sub     sp, fp, #12
> >             ldmfd   sp!, {r4, r5, fp, lr}
> >             bx      lr
> >     .L4:
> >             .align  2
> >     .L3:
> >             .word   .LC0
> >             .size   func, .-func
> >             .align  2
> >             .global main
> >             .type   main, %function
> >     main:
> >             @ Function supports interworking.
> >             @ args = 0, pretend = 0, frame = 0
> >             @ frame_needed = 1, uses_anonymous_args = 0
> >             stmfd   sp!, {fp, lr}
> >             mov     r0, #0
> >             add     fp, sp, #4
> >             bl      func
> >             mov     r1, fp
> >             mov     r2, r0
> >             ldr     r0, .L7
> >             bl      printf
> >             mov     r0, #0
> >             sub     sp, fp, #4
> >             ldmfd   sp!, {fp, lr}
> >             bx      lr
> >     .L8:
> >             .align  2
> >     .L7:
> >             .word   .LC1
> >             .size   main, .-main
> >             .section        .rodata.str1.4,"aMS",%progbits,1
> >             .align  2
> >     .LC0:
> >             .ascii  "Hello World! %i\012\000"
> >             .space  3
> >     .LC1:
> >             .ascii  "fp0: %p  fp1: %p\012\000"
> >             .ident  "GCC: (Debian 4.4.6-11) 4.4.6"
> >             .section        .note.GNU-stack,"",%progbits
> 
> Agreed, that's weird. I fails equally on my machine (though the code is 
> different, maybe because I'm using armv7-a as machine arch).
> 
> > 
> > 
> > and if I apply the following patch
> > 
> >     diff --git a/t/fp.s b/t/fp1.s
> >     index 2047b05..85bb402 100644
> >     --- a/t/fp.s
> >     +++ b/t/fp1.s
> >     @@ -20,7 +20,7 @@ func:
> >             stmfd   sp!, {r4, r5, fp, lr}
> >             add     fp, sp, #12
> >             mov     r4, r0
> >     -       ldr     r5, [fp, #0]
> >     +       ldr     r5, [fp, #-4]
> >             mov     r1, r4
> >             ldr     r0, .L3
> >             bl      printf
> > 
> > it does work:
> > 
> >     $ arm-linux-gnueabi-gcc-4.4 -o fp fp.s
> >     $ arm-linux-gnueabi-gcc-4.4 -o fp1 fp1.s
> >     $ qemu-arm -L /usr/arm-linux-gnueabi ./fp
> >     Hello World! 0
> >     fp0: 0x4080012c  fp1: 0x8414
> > 
> >     $ qemu-arm -L /usr/arm-linux-gnueabi ./fp1
> >     Hello World! 0
> >     fp0: 0x4080012c  fp1: 0x4080012c
> > 
> > that's why I think gcc generated wrong code here.
> 
> In this case I agree. It seems that no matter what, __builtin_frame_address 
> loads fp, no matter where it points.
>  
> > 
> > And to the reference, here is how code would look like if being part of
> > linux:
> > 
> >     $ arm-linux-gnueabi-gcc-4.4 -o fpapcs.s     -marm -mapcs
> > -mno-sched-prolog  -S -O2 fp.c $ arm-linux-gnueabi-gcc-4.4 -o fpapcs_gnu.s
> > -marm -mapcs -mno-sched-prolog -mabi=apcs-gnu -S -O2 fp.c $ diff -u fp.s
> > fpapcs.s
> >     --- fp.s        2012-12-13 20:34:09.000000000 +0400
> >     +++ fpapcs.s    2012-12-13 20:34:09.000000000 +0400
> >     @@ -17,16 +17,17 @@
> >             @ Function supports interworking.
> >             @ args = 0, pretend = 0, frame = 0
> >             @ frame_needed = 1, uses_anonymous_args = 0
> >     -       stmfd   sp!, {r4, r5, fp, lr}
> >     -       add     fp, sp, #12
> >     +       mov     ip, sp
> >     +       stmfd   sp!, {r4, r5, fp, ip, lr, pc}
> >     +       sub     fp, ip, #4
> >             mov     r4, r0
> >             ldr     r5, [fp, #0]
> >             mov     r1, r4
> >             ldr     r0, .L3
> >             bl      printf
> >             add     r0, r5, r4
> >     -       sub     sp, fp, #12
> >     -       ldmfd   sp!, {r4, r5, fp, lr}
> >     +       sub     sp, fp, #20
> >     +       ldmfd   sp, {r4, r5, fp, sp, lr}
> >             bx      lr
> >      .L4:
> >             .align  2
> >     @@ -40,17 +41,18 @@
> >             @ Function supports interworking.
> >             @ args = 0, pretend = 0, frame = 0
> >             @ frame_needed = 1, uses_anonymous_args = 0
> >     -       stmfd   sp!, {fp, lr}
> >     +       mov     ip, sp
> >     +       stmfd   sp!, {fp, ip, lr, pc}
> >     +       sub     fp, ip, #4
> 
> thus fp points to the location where pc is stored
> 
> >             mov     r0, #0
> >     -       add     fp, sp, #4
> >             bl      func
> >             mov     r1, fp
> >             mov     r2, r0
> >             ldr     r0, .L7
> >             bl      printf
> >             mov     r0, #0
> >     -       sub     sp, fp, #4
> >     -       ldmfd   sp!, {fp, lr}
> >     +       sub     sp, fp, #12
> >     +       ldmfd   sp, {fp, sp, lr}
> >             bx      lr
> >      .L8:
> >             .align  2
> > 
> >     $ diff -u fpapcs.s fpapcs_gnu.s
> >     --- fpapcs.s    2012-12-13 20:34:09.000000000 +0400
> >     +++ fpapcs_gnu.s        2012-12-13 20:32:51.000000000 +0400
> >     @@ -1,13 +1,3 @@
> >     -       .cpu arm9tdmi
> >     -       .fpu softvfp
> >     -       .eabi_attribute 20, 1
> >     -       .eabi_attribute 21, 1
> >     -       .eabi_attribute 23, 3
> >     -       .eabi_attribute 24, 1
> >     -       .eabi_attribute 25, 1
> >     -       .eabi_attribute 26, 2
> >     -       .eabi_attribute 30, 2
> >     -       .eabi_attribute 18, 4
> >             .file   "fp.c"
> >             .text
> >             .align  2
> > 
> > 
> > but it does not work either:
> > 
> >     $ arm-linux-gnueabi-gcc-4.4 -o fpapcs -mapcs -mno-sched-prolog fpapcs.s
> >     $ qemu-arm -L /usr/arm-linux-gnueabi ./fpapcs
> >     Hello World! 0
> >     fp0: 0x4080011c  fp1: 0x83e0
> 
> Of course since it's the return address which is displayed
> 
> > 
> > 
> > ~~~~
> > 
> > There are many arm abis and maybe I'm doing something stupid, but to me
> > all this looks like gcc generates incorrect code.
> 
> Are you sure the code you compiled for ARM doesn't contain 
> __builtin_return_address instead of __builtin_frame_address?

Thomas, thanks for correcting my arm assembly errors. I agree I did make
mistakes, but I'm sure __builtin_frame_address() is used, not
__builtin_return_address(), and that __builtin_frame_address(1) is wrong.
Here it is once again:

    $ cat fp.c
    #include <stdio.h>

    void *func(int x)
    {
        char *fp1 = __builtin_frame_address(1);

        printf("Hello World! %i\n", x);

        fp1 += x;
        return fp1;
    }

    int main()
    {
        void *fp0 = __builtin_frame_address(0);
        void *fp1 = func(0);

        printf("fp0: %p  fp1: %p\n", fp0, fp1);
        return 0;
    }

    $ arm-linux-gnueabi-gcc-4.4 -ofpapcs -marm -mapcs -mno-sched-prolog -O2 fp.c

    $ qemu-arm -L /usr/arm-linux-gnueabi ./fpapcs
    Hello World! 0
    fp0: 0x4080010c  fp1: 0x83e0


> > > > ... as probably required by arm calling convention (not looked at the
> > > > spec yet, but it seems reasanoble, given how push is really a block
> > > > str and that str stores register in ascending order).
> > > 
> > > I have once seen a page in MSDN describing the ARM stack frame of
> > > Windows CE. I don't know if ARM specified how stack frames should look
> > > like.
> > 
> > From gcc manual:
> > 
> > `-mapcs-frame'
> >      Generate a stack frame that is compliant with the ARM Procedure
> >      Call Standard for all functions, even if this is not strictly
> >      necessary for correct execution of the code.  Specifying
> >      `-fomit-frame-pointer' with this option will cause the stack
> >      frames not to be generated for leaf functions.  The default is
> >      `-mno-apcs-frame'.
> > 
> > so at least there is "ARM Procedure Call Standard". Sorry, no time to
> > search for it...
> > 
> > 
> > Thanks,
> > Undertimed Kirill
> 
> Sorry for taking so much time to answer you.

Nevermind... and thanks,
Kirill



reply via email to

[Prev in Thread] Current Thread [Next in Thread]