[avr-gcc-list] Telling inline asm that input register is clobbered?


From: Paul Sokolovsky
Subject: [avr-gcc-list] Telling inline asm that input register is clobbered?
Date: Mon, 10 Dec 2012 02:29:51 +0200

Hello,

I'd like to reimplement the __builtin_avr_delay_cycles() function in
inline assembly. The reason is that __builtin_avr_delay_cycles() checks
its operand too early, so, for example,

static inline __attribute__((__always_inline__)) void my_delay(int cycles)
{
        __builtin_avr_delay_cycles(cycles);
}

will still complain that the argument of __builtin_avr_delay_cycles(cycles)
is not constant, even for calls like my_delay(10).

This could be resolved by making __delay_cycles() a normal function,
so I did the following for starters:

ALWAYS_INLINE void __delay_cycles2(long delay)
{
    uint16_t d = delay >> 2;
    asm volatile(
            "1: \n"
            "sbiw   %0, 1 \n"
            "brne   1b \n"
            : : "w" (d)
    );
}

The problem is that gcc doesn't know that the value of the "w" register
is trashed after this assembly code executes, so

__delay_cycles2(100000);
__delay_cycles2(100000);

leads to:

  9c:   88 ea           ldi     r24, 0xA8       ; 168
  9e:   91 e6           ldi     r25, 0x61       ; 97
  a2:   01 97           sbiw    r24, 0x01       ; 1
  a4:   f1 f7           brne    .-4             ; 0xa2 <main+0x8>
  a8:   01 97           sbiw    r24, 0x01       ; 1
  aa:   f1 f7           brne    .-4             ; 0xa8 <main+0xe>

(My actual code is a bit more complicated than just two
__delay_cycles2() calls in a row; the unrelated asm is not shown above.)
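
For what it's worth, the constants in the listing follow from the
"delay >> 2" in the function: sbiw takes 2 cycles and a taken brne takes
2 cycles, so each loop iteration costs 4 cycles. A minimal sketch of that
arithmetic, runnable on any host (the helper names are made up for
illustration and are not part of avr-libc or gcc):

```c
#include <stdint.h>

/* Hypothetical helper: loop iteration count for a cycle budget.
   One iteration = sbiw (2 cycles) + taken brne (2 cycles) = 4 cycles,
   hence the shift by 2. */
uint16_t delay_iterations(long cycles)
{
    return (uint16_t)(cycles >> 2);
}

/* Hypothetical helpers: the two immediates loaded by the ldi pair. */
uint8_t low_byte(uint16_t v)  { return (uint8_t)(v & 0xFF); }
uint8_t high_byte(uint16_t v) { return (uint8_t)(v >> 8); }
```

With cycles = 100000 this gives 25000 iterations, i.e. 0x61A8, which is
the 0xA8/0x61 pair loaded into r24/r25 above.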

Well, what's needed is a clobber constraint. But how do I add one?
Using ': : "w" (d) : "r25", "r26"' just makes gcc use r26/r27, with the
same effect. Trying a matching constraint, ': : "w" (d) : "0"', seems to
be simply ignored, leading to the same code as above.

After some looking, I found an example workaround at
http://www.nongnu.org/avr-libc/user-manual/inline_asm.html (the "void
delay(uint8_t ms)" example there). So, using ': "=&w" (d) : "0" (d)' at
least doesn't produce broken code. But that really looks like a
workaround: the code above *does not* produce any result, so telling the
compiler to store a "result" back into the variable looks ugly, and
leaves me only to pray for good liveness tracking (that gcc sees the
"result" is not used anywhere and doesn't try to store it). But I wonder
if gcc does its job well. The code above actually compiles quite optimally:

  9a:   28 ea           ldi     r18, 0xA8       ; 168
  9c:   31 e6           ldi     r19, 0x61       ; 97
  a0:   c9 01           movw    r24, r18
  a2:   01 97           sbiw    r24, 0x01       ; 1
  a4:   f1 f7           brne    .-4             ; 0xa2 <main+0xa>
  a8:   c9 01           movw    r24, r18
  aa:   01 97           sbiw    r24, 0x01       ; 1
  ac:   f1 f7           brne    .-4             ; 0xaa <main+0x12>
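
For reference, here is a sketch of the whole function with those
constraints, reconstructed from the fragments quoted above (the function
name is made up, and the non-AVR branch is only a host stand-in so the
sketch compiles off-target; it is not part of the technique):

```c
#include <stdint.h>

#if !defined(__AVR__)
/* Host-only counter, used to observe the stand-in loop. */
static unsigned long host_iterations_tied;
#endif

/* Hypothetical name; same body as __delay_cycles2() above. */
static inline __attribute__((__always_inline__))
void delay_cycles_tied(long delay)
{
    uint16_t d = (uint16_t)(delay >> 2);
#if defined(__AVR__)
    __asm__ volatile(
        "1: \n\t"
        "sbiw   %0, 1 \n\t"
        "brne   1b"
        : "=&w" (d)     /* operand 0: output, early-clobber */
        : "0" (d)       /* input tied to the same register as operand 0 */
    );
#else
    /* Same loop shape as the asm: decrement, repeat while non-zero. */
    do {
        host_iterations_tied++;
    } while (--d != 0);
#endif
}
```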

So, gcc sees a "common subexpression" and caches it in another register
pair. However, using different values:

__delay_cycles2(100000);
__delay_cycles2(100004);

leads to:

  9a:   48 ea           ldi     r20, 0xA8       ; 168
  9c:   51 e6           ldi     r21, 0x61       ; 97
  9e:   29 ea           ldi     r18, 0xA9       ; 169
  a0:   31 e6           ldi     r19, 0x61       ; 97
  a4:   ca 01           movw    r24, r20
  a6:   01 97           sbiw    r24, 0x01       ; 1
  a8:   f1 f7           brne    .-4             ; 0xa6 <main+0xe>
  ac:   c9 01           movw    r24, r18
  ae:   01 97           sbiw    r24, 0x01       ; 1
  b0:   f1 f7           brne    .-4             ; 0xae <main+0x16>

That doesn't look optimal at all: I'd expect the compiler to load the
values directly into r24/r25 just before use.

So, I wonder if "=&w" plays a role in this, and if there's a better way
to do it (like specifying exactly that the input register is clobbered)?
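
One candidate I know of from the gcc extended asm documentation is the
"+" constraint modifier, which marks a single operand as both read and
written. Below is a sketch of how the function might look with it. The
function name is made up, I make no claim about the code gcc generates
for it, and the non-AVR branch is only a host stand-in so the sketch
compiles off-target:

```c
#include <stdint.h>

#if !defined(__AVR__)
/* Host-only counter, used to observe the stand-in loop. */
static unsigned long host_iterations_rw;
#endif

/* Hypothetical name; same loop as __delay_cycles2() above. */
static inline __attribute__((__always_inline__))
void delay_cycles_rw(long delay)
{
    uint16_t d = (uint16_t)(delay >> 2);
#if defined(__AVR__)
    __asm__ volatile(
        "1: \n\t"
        "sbiw   %0, 1 \n\t"
        "brne   1b"
        : "+w" (d)      /* read-write: the asm consumes and modifies d */
    );
#else
    /* Same loop shape as the asm: decrement, repeat while non-zero. */
    do {
        host_iterations_rw++;
    } while (--d != 0);
#endif
}
```

Since d is a local that is dead after the asm, the modified value should
never need to be stored anywhere, but that is exactly the liveness
question raised above.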

Oh, and btw, my very first attempt used "ldi r24, %0" with an "M"
constraint, but I hit the same issue as with
__builtin_avr_delay_cycles(): the "M" constraint apparently expects a
literal integer value, short-sightedly ignoring symbols which may be just
"const int". (For comparison, the "load immediate" approach works well
with mspgcc:
https://github.com/pfalcon/PeripheralTemplateLibrary/blob/master/include/delay_static_msp430.hpp#L73
)


Thanks,
 Paul
