[Top][All Lists]
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[avr-gcc-list] Re: Optimisation of bit set/clear in uint32_t
From: |
David Brown |
Subject: |
[avr-gcc-list] Re: Optimisation of bit set/clear in uint32_t |
Date: |
Wed, 22 Apr 2009 10:55:27 +0200 |
User-agent: |
Thunderbird 2.0.0.21 (Windows/20090302) |
Dale Whitfield wrote:
Hi David >@2009.04.21_19:42:34_+0200
Dale Whitfield wrote:
Hi,
All this talk of super-optimisers, etc. brings this issue I've had,
to the fore again.
Perhaps there are suggestions on how to encourage the compiler to do the
optimisations rather than having to hand-code to get the best result.
Take this code:
volatile uint32_t status;
static inline void set_status(uint32_t flag, uint8_t set)
__attribute__((always_inline));
void set_status(uint32_t flag, uint8_t set) {
if (set) status |= flag;
else status &= ~flag;
}
Which when used in an interrupt generates:
ISR (INT7_vect)
{
<snip>
That's way too much code... Its fairly obvious where the optimisations
should be, although I can see that some have been done already.
I can't see much possibility for improving the above code (except by
removing the push and zeroing of r1). You asked for "status" to be a
volatile 32-bit int, so that's what you got. The code below is
semantically different, and thus compiles differently.
I don't believe it is semantically different. Which is one reason I
raised this. I am using the version below in existing code and it
behaves correctly.
Loading 4 registers and then storing back 3 that are unchanged makes no
sense at all.
I'm afraid that's how "volatile" works. In a way, by using "volatile"
you are telling the compiler "I *know* this makes no sense to you, but
do it my way anyway".
Where volatile comes in here is that the optimisation shouldn't use any
previous loads or modify register values more than once without
reloading/storing etc. Here, the value is loaded once, one byte is
changed and then all 4 are stored. That's wasteful.
That's what you get when you make a 32-bit item volatile - all accesses
to it use the whole 32-bit data, every time.
Suppose you have a 16-bit hardware timer that must always be set low
byte first, then high byte. If you make a change that just affects the
low byte, you still want the compiler to generate the full 16-bit write
- you achieve that by declaring the timer as a 16-bit volatile variable.
How is the compiler supposed to guess that you mean something
different in this case?
In this example, the interrupt routine is half as long when, as I see
it, correctly optimised. On an interrupt particularly, this could be
critical time/cycle saving.
I agree with your conclusion - the code is longer and slower than
needed. But you are totally wrong on your premise - this is not because
the compiler is failing in its optimisation. It is because *your* code
forces it to generate long and slow code.
There is no reason for using "volatile" in this situation - making
"status" non-volatile will give you exactly the code you want here. But
it may then cause trouble for other code, depending on how you use
"status" and "set_status".
Try the following version of set_status:
void set_status(uint32_t flag, uint8_t set) {
if (set) *(uint32_t* &status) |= flag;
else *(uint32_t* &status) &= ~flag;
asm("" : : : "memory");
}
First, we remove the "volatile" from status so that the optimiser can
give you a single byte access when the function is inlined. Secondly,
instead of using "volatile" to protect the access to "status", we use a
memory clobber that forces writes to memory variables to be completed
before going on. This will ensure that the change is written out to
status as soon as possible (otherwise changes could be cached for later
storage).
(I haven't tried compiling that code yet.)
- [avr-gcc-list] Optimisation of bit set/clear in uint32_t, Dale Whitfield, 2009/04/21
- Re: [avr-gcc-list] Optimisation of bit set/clear in uint32_t, Georg-Johann Lay, 2009/04/21
- [avr-gcc-list] Re: Optimisation of bit set/clear in uint32_t, David Brown, 2009/04/21
- Re: [avr-gcc-list] Re: Optimisation of bit set/clear in uint32_t, Dale Whitfield, 2009/04/22
- Re: [avr-gcc-list] Re: Optimisation of bit set/clear in uint32_t, Alex Wenger, 2009/04/22
- Re: [avr-gcc-list] Re: Optimisation of bit set/clear in uint32_t, Dale Whitfield, 2009/04/22
- Re: [avr-gcc-list] Re: Optimisation of bit set/clear in uint32_t, Alex Wenger, 2009/04/22
- [avr-gcc-list] Re: Optimisation of bit set/clear in uint32_t, David Brown, 2009/04/22
- Re: [avr-gcc-list] Re: Optimisation of bit set/clear in uint32_t, Paulo Marques, 2009/04/22
- [avr-gcc-list] Re: Optimisation of bit set/clear in uint32_t,
David Brown <=
- Re: [avr-gcc-list] Re: Optimisation of bit set/clear in uint32_t, Dale Whitfield, 2009/04/22
- [avr-gcc-list] Re: Optimisation of bit set/clear in uint32_t, David Brown, 2009/04/22
- Re: [avr-gcc-list] Re: Optimisation of bit set/clear in uint32_t, Dale Whitfield, 2009/04/22
- [avr-gcc-list] Re: Optimisation of bit set/clear in uint32_t, David Brown, 2009/04/22
- Re: [avr-gcc-list] Re: Optimisation of bit set/clear in uint32_t, Graham Davies, 2009/04/22
- Re: [avr-gcc-list] Re: Optimisation of bit set/clear in uint32_t, Dale Whitfield, 2009/04/22
- Re: [avr-gcc-list] Re: Optimisation of bit set/clear in uint32_t, Graham Davies, 2009/04/22
- Re: [avr-gcc-list] Re: Optimisation of bit set/clear in uint32_t, Paulo Marques, 2009/04/22
- Re: [avr-gcc-list] Re: Optimisation of bit set/clear in uint32_t, Alex Wenger, 2009/04/22
- Re: [avr-gcc-list] Re: Optimisation of bit set/clear in uint32_t, Dave Hylands, 2009/04/22