Re: [avr-libc-dev] -O3? -Os?
From: E. Weddington
Subject: Re: [avr-libc-dev] -O3? -Os?
Date: Mon, 16 Dec 2002 09:18:32 -0700

On 16 Dec 2002 at 16:06, Joerg Wunsch wrote:
> I've always been curious what we actually gain by using -O3 for the
> `larger' AVR devices when compiling the library. So I finally wrote a
> test case and ran it on an ATmega128. In order to create a test job
> that would profit as much as possible from any speed enhancement made
> inside avr-libc, I decided that sorting strings would serve this task
> quite well: it contains calls to library functions that are
> optimizable, and that take a bit of CPU in order to execute (qsort()).
> I used qsort to sort an array of strings (boldly borrowed the first
> lines from the famous "Bastard Operator from Hell" for it :), once
> using the normal strcmp() function, and another time using a function
> effectively sorting the array by string size.
>
> The resulting object file has been linked against a current avr-libc,
> where the library was configured and compiled with different
> optimization options (avrlib_opt_speed in configure.in).
>
> Here are the results:
>
> -O3:
>
> % avr-size test.out
>    text    data     bss     dec     hex filename
>    6898    1980      10    8888    22b8 test.out
>
> time for qsort(strcmp): 0.000903 seconds.
> time for qsort(strlencmp): 0.019705 seconds.
> done.
>
> -mcall-prologues -Os:
>
> % avr-size test.out
>    text    data     bss     dec     hex filename
>    6474    1980      10    8464    2110 test.out
>
> time for qsort(strcmp): 0.000972 seconds.
> time for qsort(strlencmp): 0.020069 seconds.
> done.
>
> -Os:
>
> % avr-size test.out
>    text    data     bss     dec     hex filename
>    6618    1980      10    8608    21a0 test.out
>
> time for qsort(strcmp): 0.000955 seconds.
> time for qsort(strlencmp): 0.020069 seconds.
> done.
>
> -O2:
>
> % avr-size test.out
>    text    data     bss     dec     hex filename
>    6666    1980      10    8656    21d0 test.out
>
> time for qsort(strcmp): 0.000972 seconds.
> time for qsort(strlencmp): 0.020069 seconds.
>
>
> It's interesting to note that all attempts to modify the flags except
> -O3 basically gain nothing at all in terms of speed, with
> -mcall-prologues -Os (our default for the `small' AVR devices)
> yielding the smallest code size. (The difference between 955 µs and
> 972 µs is just a single timer tick, so take that with a grain of
> salt.)
>
> For -O3, the code size is ~ 6 % larger (even more bloat if you
> consider that vfprintf() & Co. take up about 25 % of the text segment
> and are unaffected by the global -O settings since they use private,
> hand-crafted optimization flags). The speed gain is between 2 and 6
> %.
>
>
> My vote would be to use -mcall-prologues -Os for any of our targets.
>
Would you want to write up a new FAQ entry about this?
Eric