[avr-gcc-list] Speeding [u]ltoa by 35%

avr-gcc-list

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[avr-gcc-list] Speeding [u]ltoa by 35%

From:	Georg-Johann Lay
Subject:	[avr-gcc-list] Speeding [u]ltoa by 35%
Date:	Tue, 19 Jun 2012 09:00:12 +0200
User-agent:	Thunderbird 2.0.0.24 (Windows/20100228)

Reviewing the ltoa sources [0] shows that the implementation's
speed is not award winning: it uses 32-bit div+mod.

There was propose [1] to speed ltoa by means of LUT but that
approach tries to use lookup tables for special radices.

The new proposal inlines __divmodsi4 leading to an ~35% tick gain
without increasing code size:

#define VAL     22
#define STR     20
#define RADIX   18

#define REST    26
#define CNT     19

DEFUN my_ltoa
    movw    r30,    STR

    mov     r27,    STR+1
    mov     TMP,    STR

    cpi     RADIX,   36+1
    cpc     RADIX+1, ZERO
    brsh 99f
    cpi     RADIX,  2
    brlo 99f

    clt
    cpi     RADIX,  10
    brne 1f
    bst     VAL+3,  7
    brtc 1f
    com     VAL+3
    com     VAL+2
    com     VAL+1
    neg     VAL
    sbci    VAL+1,  -1
    sbci    VAL+2,  -1
    sbci    VAL+3,  -1

1:  clr     REST
    ldi     CNT,    32

2:  lsl     VAL
    rol     VAL+1
    rol     VAL+2
    rol     VAL+3
    rol     REST
    cp      REST,   RADIX
    brlo 3f
    sub     REST,   RADIX
    inc     VAL
3:  dec     CNT
    brne 2b

    subi    REST,   -'0'
    cpi     REST,   '9'+1
    brlo 4f
    subi    REST,   '0'-'a'+10

4:  st      Z+,     REST
    sbiw    VAL+2,  0
    sbci    VAL+1,  0
    sbci    VAL,    0
    brne 1b

    brtc 99f
    ldi     CNT,    '-'
    st      Z+,     CNT

99:
    mov     r24,    TMP
    mov     r25,    r27
    st      Z+,     ZERO
    jmp     strrev

ENDF my_ltoa

Some notes on the code:

* The code is not yet documented or ready to be dropped into
  avr-libc sources (coding style, parametrize hardware)

* Speed gain 34%..39% see [1] and [2] for detail analysis

* Code size is same as old implementation

* Won't drag in __divmodsi4, thus code size decrease of > 100
  bytes if application otherwise does not need 32-bit div

* Less stack usage

* Strict radix checking: original version is not strict, e.g.
  0x100 + 10 will slip through

* Code size can be decreased further by moving the sanity
  check to C like so:

static inline __attribute__((always_inline))
char* ltoa (long x, char *str, int radix)
{
    if (radix < 2 || radix > 36)
    {
        *str = '\0';
        return str;
    }
    else
    {
        extern char* ltoa_asm (long, char*, unsigned char);
        return ltoa_asm (x, str, radix);
    }
}

Presumably almost all use cases use radix known at compile time.
Thus, the compiler can optimize away the sanity check. Moreover,
radix can be passed as 8-bit value: no need to pass 10 as 0x000a.

More notes:

* Similar improvement can be used for ultoa

* Code between [u]ltoa could be shared.

* There is ultoa_invert [4] that
    - str[] will NOT 0 terminated.
    - Sequence of digits is inverted.
    - It returns pointer to first byte after a string.

All these is readily available in ltoa, too. Again, code could
be shared.

A similar method could be used to speed to utoa/itoa with a speedup
of > 20%. But it is harder to work against code size increase,
and without the help of C inlining the code increases by ~10-16 bytes.
Again, code increase is no issue if the application does not need
16-bit divmod, as current i/ltoa will then drag that code.

Johann

--

[0] AVR-Libc: ltoa.S
http://svn.savannah.nongnu.org/viewvc/trunk/avr-libc/libc/misc/ltoa.S?revision=1944&root=avr-libc&view=markup

[1] avr-libc: optimization for ltoa/printf
http://lists.gnu.org/archive/html/avr-libc-dev/2011-06/msg00000.html

[2] Plot:
http://sourceforge.net/apps/mediawiki/mobilechessboar/nfs/project/m/mo/mobilechessboar/4/48/Ltoa.png

[3] Plot Description
http://sourceforge.net/apps/mediawiki/mobilechessboar/index.php?title=File:Ltoa.png#Description

[4] AVR-Libc: ultoa_invert.S
http://svn.savannah.nongnu.org/viewvc/trunk/avr-libc/libc/stdio/ultoa_invert.S?revision=2191&root=avr-libc&view=markup

[Prev in Thread]

Current Thread

[Next in Thread]

[avr-gcc-list] Speeding [u]ltoa by 35%, Georg-Johann Lay <=
- Re: [avr-gcc-list] Speeding [u]ltoa by 35%, Georg-Johann Lay, 2012/06/19
- Re: [avr-gcc-list] Speeding [u]ltoa by 35%, Joerg Wunsch, 2012/06/19
  - Re: [avr-gcc-list] Speeding [u]ltoa by 35%, Georg-Johann Lay, 2012/06/19
    - Re: [avr-gcc-list] Speeding [u]ltoa by 35%, Joerg Wunsch, 2012/06/20
    - Re: [avr-gcc-list] Speeding [u]ltoa by 35%, Georg-Johann Lay, 2012/06/21

Prev by Date: Re: [avr-gcc-list] Link Time Optimization vs __builtin_unreachable() and .signature?
Next by Date: Re: [avr-gcc-list] Speeding [u]ltoa by 35%
Previous by thread: [avr-gcc-list] Link Time Optimization vs __builtin_unreachable() and .signature?
Next by thread: Re: [avr-gcc-list] Speeding [u]ltoa by 35%
Index(es):
- Date
- Thread