Re: [Tinycc-devel] ASM Output?

From: Dave Dodge
Subject: Re: [Tinycc-devel] ASM Output?
Date: Wed, 17 Nov 2004 20:00:30 -0500
User-agent: Mutt/1.4.2i

On Thu, Nov 18, 2004 at 01:18:00AM +1100, Jeff Schultz wrote:
> P.S.  I assume it's a cut down test case, hence the unused extra entry
> in the table.

Actually it was a screwup in my attempt to quickly cut it down to a
small size.  I'd really meant to do "&" instead of "%".  The real code
uses a 256-element table.  It still demonstrated the problem on my
test system, but here's a better version anyway:

#include <stdio.h>
#include <stdint.h>

static inline uint8_t foo(uint8_t const x)
{
        static const uint8_t table[256] = {
                "0123456789abcdef" "0123456789abcdef" "0123456789abcdef"
                "0123456789abcdef" "0123456789abcdef" "0123456789abcdef"
                "0123456789abcdef" "0123456789abcdef" "0123456789abcdef"
                "0123456789abcdef" "0123456789abcdef" "0123456789abcdef"
                "0123456789abcdef" "0123456789abcdef" "0123456789abcdef"
        };

        return table[x];
}

int main(void)
{
        int ch;
        /* translate each input byte through the table and write it out */
        while((ch = getc(stdin)) != EOF)
                putc(foo(ch), stdout);
        return 0;
}
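
The "&" versus "%" slip mentioned above can be illustrated with a small sketch. The helper names and the modulus of 255 are assumptions made for illustration only; the earlier cut-down test case is not reproduced in this message. Masking with 0xff keeps every index of a 256-element table reachable, whereas a modulus of 255 can never yield index 255, which would leave one table entry unused.

```c
/* Hypothetical helpers, not taken from the original program. */

/* Mask: maps any value onto 0..255, so every entry of a
   256-element table can be reached. */
static unsigned index_by_mask(unsigned x)
{
        return x & 0xffu;
}

/* Modulus by 255: maps onto 0..254, so index 255 is never
   produced and the last table entry goes unused. */
static unsigned index_by_mod(unsigned x)
{
        return x % 255u;
}
```

For an input byte of 255, the mask version selects the last table entry, while the modulus version wraps around to entry 0.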

> FWIW, I see gcc 3.3.2 -O3 on a K7 as 1.5 times faster than tcc 0.9.22
> on this program.

I tested the above code with 32M of random data, using

  a.out < data > /dev/null

When I run the tests on a Celeron, tcc and gcc are fairly close.  But
as these numbers show, there appears to be some sort of pathological
behavior with gcc's output and the P4.  Even if I tell gcc to tune the
code to the P4, it doesn't get any better:

P4 3.0GHz 512K cache kernel 2.4.20
gcc 3.4.3 -g -O3                8.145/8.100/0.050
gcc 3.2.2 (redhat) -g -O3       8.148/8.130/0.020
tcc 0.9.22 -g                   2.215/2.220/0.000

Celeron 2.4GHz 128K cache kernel 2.4.25
gcc 3.4.2 -g -O3                2.209/2.150/0.060
gcc 3.3.3 -g -O3                2.215/2.190/0.020
tcc 0.9.22 -g                   2.377/2.320/0.060
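
One way to narrow down where the time goes would be to time the table lookup by itself, without the getc/putc calls. The following is only a sketch under that assumption: fill_random and time_lookup are hypothetical names, rand() merely stands in for whatever produced the random data, and clock() reports CPU time for the loop alone, so its results are not directly comparable with the figures above.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <time.h>

/* Fill buf with n pseudo-random bytes. rand() is used only for
   illustration; it is not a high-quality generator. */
static void fill_random(uint8_t *buf, size_t n, unsigned seed)
{
        size_t i;

        srand(seed);
        for (i = 0; i < n; i++)
                buf[i] = (uint8_t)(rand() & 0xff);
}

/* Translate n bytes through a 256-entry table and return the CPU
   seconds spent in the loop, as measured by clock(). */
static double time_lookup(const uint8_t table[256],
                          const uint8_t *in, uint8_t *out, size_t n)
{
        size_t i;
        clock_t start, end;

        start = clock();
        for (i = 0; i < n; i++)
                out[i] = table[in[i]];
        end = clock();

        return (double)(end - start) / CLOCKS_PER_SEC;
}
```

A caller would allocate two buffers of the same size, fill the input with fill_random, and compare the value returned by time_lookup across compilers. If the in-memory loop shows no gap between gcc and tcc, the difference presumably lies elsewhere in the program.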

Looking at the assembly from gcc, nothing jumps out as an obvious
problem; but since my attempts to do things like speed up code with
SSE intrinsics always backfire horribly, I probably don't really know
what to look for :-)

                                                  -Dave Dodge
