discuss-gnustep

Re: Objective-C and Smalltalk; speed of message send


From: Alexander Malmberg
Subject: Re: Objective-C and Smalltalk; speed of message send
Date: Tue, 10 Aug 2004 13:15:44 +0200

Well, to add some real numbers to this ;), try the attached program.
ix86 only, although there should be equivalents to rdtsc on other
platforms (it returns the raw clock cycle count from the processor).
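
Where rdtsc is not available, one option is a portable fallback. The
following is only a minimal sketch, assuming a POSIX system that
provides clock_gettime() with CLOCK_MONOTONIC. It reports nanoseconds
rather than clock cycles, so it is a substitute for rdtsc and not an
equivalent, and the name llclock_ns is made up for illustration.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <time.h>

/* Hypothetical portable stand-in for llclock(): monotonic time in
   nanoseconds (not clock cycles), read via POSIX clock_gettime(). */
static unsigned long long llclock_ns(void)
{
        struct timespec ts;
        int rc = clock_gettime(CLOCK_MONOTONIC, &ts);

        assert(rc == 0); /* assumption: CLOCK_MONOTONIC is supported */
        (void)rc;
        return (unsigned long long)ts.tv_sec * 1000000000ULL
                + (unsigned long long)ts.tv_nsec;
}
```

Nanoseconds can be converted to cycles only if the clock frequency is
known and fixed, so results from this variant are not directly
comparable to the rdtsc numbers.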

Compile with -O2, with -fomit-frame-pointer to reduce overhead in the
called dummies, and with -fPIC since that's what we use for all GNUstep
code (and, amazingly, it gives faster code here).

My values are for a PII, gcc 3.5 snapshot with my optimized message
lookup patch:
http://w1.423.telia.com/~u42308495/alex/objc_msg_lookup_regparm-1.tar.gz

Jeff Teunissen measured with an unpatched gcc on an athlon xp system.

                                             Mine        Jeff's
                             Loop overhead:   6 cycles   ? (likely 6 cycles)
                    Normal C function call:   6 cycles    6 cycles
             C call, two args (self, _cmd):   8 cycles    7 cycles
Indirect call, two args (aka. IMP caching):   9 cycles    7 cycles
                              Message send:  24 cycles   37 cycles
This is GNU runtime (of course :). Receiver and message are constant,
although that's irrelevant since the GNU runtime does lookups in
constant time (modulo the necessary stuff from memory being in the
cache).

Thus, after subtracting the 6 cycles of loop overhead, C calls are
essentially free, IMP caching costs 1-3 cycles, and a message send costs
~31 cycles normally, ~18 cycles with my patch. Excluding loop overhead,
on a 1GHz system, I'd expect around 30 million message sends/second with
normal gcc, ~60 million/second with my patch.
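
The arithmetic behind those estimates can be sketched as follows. This
is just an illustration, not part of the attached program; the function
names are made up, and it assumes that nothing but the message send
itself costs any cycles.

```c
#include <assert.h>

/* Net cost of one operation: measured cycles per loop iteration minus
   the loop overhead measured separately. */
static double net_cycles(double measured, double loop_overhead)
{
        assert(measured >= loop_overhead);
        return measured - loop_overhead;
}

/* Operations per second at clock rate clock_hz, assuming each
   operation costs `cycles` cycles and nothing else runs. */
static double sends_per_second(double clock_hz, double cycles)
{
        assert(cycles > 0.0);
        return clock_hz / cycles;
}
```

With the numbers from the table, net_cycles(37, 6) is 31 and
sends_per_second(1e9, 31) is about 32 million; net_cycles(24, 6) is 18
and sends_per_second(1e9, 18) is about 56 million.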

All this assumes that the callee doesn't do anything. Without
-fomit-frame-pointer, you get an extra 2-3 cycles of frame setup in all
methods, and, e.g., the common case in -characterAtIndex: for 8-bit
strings (in range, character is an ASCII character) is ~14 cycles.

As a final note, recent oprofile data from Matt Rice puts
objc_msg_lookup at 15%-20% of execution time in messaging-heavy -gui
code. In other words, if message lookup were free, our programs would
take at most ~20% less time to run. Significant, yes, but 20% really
isn't that much.
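
To put a number on that bound, here is a small sketch of the usual
Amdahl-style calculation. The function names are made up for
illustration, and it assumes the profiled fraction is accurate and that
removing the lookup changes nothing else.

```c
#include <assert.h>

/* Fraction of the original run time that remains if a component taking
   `fraction` of the time became free. */
static double remaining_time(double fraction)
{
        assert(fraction >= 0.0 && fraction < 1.0);
        return 1.0 - fraction;
}

/* Resulting speedup factor: old run time divided by new run time. */
static double speedup(double fraction)
{
        return 1.0 / remaining_time(fraction);
}
```

For the profiled range, speedup(0.15) is about 1.18 and speedup(0.20)
is 1.25.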


Thus, I maintain that message sending is really quite cheap. :) Except
for performance-critical code, I'd rather take a 20% performance hit
than uglify my code with IMP caching, and I wouldn't shy away from
messaging-heavy code. :)

- Alexander Malmberg

#include <stdio.h>
#include <objc/Object.h>

/* Read the processor's raw clock cycle counter (ix86 only). */
static inline unsigned long long int llclock(void)
{
        unsigned long long int a;
        asm volatile ("rdtsc" : "=&A" (a));
        return a;
}

/* Empty dummies; weak keeps gcc from inlining the calls away. */
void foo(void) __attribute__ ((weak));
void foo(void)
{
}

void foo2(id foo, SEL s) __attribute__ ((weak));
void foo2(id foo, SEL s)
{
}

int i=1000;

void (*foo2_id)(id,SEL)=foo2;

@implementation Object (foo)
-(void) foo
{
}
@end

int main(int argc, char **argv)
{
        unsigned long long int t1,t2;
        id self=[Object alloc];
        SEL cmd=@selector(foo);
        void (*foo3)(id,SEL)=foo2_id;

        t1=llclock();
        while (i--)
        {
//              foo(); /* ~6 cycles/call */
//              foo2(self,cmd); /* ~8 cycles/call */
//              foo3(self,cmd); /* ~9 cycles/call */
                [self foo]; /* ~24 cycles/call */
        }
        t2=llclock();
        t2-=t1;

        /* Total for all 1000 iterations; divide by 1000 for cycles/call. */
        printf("%llu clock cycles\n",t2);

        return 0;
}

