discuss-gnustep


Re: objective-c: how slow ?


From: Marko Mikulicic
Subject: Re: objective-c: how slow ?
Date: Thu, 06 Sep 2001 03:38:53 -0400
User-agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:0.9.3) Gecko/20010801

Malmberg wrote:
> 2. They're hard to do in a thread-safe way without locking (probably
> impossible).

Deferred update is not possible because maintaining call statistics
is a performance bottleneck in objc, so I guess that ICs are not feasible
in objc in a thread-safe way. Any other ideas?


Well, if you really _really_ need the extra 4-5 cycles/call, how about
adding a runtime function like this:

  objc_need_more_speed(@selector(foo:bar:zot:), [SomeClass class], [SomeOtherClass class], ..., NULL);

It would generate an 'outline' stub (style 2) with those classes and change the
sarray entries for the selector in those classes to point to a function
that patches the caller to directly call the stub. You could even add an
argument that controls whether the stub should bother to correctly
handle a nil receiver or not (if you can guarantee that you'll never
call that selector with a nil receiver, you can gain ~1 cycle/call).
There are a few practical problems, but I think it could be done.
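To make the proposal concrete, here is a minimal C sketch of what such an outline stub might do, assuming a toy object model: `Object`, `Class`, `IMP`, `lookup`, `PicStub`, `pic_add` and `pic_send` are all hypothetical names, not GNU runtime structures, and `lookup` merely stands in for the sarray dispatch. The stub compares the receiver's class against a few classes chosen up front, calls the cached IMP directly on a hit, and falls back to the generic lookup on a miss; `check_nil` models the optional nil-receiver test. A real stub would be generated machine code with the comparisons unrolled, not a C loop.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified object model (not the GNU runtime's structs). */
typedef struct Class Class;
typedef struct Object { const Class *isa; } Object;
typedef int (*IMP)(Object *self, int arg);

struct Class {
    const char *name;
    IMP foo;                       /* the only method in this toy model */
};

/* Stand-in for the generic dispatch (the sarray lookup in the real runtime). */
static int slow_lookups = 0;
static IMP lookup(const Class *cls) {
    slow_lookups++;
    return cls->foo;
}

#define PIC_SIZE 3

/* One 'outline' stub: a few (class, IMP) pairs fixed when it is created. */
typedef struct {
    const Class *classes[PIC_SIZE];
    IMP imps[PIC_SIZE];
    int count;
    int check_nil;                 /* 0 = caller promises never to pass nil */
} PicStub;

static void pic_add(PicStub *pic, const Class *cls) {
    assert(pic->count < PIC_SIZE);
    pic->classes[pic->count] = cls;
    pic->imps[pic->count] = lookup(cls);
    pic->count++;
}

static int pic_send(const PicStub *pic, Object *recv, int arg) {
    if (pic->check_nil && recv == NULL)
        return 0;                  /* a message to nil returns 0 */
    for (int i = 0; i < pic->count; i++)
        if (recv->isa == pic->classes[i])
            return pic->imps[i](recv, arg);   /* hit: direct call */
    return lookup(recv->isa)(recv, arg);      /* miss: generic lookup */
}

static int foo_a(Object *self, int arg) { (void)self; return arg + 1; }
static int foo_b(Object *self, int arg) { (void)self; return arg * 2; }
static int foo_c(Object *self, int arg) { (void)self; return -arg; }

static const Class ClassA = { "A", foo_a };
static const Class ClassB = { "B", foo_b };
static const Class ClassC = { "C", foo_c };
```

With `check_nil` cleared, the `recv == NULL` test disappears, which is roughly where a ~1 cycle/call saving would come from, at the price of crashing on a nil receiver.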

You have manual control over the PIC.
Why can't you simply have a macro generate the code in place?
Perhaps some stubs could be reused, reducing code size.
For example, if we know that some collection contains x kinds of
objects, of which only a few really dominate (say, the top 3 account for more than 70%),
it could help to use a common stub for accessing an element of the array (when the messages sent to that object are in a tight loop or scattered across a deep
call chain, so that maintaining a global per-iteration IMP is not feasible).
You could determine at runtime which classes to cache by running
a kind of "prepare" pass, but that is strictly application-dependent, and an end user who really needs it can probably do it best himself.
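A minimal sketch of the macro idea, assuming a hypothetical toy object model (`Object`, `Class`, `generic_send`, `SEND_AREA_2` and the shape classes are illustrative, not real runtime API): the macro expands a two-class inline cache right at the call site, so the two hand-picked dominant classes in a heterogeneous collection get direct calls and everything else takes the generic path.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy object model, not real objc runtime structures. */
typedef struct Object Object;
typedef double (*IMP)(const Object *self);
typedef struct Class { IMP area; } Class;
struct Object { const Class *isa; double size; };

static int generic_sends = 0;

/* Stand-in for the runtime's generic message dispatch. */
static double generic_send(const Object *o) {
    generic_sends++;
    return o->isa->area(o);
}

/* Expands a two-class inline cache at the call site: nil yields 0.0, the two
   hand-picked classes get direct calls, anything else takes the generic path.
   NOTE: obj is evaluated several times, so pass a side-effect-free expression. */
#define SEND_AREA_2(obj, C1, IMP1, C2, IMP2)   \
    ((obj) == NULL        ? 0.0                \
     : (obj)->isa == (C1) ? (IMP1)(obj)        \
     : (obj)->isa == (C2) ? (IMP2)(obj)        \
     : generic_send(obj))

static double square_area(const Object *o)   { return o->size * o->size; }
static double triangle_area(const Object *o) { return o->size * o->size / 2.0; }
static double point_area(const Object *o)    { (void)o; return 0.0; }

static const Class Square   = { square_area };
static const Class Triangle = { triangle_area };
static const Class Point    = { point_area };

/* Squares and triangles dominate this collection, so only they are cached. */
static double total_area(const Object *const items[], size_t n) {
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        const Object *o = items[i];
        sum += SEND_AREA_2(o, &Square, square_area, &Triangle, triangle_area);
    }
    return sum;
}
```

Because the macro evaluates `obj` several times, it should only be given a side-effect-free expression; a code generator or an inline function would avoid that restriction.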

Is it worth the cost of the JMP to the stub?

I'm now convinced that implementing PICs in objc is not worth the effort
because of the very nature of this language. Generated code is mixed with user code. We have only discussed simple situations, but when the method call is intermingled with some nasty C code, we never know how gcc will allocate registers and schedule instructions (that depends on the compiler backend and even on the version of gcc). If we generate stubs as C code, gcc will handle them as best it can, but handcrafted assembler is especially well suited for reimplementing things such as method calls, because they deviate a bit from standard C function calls. In Self everything is generated, so a clever compiler can play with all the factors, not only the best-case duration of each instruction.

Theoretically, there's recompilation.

Sorry, what do you mean?


I had a look at some of the PIC papers, and it seems their target is
quite different from ours. As far as I can tell, they put a direct call
at ~200ns and a normal lookup at ~15000ns (~75 times slower). With the
optimized obj-c lookup, a direct call takes ~27.5ns, and a lookup
~47.5ns (1.7 times slower).
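The ratios can be checked with a few lines of C; the constants are the figures quoted above and the helper names are only for illustration. The comparison also shows that a cache hit saves roughly 14800 ns per call in the PIC papers' setting, but only about 20 ns with the optimized obj-c lookup.

```c
#include <assert.h>

/* Timings quoted above, in nanoseconds. */
static const double pic_papers_direct_ns = 200.0;
static const double pic_papers_lookup_ns = 15000.0;
static const double objc_direct_ns       = 27.5;
static const double objc_lookup_ns       = 47.5;

/* How many times slower a full lookup is than a direct call. */
static double slowdown(double lookup_ns, double direct_ns) {
    return lookup_ns / direct_ns;
}

/* Time saved per call when a cache hit replaces the lookup. */
static double saving_ns(double lookup_ns, double direct_ns) {
    return lookup_ns - direct_ns;
}
```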

Objc has a simpler lookup because it doesn't have multiple inheritance,
delegation, or dynamic inheritance. Moreover, Self has to use true message sends
even for accessing instance variables (and for local variables too, in theory, but
those are inlined).
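A rough C illustration of why single inheritance keeps the lookup simple, assuming a hypothetical class layout with one superclass pointer and a flat method list (`Class`, `Method` and `lookup` are illustrative names). Real runtimes compare selector IDs rather than strings, and the GNU runtime roughly precomputes these results into per-class dispatch tables instead of walking the chain on every send. With one chain to search, the answer depends only on the (class, selector) pair and can be cached per class; with multiple parents or parent slots that change at run time, as in Self, the search becomes a graph walk whose result can change, so caching needs more care.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef int (*IMP)(void);

typedef struct Method { const char *sel; IMP imp; } Method;

/* Hypothetical layout: one superclass pointer and a flat method list. */
typedef struct Class {
    const struct Class *super;
    const Method *methods;
    size_t count;
} Class;

/* Single inheritance: walk one chain from the class up to the root.
   The result depends only on (cls, sel), so it can be cached per class. */
static IMP lookup(const Class *cls, const char *sel) {
    for (; cls != NULL; cls = cls->super)
        for (size_t i = 0; i < cls->count; i++)
            if (strcmp(cls->methods[i].sel, sel) == 0)
                return cls->methods[i].imp;
    return NULL;   /* not found: a real runtime would raise an error */
}

static int base_describe(void) { return 1; }
static int base_hash(void)     { return 2; }
static int sub_describe(void)  { return 3; }   /* overrides Base's version */

static const Method base_methods[] = { { "describe", base_describe },
                                       { "hash", base_hash } };
static const Method sub_methods[]  = { { "describe", sub_describe } };

static const Class Base = { NULL, base_methods, 2 };
static const Class Sub  = { &Base, sub_methods, 1 };
```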

Self has execution times comparable to objc ICs in the case of a cache hit.
It uses profiling information to update PICs and also to decide when and where
to inline a method body, saving the whole call.
I'm comparing apples to bicycles, because Self has very short methods on average and also has to inline most control structures to be usable, while preserving the possibility of extending basic flow control (ifTrue: and co.).
The only drawback I see in Self is that it performs well at the expense of code
size, which eats icache.
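As a loose illustration of the type feedback Self relies on, here is a C sketch of a per-call-site profile: it counts the receiver classes seen at a send site and, once the site is hot, reports a class worth inlining for. The names, the four-slot limit, the 1000-send threshold and the 90% dominance rule are all arbitrary assumptions, not Self's actual policy.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_SEEN 4
#define HOT_THRESHOLD 1000UL   /* hypothetical recompilation trigger */

/* Per-call-site profile: which receiver classes were seen, and how often. */
typedef struct {
    const void *cls[MAX_SEEN];
    unsigned long hits[MAX_SEEN];
    size_t seen;
    unsigned long total;
    int megamorphic;            /* more distinct classes than slots */
} SiteProfile;

static void site_record(SiteProfile *p, const void *cls) {
    p->total++;
    for (size_t i = 0; i < p->seen; i++)
        if (p->cls[i] == cls) { p->hits[i]++; return; }
    if (p->seen < MAX_SEEN) {
        p->cls[p->seen] = cls;
        p->hits[p->seen] = 1;
        p->seen++;
    } else {
        p->megamorphic = 1;
    }
}

/* Returns the class worth inlining for, or NULL if the site is not hot yet,
   is megamorphic, or no class accounts for at least 90% of the sends. */
static const void *site_inline_candidate(const SiteProfile *p) {
    if (p->total < HOT_THRESHOLD || p->megamorphic)
        return NULL;
    for (size_t i = 0; i < p->seen; i++)
        if (p->hits[i] * 10 >= p->total * 9)
            return p->cls[i];
    return NULL;
}
```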

How are C++ virtual calls handled in shared libraries?
I've heard that the biggest performance problem in the startup
of a KDE app is the copy-on-write of virtual function tables (vtables) contained
in shared libraries. They must be updated because the shared libraries can
be mapped at different addresses.


I suppose so, though I've never looked at how that's done. I'll see if I
can find the code responsible for this.

Sorry, it was a rhetorical question.
The dynamic linker updates the virtual function tables.
The KDE guys solved this problem in their latest release by adding a kind of
prelinking. You can find interesting information at http://www.research.att.com/~leonb/objprelink/
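To make the vtable point concrete, here is a C sketch of what a vtable amounts to: a constant table of function pointers, called through a pointer stored in each object (`Shape`, `ShapeVtbl` and the `rect_*` functions are illustrative names). The comments describe typical ELF/GCC behaviour when such a file is built with -fPIC into a shared library; details vary by platform and toolchain.

```c
#include <assert.h>

/* A hand-rolled "vtable": roughly what a C++ compiler emits for each
   polymorphic class. */
typedef struct Shape Shape;
typedef struct ShapeVtbl {
    int (*area)(const Shape *self);
    int (*sides)(const Shape *self);
} ShapeVtbl;

struct Shape { const ShapeVtbl *vptr; int w, h; };

static int rect_area(const Shape *s)  { return s->w * s->h; }
static int rect_sides(const Shape *s) { (void)s; return 4; }

/* Each initializer is the absolute address of a function. In a shared
   library built with -fPIC, the load address is not known until run time,
   so the dynamic linker has to write the real addresses into this table
   when the library is loaded (GCC typically puts such tables in a
   .data.rel.ro section). Those writes give the process a private copy of
   the page instead of sharing it with other processes. */
static const ShapeVtbl rect_vtbl = { rect_area, rect_sides };

/* A "virtual call": load the table pointer, then call through the slot. */
static int shape_area(const Shape *s) { return s->vptr->area(s); }
```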

Marko



