qemu-devel

Re: [Qemu-devel] [PATCH 7/7] tcg-i386: Perform tail call to qemu_ret_ld*_mmu


From: Paolo Bonzini
Subject: Re: [Qemu-devel] [PATCH 7/7] tcg-i386: Perform tail call to qemu_ret_ld*_mmu
Date: Thu, 29 Aug 2013 18:36:47 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/20130805 Thunderbird/17.0.8

On 29/08/2013 18:08, Richard Henderson wrote:
> Where I do think there's cause for treading carefully is wrt Aurelien's
> statement "it's the slow path exception, the call-return stack doesn't 
> matter".
>  Alternately, given that it *is* the slow path, who cares if the return from
> the helper immediately hits a branch, rather than tail-calling back into the
> fast path, if the benefit is that the call-return stack is still valid above
> the code_gen_buffer after a simple tlb miss?

Aurelien's comment was that lea+push+jmp is smaller than lea+call+ret,
which I can buy.

I guess it depends more than anything on the hardware implementation
of return branch prediction, and _how much_ the call-return stack is broken.

PPC's mtlr+b+...+blr and x86's push+jmp+...+ret are quite similar in
this respect, and they raise the same question.  After the blr/ret, is the
entire predictor state broken, or will the processor simply take a miss
and still keep the remainder of the stack valid?  (For x86 it could in
principle see that the stack pointer is lower and thus keep the entries
above it.  For PPC it's not that simple since LR is a callee-save
register, but there are probably plenty of tricks and heuristics that can
be employed).
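
To make the question concrete, here is a toy model of a return-address
predictor stack. It is only a sketch: the struct, the function names, the
addresses and both miss policies are made up for illustration, and it says
nothing about what any real x86 or PPC core does. POP_ON_MISS consumes the
mismatching entry, so the stack stays misaligned; KEEP_ON_MISS is the
hypothetical "notice the mismatch and keep the entries above it" behaviour.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RAS_DEPTH 16

/* Hypothetical recovery policies after a mispredicted return. */
enum miss_policy { POP_ON_MISS, KEEP_ON_MISS };

/* Toy return-address stack; real predictors are more complex. */
struct ras {
    uintptr_t entry[RAS_DEPTH];
    size_t top;        /* number of valid entries */
    unsigned misses;   /* mispredicted returns seen so far */
};

/* A call instruction pushes its return address onto the predictor. */
static void ras_call(struct ras *r, uintptr_t ret_addr)
{
    assert(r->top < RAS_DEPTH);
    r->entry[r->top++] = ret_addr;
}

/* A ret compares the predicted target with the actual one. */
static void ras_ret(struct ras *r, uintptr_t actual, enum miss_policy p)
{
    if (r->top == 0) {          /* nothing to predict from */
        r->misses++;
        return;
    }
    if (r->entry[r->top - 1] == actual) {
        r->top--;               /* correct prediction, consume the entry */
        return;
    }
    r->misses++;
    if (p == POP_ON_MISS) {
        r->top--;               /* entry is consumed anyway */
    }                           /* KEEP_ON_MISS leaves the stack untouched */
}

/* One TLB miss inside generated code, followed by a return to the callers.
 * Returns the number of mispredicted returns. Addresses are arbitrary. */
unsigned simulate_tlb_miss(int tail_call, enum miss_policy p)
{
    enum { OUTER = 0x1000, EXEC_LOOP = 0x2000,
           SLOW_STUB = 0x3000, FAST_PATH = 0x4000 };
    struct ras r = { .top = 0, .misses = 0 };

    ras_call(&r, OUTER);        /* some caller of the execution loop */
    ras_call(&r, EXEC_LOOP);    /* call into the code_gen_buffer */

    if (tail_call) {
        /* push+jmp: the return address only goes to memory, so the
         * predictor never sees a matching call for the helper's ret. */
        ras_ret(&r, FAST_PATH, p);
    } else {
        /* call+ret: the helper returns to the slow-path stub. */
        ras_call(&r, SLOW_STUB);
        ras_ret(&r, SLOW_STUB, p);
    }

    ras_ret(&r, EXEC_LOOP, p);  /* leave the code_gen_buffer */
    ras_ret(&r, OUTER, p);
    return r.misses;
}
```

In this model the call+ret variant never mispredicts. The push+jmp variant
costs one miss under KEEP_ON_MISS and three under POP_ON_MISS, because every
later return is then compared against the wrong entry. Which of the two is
closer to real hardware is exactly the open question.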

> 
> As an aside, why why o why do we default to -fstack-protector-all?  Do we
> really need checks in every single function, as opposed to those that actually
> do something with arrays?  Switch to plain -fstack-protector so we have
> 
>> 00000000005a1fd0 <helper_ret_ldsw_mmu>:
>>   5a1fd0:       48 83 ec 08             sub    $0x8,%rsp
>>   5a1fd4:       e8 57 fe ff ff          callq  5a1e30 <helper_ret_lduw_mmu>
>>   5a1fd9:       48 83 c4 08             add    $0x8,%rsp
>>   5a1fdd:       48 0f bf c0             movswq %ax,%rax
>>   5a1fe1:       c3                      retq   
>>   5a1fe2:       66 66 66 66 66 2e 0f    data32 data32 data32 data32 nopw %cs:0x0(%rax,%rax,1)
>>   5a1fe9:       1f 84 00 00 00 00 00 
> 
> and then lets talk about icache savings...

I think it was simply paranoia + not knowing the difference.  Patch
welcome I guess.  (And I admit I only skimmed the patches so I didn't
know how small the wrappers were).
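
For reference, a small sketch of the difference. GCC documents plain
-fstack-protector as guarding functions that call alloca or have buffers of
8 bytes or more (tunable with --param ssp-buffer-size), while
-fstack-protector-all guards every function. The function names below are
stand-ins invented for this example, not the real helper signatures, and
whether a guard is actually emitted depends on compiler version and
optimization level.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for an unsigned 16-bit load helper (hypothetical signature). */
uint16_t load_u16(const uint8_t *mem, size_t addr)
{
    uint16_t v;
    memcpy(&v, mem + addr, sizeof(v));
    return v;
}

/* Sign-extending wrapper: no arrays, no alloca. Plain -fstack-protector
 * is expected to leave it alone; -fstack-protector-all adds a canary
 * check even here. */
int64_t load_s16(const uint8_t *mem, size_t addr)
{
    return (int16_t)load_u16(mem, addr);
}

/* Has a local char buffer of 32 bytes, so it is a candidate for a canary
 * under plain -fstack-protector as well. Returns the number of bytes
 * copied, not counting the terminating NUL. */
size_t copy_name(char *dst, size_t dst_len, const char *src)
{
    char tmp[32];
    size_t n;

    assert(dst != NULL && src != NULL);
    n = strlen(src);
    if (n >= sizeof(tmp)) {
        n = sizeof(tmp) - 1;    /* truncate to the scratch buffer */
    }
    memcpy(tmp, src, n);
    tmp[n] = '\0';

    if (dst_len == 0) {
        return 0;
    }
    if (n >= dst_len) {
        n = dst_len - 1;        /* truncate to the destination */
    }
    memcpy(dst, tmp, n);
    dst[n] = '\0';
    return n;
}
```

Compiling this with each flag and comparing the disassembly of load_s16
should show whether the wrapper carries a canary check.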

Paolo



