From: | Richard Henderson |
Subject: | Re: [Qemu-devel] [PATCH v3 17/27] tcg-ppc64: Implement bswap64 |
Date: | Tue, 02 Apr 2013 08:12:01 -0700 |
User-agent: | Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/20130311 Thunderbird/17.0.4 |
On 2013-04-02 07:41, Alexander Graf wrote:
>>>> On 2013-04-01 23:34, Alexander Graf wrote:
>>>>> Is this faster than a load/store with std/ldbrx?
>>>> Hmm. Almost certainly not. And since we've got stack space allocated
>>>> for function calls, we've got scratch space to do it in. Probably
>>>> similar for bswap32 too, eh?
>>> Depends - memory load/store doesn't come for free and bswap32 is
>>> quite short.
>> I'll do a tiny bit o benchmarking for power7.
> Cool, thanks a bunch :)
Heh. "Almost certainly not" indeed. Unless I've made some silly mistake,
going through memory stalls badly. No store buffer forwarding on power7?

With the following test case, time reports:

  f1  2.967s
  f2  8.930s
  f3  7.071s
  f4  7.166s

And note that f4 is a normal store/load pair, trying to determine what
the store buffer forwarding delay might be.

r~
z.c
Description: Text Data
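
The attachment is not reproduced on this page. As a rough illustration of the two strategies discussed in the thread, here is a minimal portable C sketch. It is not the contents of z.c, and the function names bswap64_reg and bswap64_mem are hypothetical. The first swaps bytes with shifts and masks only; the second bounces the value through a stack slot and reads it back in reverse byte order, which is roughly the shape of a std followed by ldbrx. It shows only that the two produce the same result, and makes no claim about how f1 through f4 were written or about their timing.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: swap bytes using only shifts and masks,
   so the value never has to leave registers. */
static uint64_t bswap64_reg(uint64_t x)
{
    /* Swap adjacent bytes, then adjacent 16-bit units, then the 32-bit halves. */
    x = ((x & 0x00ff00ff00ff00ffULL) << 8)  | ((x >> 8)  & 0x00ff00ff00ff00ffULL);
    x = ((x & 0x0000ffff0000ffffULL) << 16) | ((x >> 16) & 0x0000ffff0000ffffULL);
    return (x << 32) | (x >> 32);
}

/* Hypothetical helper: swap bytes by storing to a stack slot and
   reloading in reverse order. The volatile slot is an assumption made
   here to keep the compiler from folding the round trip away. */
static uint64_t bswap64_mem(uint64_t x)
{
    volatile unsigned char slot[8];
    unsigned char tmp[8];
    uint64_t r;
    int i;

    memcpy(tmp, &x, sizeof tmp);
    for (i = 0; i < 8; i++) {
        slot[i] = tmp[i];          /* store in native byte order */
    }
    for (i = 0; i < 8; i++) {
        tmp[i] = slot[7 - i];      /* load back byte-reversed */
    }
    memcpy(&r, tmp, sizeof r);
    return r;
}
```

Both helpers reverse the byte order regardless of host endianness, so they can be checked against each other on any machine before timing them in a loop.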