From: Ulrich Hecht
Subject: Re: [Qemu-devel] [PATCH 2/9] S/390 CPU emulation
Date: Mon, 9 Nov 2009 18:55:23 +0200
User-agent: KMail/1.9.10

On Monday 02 November 2009, Laurent Desnogues wrote:
> That indeed looks strange: fixing the TB chaining on ARM
> made nbench i386 three times faster. Note the gain was
> less for FP parts of the benchmark due to the use of
> helpers.
>
> Out of curiosity, could you post your tb_set_jmp_target1
> function?
I'm on an AMD64 host, so it's the same code as in mainline.
> The only thing I can think of at the moment that
> could make the code slower is that the program you ran
> was not reusing blocks and/or cache flushing in
> tb_set_jmp_target1 is overkill.
There is no cache flushing in the AMD64 tb_set_jmp_target1() function,
and the polarssl test suite is by nature rather repetitive.
I did some experiments, and disabling TB chaining (by emptying
tb_set_jmp_target()) seems to have no impact on performance at all on
AMD64. I tested several CPU-intensive programs (md5sum and the like)
under AMD64-on-AMD64 userspace emulation (qemu-x86_64), and the
difference in performance with and without TB chaining is hardly
measurable. When enabled, the chaining is performed as advertised (I
checked that), but it does not seem to help performance.
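To make concrete what the experiment switches off, here is a toy model of
block chaining. It is a minimal sketch and not QEMU code: ToyBlock, ToyCpu,
toy_lookup() and toy_run() are hypothetical names invented for illustration,
and the "program" is assumed to be a fixed ring of four blocks. The model
only counts how often execution has to go back to the dispatcher to look up
the next block; it says nothing about how expensive such a lookup is compared
to running the block itself.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only; none of these names exist in QEMU. */
#define TOY_NUM_BLOCKS 4

typedef struct ToyBlock {
    int id;
    struct ToyBlock *chained_next;  /* direct link to successor, NULL if unpatched */
} ToyBlock;

typedef struct ToyCpu {
    ToyBlock blocks[TOY_NUM_BLOCKS];
    unsigned long lookups;          /* number of dispatcher lookups performed */
} ToyCpu;

/* Stand-in for the dispatcher finding a translated block by id. */
static ToyBlock *toy_lookup(ToyCpu *cpu, int id)
{
    assert(id >= 0 && id < TOY_NUM_BLOCKS);
    cpu->lookups++;
    return &cpu->blocks[id];
}

/*
 * Execute `steps` blocks of a program whose blocks form a ring
 * (0 -> 1 -> 2 -> 3 -> 0 ...). Returns the number of dispatcher lookups.
 * With chaining enabled, a block is linked to its successor the first
 * time the successor is looked up, so later passes skip the dispatcher.
 */
unsigned long toy_run(unsigned long steps, int chaining_enabled)
{
    ToyCpu cpu;
    cpu.lookups = 0;
    for (int i = 0; i < TOY_NUM_BLOCKS; i++) {
        cpu.blocks[i].id = i;
        cpu.blocks[i].chained_next = NULL;
    }

    ToyBlock *tb = toy_lookup(&cpu, 0);
    for (unsigned long n = 0; n < steps; n++) {
        /* "Executing" tb is a no-op in this model. */
        ToyBlock *next = tb->chained_next;
        if (next == NULL) {
            next = toy_lookup(&cpu, (tb->id + 1) % TOY_NUM_BLOCKS);
            if (chaining_enabled) {
                tb->chained_next = next;  /* patch the jump */
            }
        }
        tb = next;
    }
    return cpu.lookups;
}
```

In this model toy_run(1000, 0) returns 1001 and toy_run(1000, 1) returns 5,
so chaining removes nearly all dispatcher round trips. Whether that shows up
in wall-clock time depends on how costly a round trip is relative to the
work done inside each block, which the model deliberately leaves out.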
How is this possible? Could this be related to cache size? I suspect my
Phenom 9500 is better equipped in that area than the average ARM
controller.
And does the TB chaining actually work on AMD64 at all? I checked by
adding some debug output, and it seems to patch the jumps correctly, but
maybe somebody can verify that.
CU
Uli
--
SUSE LINUX Products GmbH, GF: Markus Rex, HRB 16746 (AG Nürnberg)