qemu-devel

Re: [Qemu-devel] [PATCH v2 0/7] TCG global variables clean-up


From: Evgeny Voevodin
Subject: Re: [Qemu-devel] [PATCH v2 0/7] TCG global variables clean-up
Date: Mon, 29 Oct 2012 10:27:23 +0400
User-agent: Mozilla/5.0 (X11; Linux i686; rv:16.0) Gecko/20121011 Thunderbird/16.0.1

On 10/27/2012 06:34 PM, Blue Swirl wrote:
On Fri, Oct 26, 2012 at 6:32 AM, Evgeny Voevodin <address@hidden> wrote:
Today I did more precise testing using --enable-profiler.

Here is the test procedure:
1. Boot Linux Kernel 5 times.
2. For each iteration, wait until "JIT cycles" has been stable for ~10 seconds.
3. Write down the "cycles/op".

Here are the results:

Before clean-up:
min: 731.9
max: 735.8
avg: 734.3
standard deviation: ~2 = 0.3%
Average cycles/op = 734 +- 2

After clean-up:
min: 747.2
max: 751.7
avg: 750.5
standard deviation: ~2 = 0.3%
Average cycles/op = 750 +- 2
Slow-down of TCG code generation = 2.2%


After clean-up with TCGContext *const tcg_cur_ctx:
min: 730.6
max: 733.2
avg: 728.7
standard deviation: ~2 = 0.3%
Average cycles/op = 729 +- 2
Slow-down of TCG code generation = 0%

I suggest defining tcg_cur_ctx as TCGContext *const.
That gets rid of the TCG code generation slow-down and we still
avoid using the global variables.
How does this compare with the original version without pointers? I
think that version may be safer to be assumed to be optimized by the
compiler.

I did more testing with different gcc versions and different patch series:

gcc version   v1 clean-up,   v2 clean-up,    master
              no pointer     const pointer
gcc-4.4       754.3          752.1           769.8
gcc-4.5       770.8          779.8           774.8
gcc-4.6       731.8          729.8           737

Conclusion:
- The first clean-up series (no pointer) is faster than master in all cases, probably because
   data is cached more efficiently.
- The second clean-up series (const pointer) is faster than master with gcc-4.4 and gcc-4.6. With gcc-4.5 it seems that the const pointer is not optimised the way I assumed.

I think it's worth generating a third series without the pointer but with the code clean-up from the second series.

What do you think?


On 10/25/2012 10:45 AM, Evgeny Voevodin wrote:
Here are the results of tests before and after this patch series was
applied:

* EEMBC CoreMark (before -> after)
    - Guest: Exynos4210 ARMv7, Linux (Custom buildroot image)
    - Host: Intel(R) Core(TM) i5 CPU 750 @ 2.67GHz, 4GB RAM, Linux
    - Results: 1148.105626 -> 1161.440186 (+1.16%)

* nbench (before -> after)
    - Guest: Exynos4210 ARMv7, Linux (Custom buildroot image)
    - Host: Intel(R) Core(TM) i5 CPU 750 @ 2.67GHz, 4GB RAM, Linux
    - Results
      . MEMORY INDEX: 1.864 -> 1.862 (-0.11%)
      . INTEGER INDEX: 2.518 -> 2.523 (+0.2%)
      . FLOATING-POINT INDEX: 0.385 -> 0.394 (+2.34%)


Those tests show that it became even faster :))

But I'm quite sceptical about such results.
The thing is that nbench prints a warning if its results are
not 95% statistically accurate.
So we can be sure that the nbench results are 95% accurate.
And the differences shown above are clearly within that
accuracy.
I don't know the accuracy of CoreMark.

So the main conclusion we can draw is that this patch series didn't
introduce any slow-down larger than the inaccuracy of the measurement.

Is this enough?

On 10/23/2012 10:21 AM, Evgeny Voevodin wrote:
This set of patches moves global variables to tcg_ctx:
gen_opc_ptr
gen_opparam_ptr
gen_opc_buf
gen_opparam_buf

Build tested for all targets.
Execution tested on ARM.

I didn't notice any slow-down of kernel boot after this set was applied.

Changelog:
v1->v2:
Introduced TCGContext *tcg_cur_ctx global for use in those places where
we don't have an interface to pass a pointer to tcg_ctx.
Code style clean-up

Evgeny (2):
    tcg/tcg.h: Duplicate global TCG variables in TCGContext
    TCG: Remove unused global variables

Evgeny Voevodin (5):
    translate-all.c: Introduce TCGContext *tcg_cur_ctx
    TCG: Use gen_opc_ptr from context instead of global variable.
    TCG: Use gen_opparam_ptr from context instead of global variable.
    TCG: Use gen_opc_buf from context instead of global variable.
    TCG: Use gen_opparam_buf from context instead of global variable.

   gen-icount.h                  |    2 +-
   target-alpha/translate.c      |   10 +-
   target-arm/translate.c        |   10 +-
   target-cris/translate.c       |   13 +-
   target-i386/translate.c       |   10 +-
   target-lm32/translate.c       |   13 +-
   target-m68k/translate.c       |   10 +-
   target-microblaze/translate.c |   13 +-
   target-mips/translate.c       |   11 +-
   target-openrisc/translate.c   |   13 +-
   target-ppc/translate.c        |   11 +-
   target-s390x/translate.c      |   11 +-
   target-sh4/translate.c        |   10 +-
   target-sparc/translate.c      |   10 +-
   target-unicore32/translate.c  |   10 +-
   target-xtensa/translate.c     |    8 +-
   tcg/optimize.c                |   62 ++++----
   tcg/tcg-op.h                  |  324 ++++++++++++++++++++---------------------
   tcg/tcg.c                     |   85 ++++++-----
   tcg/tcg.h                     |   11 +-
   translate-all.c               |    4 +-
   21 files changed, 328 insertions(+), 323 deletions(-)



--
Kind regards,
Evgeny Voevodin,
Technical Leader,
Mobile Group,
Samsung Moscow Research Center,
e-mail: address@hidden





