Re: [Qemu-devel] [PATCH 0/2] exec: alternative fix for master abort woes


From: Paolo Bonzini
Subject: Re: [Qemu-devel] [PATCH 0/2] exec: alternative fix for master abort woes
Date: Thu, 07 Nov 2013 18:29:40 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/20130923 Thunderbird/17.0.9

On 07/11/2013 17:47, Michael S. Tsirkin wrote:
> That's on kvm with 52 bit address.
> But where I would be concerned is systems with e.g. 36 bit address
> space where we are doubling the cost of the lookup.
> E.g. try i386 and not x86_64.

Tried now...

   P_L2_LEVELS    pre-patch    post-patch
   i386           3            6
   x86_64         4            6
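
For reference, the level counts above can be reproduced with a short sketch.
It assumes the number of levels is derived from the address width as
(bits - page_bits - 1) / l2_bits + 1, with 4 KiB pages (12 bits) and 10 index
bits per level, and that the pre-patch widths are 36 bits for i386 and 52 bits
for x86_64 while the post-patch width is 64 bits. The names and constants
below are illustrative assumptions, not quoted from the patch.

```c
#include <assert.h>

/* Assumed constants: 4 KiB target pages, 10 index bits per tree level. */
enum { SKETCH_PAGE_BITS = 12, SKETCH_L2_BITS = 10 };

/* Number of radix-tree levels needed to cover addr_bits of address space
 * (hypothetical helper, mirroring the assumed formula). */
int sketch_p_l2_levels(int addr_bits)
{
    return (addr_bits - SKETCH_PAGE_BITS - 1) / SKETCH_L2_BITS + 1;
}

/* Checks the values in the table above under the stated assumptions. */
void sketch_p_l2_levels_selfcheck(void)
{
    assert(sketch_p_l2_levels(36) == 3);  /* i386, pre-patch   */
    assert(sketch_p_l2_levels(52) == 4);  /* x86_64, pre-patch */
    assert(sketch_p_l2_levels(64) == 6);  /* both, post-patch  */
}
```

Under the same assumptions a 32-bit physical address space needs 2 levels, so
such a target would go from 2 to 6.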

I timed the inl_from_qemu test of vmexit.flat with both KVM and TCG.  With
TCG there's indeed a visible penalty of 20 cycles for i386 and 10 for x86_64
(you can extrapolate to 30 cycles for TARGET_PHYS_ADDR_SPACE_BITS=32 targets).
These can be more or less entirely ascribed to phys_page_find:

                                 TCG             |      KVM
                           pre-patch  post-patch |  pre-patch   post-patch
phys_page_find(i386)          13%         25%    |     0.6%         1%
inl_from_qemu cycles(i386)    153         173    |   ~12000      ~12000
phys_page_find(x86_64)        18%         25%    |     0.8%         1%
inl_from_qemu cycles(x86_64)  163         173    |   ~12000      ~12000

Thus this patch costs 0.4% in the worst case for KVM, 12% in the worst case
for TCG.  The cycle breakdown is:

    60 phys_page_find
    28 access_with_adjusted_size
    24 address_space_translate_internal
    20 address_space_rw
    13 io_mem_read
    11 address_space_translate
     9 memory_region_read_accessor
     6 memory_region_access_valid
     4 helper_inl
     4 memory_access_size
     3 cpu_inl

(This run reported 177 cycles per access; the total is 182 due to rounding).
It is probably possible to shave at least 10 cycles from the functions listed
below phys_page_find, or to make the depth of the tree dynamic so that you
would save even more compared to 1.6.0.
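
To illustrate why the lookup cost follows the number of levels, and what a
dynamic depth would buy, here is a minimal sketch of a radix-tree walk whose
depth is a runtime parameter. This is not the phys_page_find code; the node
layout, the names and the 10-bit fan-out are assumptions made for the example.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed fan-out: 10 index bits (1024 children) per level. */
enum { SK_L2_BITS = 10, SK_L2_SIZE = 1 << SK_L2_BITS };

/* Hypothetical node: interior nodes use child[], the final node uses value. */
typedef struct SketchNode {
    struct SketchNode *child[SK_L2_SIZE];
    int value;
} SketchNode;

/* Follow `levels` child pointers from the root, most significant index
 * first.  Each level is one dependent load, so the cost grows linearly
 * with `levels`.  Returns -1 if the page is not mapped. */
int sketch_find(const SketchNode *root, uint64_t page_index, int levels)
{
    const SketchNode *n = root;

    for (int i = levels - 1; i >= 0 && n != NULL; i--) {
        n = n->child[(page_index >> (i * SK_L2_BITS)) & (SK_L2_SIZE - 1)];
    }
    return n ? n->value : -1;
}
```

With a fixed depth every lookup pays for all levels, even when the upper
index bits are always zero. Passing a depth chosen from the highest address
actually mapped lets a small address space use a shallower walk.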

Also, compiling with "-fstack-protector" instead of "-fstack-protector-all",
as suggested a while ago by rth, already saves 20 cycles.

And of course, if this were a realistic test, KVM's 60x penalty would
be a severe problem. But it isn't, because this is not a realistic setting.

Paolo