Re: [Qemu-devel] [PATCH 0/2] exec: alternative fix for master abort woes
From: Paolo Bonzini
Subject: Re: [Qemu-devel] [PATCH 0/2] exec: alternative fix for master abort woes
Date: Thu, 07 Nov 2013 20:12:27 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/20130923 Thunderbird/17.0.9
On 07/11/2013 19:54, Michael S. Tsirkin wrote:
> On Thu, Nov 07, 2013 at 06:29:40PM +0100, Paolo Bonzini wrote:
>> On 07/11/2013 17:47, Michael S. Tsirkin wrote:
>>> That's on kvm with a 52-bit address space.
>>> But where I would be concerned is systems with e.g. a 36-bit address
>>> space, where we are doubling the cost of the lookup.
>>> E.g. try i386 and not x86_64.
>>
>> Tried now...
>>
>>   P_L2_LEVELS   pre-patch   post-patch
>>   i386              3           6
>>   x86_64            4           6
>>
>> I timed the inl_from_qemu test of vmexit.flat with both KVM and TCG. With
>> TCG there's indeed a visible penalty of 20 cycles for i386 and 10 for x86_64
>> (you can extrapolate to 30 cycles for TARGET_PHYS_ADDR_SPACE_BITS=32
>> targets).
>> These can be more or less entirely ascribed to phys_page_find:
>>
>>                                       TCG          |        KVM
>>                               pre-patch post-patch | pre-patch post-patch
>>   phys_page_find (i386)          13%       25%     |   0.6%      1%
>>   inl_from_qemu cycles (i386)    153       173     |  ~12000   ~12000
>
> I'm a bit confused by the numbers above. The % of phys_page_find has
> grown from 13% to 25% (almost double, which is kind of expected
> given we have twice the # of levels).
Yes.
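To make the scaling concrete: the walk does one dependent pointer load
per level, so doubling P_L2_LEVELS roughly doubles the work per lookup.
A minimal sketch of that shape (not QEMU's actual phys_page_find; the
node layout and bits-per-level here are assumptions for illustration):

#include <stdint.h>
#include <stddef.h>

#define L2_BITS 10                    /* assumed bits resolved per level */
#define L2_SIZE (1 << L2_BITS)

typedef struct Node {
    struct Node *child[L2_SIZE];      /* one slot per index at this level */
} Node;

/* Walk 'levels' levels of the radix tree; each iteration is a dependent
 * load, so the cost of a lookup is linear in the level count. */
static void *page_find(Node *root, uint64_t index, int levels)
{
    Node *n = root;
    for (int i = levels - 1; i >= 0; i--) {
        if (!n) {
            return NULL;              /* unassigned region */
        }
        n = n->child[(index >> (i * L2_BITS)) & (L2_SIZE - 1)];
    }
    return n;                         /* leaf node stands in for a section */
}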
> But overhead in # of cycles only went from 153 to
> 173?
new cycles / old cycles = 173 / 153 = ~113%
% outside phys_page_find + 2 * % in phys_page_find = 87% + 2 * 13% = 113%
> Maybe the test is a bit wrong for tcg - how about unrolling the
> loop in kvm unit test?
Done that already. :)
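For reference, the unrolled measurement kernel looks roughly like the
sketch below. It is modeled on the inl_from_qemu test in
kvm-unit-tests' vmexit.flat; the port number, helper name and unroll
factor are assumptions, not taken from the actual test source.

#include <stdint.h>

/* x86 "inl" from an I/O port that QEMU emulates: under KVM each call
 * forces a vmexit out to QEMU, under TCG it takes the slow I/O path,
 * so both end up in the memory-dispatch code being measured. */
static inline uint32_t inl(uint16_t port)
{
    uint32_t val;
    asm volatile("inl %w1, %0" : "=a"(val) : "Nd"(port));
    return val;
}

#define TEST_PORT 0xb100              /* assumed: a QEMU-backed port */

/* Unrolled by hand so loop overhead does not dilute the per-access
 * cycle count being measured. */
static void inl_from_qemu_unrolled(void)
{
    inl(TEST_PORT);
    inl(TEST_PORT);
    inl(TEST_PORT);
    inl(TEST_PORT);
    inl(TEST_PORT);
    inl(TEST_PORT);
    inl(TEST_PORT);
    inl(TEST_PORT);
}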
>> Also, compiling with "-fstack-protector" instead of "-fstack-protector-all",
>> as suggested a while ago by rth, already saves 20 cycles.
>
> Is it true that with TCG this affects more than just MMIO
> as phys_page_find will also sometimes run on CPU accesses to memory?
Yes. I tried benchmarking the boot of a RHEL guest with perf; there
phys_page_find comes out at:

              TCG           |         KVM
   pre-patch  post-patch    |  pre-patch  post-patch
      3%         5.8%       |    0.9%       1.7%

This is actually higher than usual for KVM because there are many VGA
accesses during GRUB.
>> And of course, if this were a realistic test, KVM's 60x penalty would
>> be a severe problem---but it isn't, because this is not a realistic setting.
>
> Well, for this argument to carry the day we'd need to design
> a realistic test which isn't easy :)
Yes, I guess the number that matters is the extra 2% penalty for TCG
(the part that doesn't come from MMIO).
Paolo