From: Jan Kiszka
Subject: Re: [Qemu-devel] [PATCH] memory: synchronize dirty bitmap before unmapping a range
Date: Mon, 01 Aug 2011 12:21:31 +0200
User-agent: Mozilla/5.0 (X11; U; Linux i686 (x86_64); de; rv:1.8.1.12) Gecko/20080226 SUSE/2.0.0.12-1.1 Thunderbird/2.0.0.12 Mnenhy/0.7.5.666

On 2011-08-01 11:45, Avi Kivity wrote:
> On 08/01/2011 12:05 PM, Jan Kiszka wrote:
>> On 2011-08-01 10:16, Avi Kivity wrote:
>>> On 08/01/2011 10:52 AM, Jan Kiszka wrote:
>>>> On 2011-08-01 09:34, Jan Kiszka wrote:
>>>> > On 2011-07-31 21:47, Avi Kivity wrote:
>>>> >> When a range is being unmapped, ask accelerators (e.g. kvm) to
>>>> >> synchronize the dirty bitmap to avoid losing information forever.
>>>> >>
>>>> >> Fixes grub2 screen update.
>>>> >
>>>> > It does.
>>>> >
>>>> > But something is still broken. As I reported before, the performance of
>>>> > grub2 startup is an order of magnitude slower than with the existing
>>>> > code. According to ftrace, we are getting tons of additional
>>>> > EPT_MISCONFIG exits over the 0xA0000 segment. But I haven't spotted the
>>>> > difference yet. The effective slot setup as communicated to kvm looks
>>>> > innocent.
>>>>
>>>> I take it back: We obviously once in a while resume the guest with the
>>>> vga segment unmapped. And that, of course, ends up doing mmio instead of
>>>> plain ram accesses.
>>>>
>>>
>>> qemu-kvm.git 6b5956c573 and its predecessor fix the issue (and I think
>>> they're even faster than upstream, but perhaps I'm not objective).
>>>
>>
>> Just updated to the latest memory-region branch - how did you test it?
>> It does not link here due to forgotten rwhandler in Makefile.target.
>>
>> Anyway, that commit has no impact on the issue I'm seeing. I'm also
>> carrying transaction changes for cirrus here, but they have no
>> noticeable impact. That indicates that the new API is not actually slow,
>> it likely just has some bug.
>
> Here's the log of range changes while in grub2:
>
> adding a0000-affff offset 40000 ram 40040000
> dropping a0000-affff
> adding a0000-affff offset 30000 ram 40040000
> dropping a0000-affff
> adding a0000-affff offset 40000 ram 40040000
> dropping a0000-affff
> adding a0000-affff offset 30000 ram 40040000
> dropping a0000-affff
> adding a0000-affff offset 40000 ram 40040000
> dropping a0000-affff
> adding a0000-affff offset 30000 ram 40040000
> dropping a0000-affff
> adding a0000-affff offset 40000 ram 40040000
> dropping a0000-affff
> adding a0000-affff offset 30000 ram 40040000
> dropping a0000-affff
> adding a0000-affff offset 40000 ram 40040000
> dropping a0000-affff
> adding a0000-affff offset 30000 ram 40040000
> dropping a0000-affff
> adding a0000-affff offset 40000 ram 40040000
> dropping a0000-affff
> adding a0000-affff offset 30000 ram 40040000
> dropping a0000-affff
> adding a0000-affff offset 40000 ram 40040000
> dropping a0000-affff
> adding a0000-affff offset 30000 ram 40040000
> dropping a0000-affff
> adding a0000-affff offset 40000 ram 40040000
> dropping a0000-affff
> adding a0000-affff offset 30000 ram 40040000
> dropping a0000-affff
> adding a0000-affff offset 40000 ram 40040000
> dropping a0000-affff
> adding a0000-affff offset 30000 ram 40040000
> dropping a0000-affff
> adding a0000-affff offset 40000 ram 40040000
> dropping a0000-affff
> adding a0000-affff offset 30000 ram 40040000
> dropping a0000-affff
> adding a0000-affff offset 40000 ram 40040000
> dropping a0000-affff
> adding a0000-affff offset 20000 ram 40040000
> dropping a0000-affff
> adding a0000-affff offset 30000 ram 40040000
I saw this as well and thought it should be fine. But it does not tell
you what is currently active when the guest runs.
>
> Note that drop/add is always paired (i.e. the guest never sees an
> unmapped area), and we always map the full 64k even though cirrus code
> manages each 32k bank individually. It looks optimal... we're probably
> not testing the same thing (either qemu or guest code).
This is what my instrumentation revealed:
map_linear_vram_bank 0
map 0 (actually perform the mapping)
map_linear_vram_bank 1
map 1
4 a0000 0 7fe863a62000 1 (KVM_SET_USER_MEMORY_REGION)
4 a0000 10000 7fe863a72000 1
run (enter guest)
map_linear_vram_bank 0
map 0
map_linear_vram_bank 1
map 1
4 a0000 0 7fe863a72000 1
4 a0000 10000 7fe863a62000 1
run
map_linear_vram_bank 0
map 0
map_linear_vram_bank 1
map 1
4 a0000 0 7fe863a62000 1
run
map_linear_vram_bank 0
map 0
map_linear_vram_bank 1
map 1
run
So we suddenly get out of sync and enter the guest with an unmapped vram
segment. It takes a long time (in number of map changes) until the region
becomes mapped again.
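
The trace can also be replayed mechanically to see what is active at each
guest entry. This is only a minimal Python sketch under stated assumptions:
each five-field line is read as slot, guest physical address, size,
userspace address and flags, and a size of 0 is taken to delete the slot
(as KVM_SET_USER_MEMORY_REGION does). That field reading is an assumption,
and `mapped_at_each_run` is a hypothetical helper, not anything in qemu.

```python
# Trace copied from the instrumentation output, with the explanatory
# annotations in parentheses removed.
TRACE = """\
map_linear_vram_bank 0
map 0
map_linear_vram_bank 1
map 1
4 a0000 0 7fe863a62000 1
4 a0000 10000 7fe863a72000 1
run
map_linear_vram_bank 0
map 0
map_linear_vram_bank 1
map 1
4 a0000 0 7fe863a72000 1
4 a0000 10000 7fe863a62000 1
run
map_linear_vram_bank 0
map 0
map_linear_vram_bank 1
map 1
4 a0000 0 7fe863a62000 1
run
map_linear_vram_bank 0
map 0
map_linear_vram_bank 1
map 1
run
"""


def mapped_at_each_run(trace, slot=4):
    """Return, for every 'run' line, whether the slot had a non-zero size."""
    size = 0  # assumption: the slot starts out unmapped
    states = []
    for line in trace.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "run":
            # Guest entry: record whether the slot is currently mapped.
            states.append(size > 0)
        elif len(fields) == 5 and fields[0] == str(slot):
            # Assumed layout: slot, guest phys addr, size, user addr, flags.
            size = int(fields[2], 16)
    return states


print(mapped_at_each_run(TRACE))  # [True, True, False, False]
```

Under that reading, the third and fourth guest entries happen with slot 4
deleted and never re-added, which matches the out-of-sync behaviour
described above.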
Jan
--
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux