Re: [Qemu-devel] [PATCH V1 1/1] tests: Add migration test for aarch64
From: Ard Biesheuvel
Subject: Re: [Qemu-devel] [PATCH V1 1/1] tests: Add migration test for aarch64
Date: Wed, 31 Jan 2018 20:15:22 +0000
On 31 January 2018 at 19:12, Christoffer Dall
<address@hidden> wrote:
> On Wed, Jan 31, 2018 at 7:00 PM, Ard Biesheuvel
> <address@hidden> wrote:
>> On 31 January 2018 at 17:39, Christoffer Dall
>> <address@hidden> wrote:
>>> On Wed, Jan 31, 2018 at 5:59 PM, Ard Biesheuvel
>>> <address@hidden> wrote:
>>>> On 31 January 2018 at 16:53, Christoffer Dall
>>>> <address@hidden> wrote:
>>>>> On Wed, Jan 31, 2018 at 4:18 PM, Ard Biesheuvel
>>>>> <address@hidden> wrote:
>>>>>> On 31 January 2018 at 09:53, Christoffer Dall
>>>>>> <address@hidden> wrote:
>>>>>>> On Mon, Jan 29, 2018 at 10:32:12AM +0000, Marc Zyngier wrote:
>>>>>>>> On 29/01/18 10:04, Peter Maydell wrote:
>>>>>>>> > On 29 January 2018 at 09:53, Dr. David Alan Gilbert <address@hidden>
>>>>>>>> > wrote:
>>>>>>>> >> * Peter Maydell (address@hidden) wrote:
>>>>>>>> >>> On 26 January 2018 at 19:46, Dr. David Alan Gilbert
>>>>>>>> >>> <address@hidden> wrote:
>>>>>>>> >>>> * Peter Maydell (address@hidden) wrote:
>>>>>>>> >>>>> I think the correct fix here is that your test code should turn
>>>>>>>> >>>>> its MMU on. Trying to treat guest RAM as uncacheable doesn't work
>>>>>>>> >>>>> for Arm KVM guests (for the same reason that VGA device video
>>>>>>>> >>>>> memory
>>>>>>>> >>>>> doesn't work). If it's RAM your guest has to arrange to map it as
>>>>>>>> >>>>> Normal Cacheable, and then everything should work fine.
>>>>>>>> >>>>
>>>>>>>> >>>> Does this cause problems with migrating at just the wrong point
>>>>>>>> >>>> during
>>>>>>>> >>>> a VM boot?
>>>>>>>> >>>
>>>>>>>> >>> It wouldn't surprise me if it did, but I don't think I've ever
>>>>>>>> >>> tried to provoke that problem...
>>>>>>>> >>
>>>>>>>> >> If you think it'll get the RAM contents wrong, it might be best to
>>>>>>>> >> fail
>>>>>>>> >> the migration if you can detect the cache is disabled in the guest.
>>>>>>>> >
>>>>>>>> > I guess QEMU could look at the value of the "MMU disabled/enabled"
>>>>>>>> > bit
>>>>>>>> > in the guest's system registers, and refuse migration if it's off...
>>>>>>>> >
>>>>>>>> > (cc'd Marc, Christoffer to check that I don't have the wrong end
>>>>>>>> > of the stick about how thin the ice is in the period before the
>>>>>>>> > guest turns on its MMU...)
>>>>>>>>
>>>>>>>> Once MMU and caches are on, we should be in a reasonable place for QEMU
>>>>>>>> to have a consistent view of the memory. The trick is to prevent the
>>>>>>>> vcpus from changing that. A guest could perfectly well turn off its MMU
>>>>>>>> at any given time if it needs to (and it is actually required on some HW
>>>>>>>> if you want to mitigate headlining CVEs), and KVM won't know about that.
>>>>>>>>
>>>>>>>
>>>>>>> (Clarification: KVM can detect this if it bothers to check the VCPU's
>>>>>>> system registers, but we don't trap to KVM when the VCPU turns off its
>>>>>>> caches, right?)
>>>>>>>
>>>>>>>> You may have to pause the vcpus before starting the migration, or
>>>>>>>> introduce a new KVM feature that would automatically pause a vcpu that
>>>>>>>> is trying to disable its MMU while the migration is on. This would
>>>>>>>> involve trapping all the virtual memory related system registers, with
>>>>>>>> an obvious cost. But that cost would be limited to the time it takes to
>>>>>>>> migrate the memory, so maybe that's acceptable.
>>>>>>>>
>>>>>>> Is that even sufficient?
>>>>>>>
>>>>>>> What if the following happened: (1) the guest turns off its MMU, (2) the
>>>>>>> guest writes some data directly to RAM, (3) QEMU stops the VCPU, (4) QEMU
>>>>>>> reads guest RAM. QEMU's view of guest RAM is now incorrect (stale,
>>>>>>> incoherent, ...).
>>>>>>>
>>>>>>> I'm also not really sure that pausing one VCPU because it turned off its
>>>>>>> MMU will go very well when trying to migrate a large VM (wouldn't this
>>>>>>> cause all the other VCPUs to start complaining that the stopped VCPU
>>>>>>> appears to be dead?). As a short-term 'fix' it's probably better to
>>>>>>> refuse migration if you detect that a VCPU has begun turning off its
>>>>>>> MMU.
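
(A minimal sketch of what such an MMU-is-off check could look like from the
VMM side, using KVM's KVM_GET_ONE_REG interface. The register-ID encoding,
helper name and error handling below are illustrative assumptions, not code
from QEMU or from the patch under discussion.)

#include <linux/kvm.h>
#include <sys/ioctl.h>
#include <stdbool.h>
#include <stdint.h>

/* SCTLR_EL1 is encoded as op0=3, op1=0, CRn=1, CRm=0, op2=0 */
#define SCTLR_EL1_ID   ARM64_SYS_REG(3, 0, 1, 0, 0)
#define SCTLR_EL1_M    (1ULL << 0)   /* stage-1 MMU enable bit */

static bool vcpu_mmu_enabled(int vcpu_fd)
{
    uint64_t sctlr = 0;
    struct kvm_one_reg reg = {
        .id   = SCTLR_EL1_ID,
        .addr = (uint64_t)(uintptr_t)&sctlr,
    };

    /* Read the vcpu's SCTLR_EL1; treat a failure as "do not migrate". */
    if (ioctl(vcpu_fd, KVM_GET_ONE_REG, &reg) < 0) {
        return false;
    }
    /* Refuse (or delay) migration while the M bit is clear. */
    return sctlr & SCTLR_EL1_M;
}
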
>>>>>>>
>>>>>>> On the larger scale of things, this appears to me to be another case of
>>>>>>> us really needing some way to coherently access memory between QEMU and
>>>>>>> the VM, but in the case of the VCPU turning off the MMU prior to
>>>>>>> migration, we don't even know where it may have written data, and I'm
>>>>>>> therefore not really sure what the 'proper' solution would be.
>>>>>>>
>>>>>>> (cc'ing Ard who has thought about this problem before in the context
>>>>>>> of UEFI and VGA.)
>>>>>>>
>>>>>>
>>>>>> Actually, the VGA case is much simpler because the host is not
>>>>>> expected to write to the framebuffer, only read from it, and the guest
>>>>>> is not expected to create a cacheable mapping for it, so any
>>>>>> incoherency can be trivially solved by cache invalidation on the host
>>>>>> side. (Note that this has nothing to do with DMA coherency, but only
>>>>>> with PCI MMIO BARs that are backed by DRAM in the host)
>>>>>
>>>>> In case of the running guest, the host will also only read from the
>>>>> cached mapping. Of course, at restore, the host will also write
>>>>> through a cached mapping, but shouldn't the latter case be solvable by
>>>>> having KVM clean the cache lines when faulting in any page?
>>>>>
>>>>
>>>> We are still talking about the contents of the framebuffer, right? In
>>>> that case, yes, afaict
>>>>
>>>
>>> I was talking about normal RAM actually... not sure if that changes
>>> anything?
>>>
>>
>> The main difference is that with a framebuffer BAR, it is pointless
>> for the guest to map it cacheable, given that the purpose of a
>> framebuffer is its side effects, which are not guaranteed to occur
>> in a timely manner if the mapping is cacheable.
>>
>> If we are talking about normal RAM, then why are we discussing it here
>> and not down there?
>>
>
> Because I was trying to figure out how the challenge of accessing the
> VGA framebuffer differs from the challenge of accessing guest RAM
> which may have been written by the guest with the MMU off.
>
> To a first approximation, they are extremely similar because the guest is
> writing uncached to memory, which the host now has to access via a
> cached mapping.
>
> But I'm guessing that a "clean+invalidate before read on the host"
> solution will result in terrible performance for a framebuffer and
> therefore isn't a good solution for that problem...
>
That highly depends on where 'not working' resides on the performance
scale. Currently, VGA on KVM simply does not work at all, and so
working but slow would be a huge improvement over the current
situation.
Also, the performance hit is caused by the fact that the data needs to
make a round trip to memory, and the invalidation (without cleaning)
performed by the host shouldn't make that much worse than it
fundamentally is to begin with.
A paravirtualized framebuffer (as was proposed recently by Gerd I
think?) would solve this, since the guest can just map it as
cacheable.
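
(For a rough idea of what that invalidation pass looks like in practice, the
sketch below does a by-VA clean+invalidate over a host buffer on arm64. It
assumes EL0 cache maintenance is usable from userspace, as Linux normally
arranges via SCTLR_EL1.UCI/UCT, and the helper name is made up for the
example; it is not code from QEMU or KVM.)

#include <stdint.h>
#include <stddef.h>

static void clean_inval_dcache_range(void *start, size_t len)
{
    uint64_t ctr;

    /* CTR_EL0.DminLine (bits [19:16]) is log2 of the line size in words. */
    __asm__ volatile("mrs %0, ctr_el0" : "=r"(ctr));
    size_t line = 4UL << ((ctr >> 16) & 0xf);

    uintptr_t addr = (uintptr_t)start & ~(line - 1);
    uintptr_t end  = (uintptr_t)start + len;

    for (; addr < end; addr += line) {
        /* Clean+invalidate to PoC so we read what was written uncached. */
        __asm__ volatile("dc civac, %0" :: "r"(addr) : "memory");
    }
    __asm__ volatile("dsb sy" ::: "memory");
}

The loop issues one DC CIVAC per cache line, so the cost scales with the size
of the region being read rather than with what actually changed, which is
where the framebuffer performance worry comes from.
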
>>
>>
>>>>>>
>>>>>> In the migration case, it is much more complicated, and I think
>>>>>> capturing the state of the VM in a way that takes incoherency between
>>>>>> caches and main memory into account is simply infeasible (i.e., the
>>>>>> act of recording the state of guest RAM via a cached mapping may evict
>>>>>> clean cachelines that are out of sync, and so it is impossible to
>>>>>> record both the cached state *and* its delta with the uncached state)
>>>>>
>>>>> This may be an incredibly stupid question (and I may have asked it
>>>>> before), but why can't we clean+invalidate the guest page before
>>>>> reading it and thereby obtain a coherent view of a page?
>>>>>
>>>>
>>>> Because cleaning from the host will clobber whatever the guest wrote
>>>> directly to memory with the MMU off, if there is a dirty cacheline
>>>> shadowing that memory.
>>>
>>> If the host never wrote anything to that memory (it shouldn't mess
>>> with the guest's memory), there will only be clean cache lines (even if
>>> they contain content shadowing the memory) and cleaning them would be
>>> equivalent to an invalidate. Am I misremembering how this works?
>>>
>>
>> Cleaning doesn't actually invalidate, but it should be a no-op for
>> clean cachelines.
>>
>>>> However, that same cacheline could be dirty
>>>> because the guest itself wrote to memory with the MMU on.
>>>
>>> Yes, but the guest would have no control over when such a cache line
>>> gets flushed to main memory by the hardware, and can have no
>>> reasonable expectation that the cache lines don't get cleaned behind
>>> its back. The fact that a migration triggers this is reasonable. A
>>> guest that wants a hand-off from main memory that it's accessing with
>>> the MMU off must invalidate the appropriate cache lines or ensure
>>> they're clean. There's very likely some subtle aspect to all of this that I'm
>>> forgetting.
>>>
>>
>> OK, so if the only way cachelines covering guest memory could be dirty
>> is after the guest wrote to that memory itself via a cacheable
>> mapping, I guess it would be reasonable to do clean+invalidate before
>> reading the memory. Then, the only way for the guest to lose anything
>> is in cases where it could not reasonably expect it to be retained
>> anyway.
>
> Right, that's what I'm thinking.
>
>>
>> However, that does leave a window, between the invalidate and the
>> read, where the guest could modify memory without it being visible to
>> the host.
>
> Is that a problem specific to the coherency challenge? I thought this
> problem was already addressed by dirty page tracking, but there's likely
> some interaction with the cache maintenance that we'd have to figure
> out.
>
I don't know how dirty page tracking works exactly, but if it can
track direct writes to memory as easily as cached writes, it would
probably cover this as well.
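
(For reference, a minimal sketch of how a VMM pulls the dirty bitmap for one
memory slot with KVM_GET_DIRTY_LOG. The slot numbering, page size and helper
name are assumptions for illustration, and the slot is assumed to have been
registered with KVM_MEM_LOG_DIRTY_PAGES.)

#include <linux/kvm.h>
#include <sys/ioctl.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Fetch the dirty-page bitmap for one memory slot.  KVM sets one bit per
 * guest page written since the previous call on this slot.  Dirty tracking
 * is driven by stage-2 write faults, so writes made with the guest's own
 * MMU off should be caught as well -- coherency of the data itself is the
 * separate problem discussed above.
 */
static unsigned long *get_dirty_bitmap(int vm_fd, uint32_t slot,
                                       size_t slot_bytes)
{
    size_t pages = slot_bytes / 4096;          /* assuming 4K guest pages */
    /* The kernel copies whole longs, so round the bitmap up to 64 bits. */
    size_t bitmap_bytes = ((pages + 63) / 64) * sizeof(uint64_t);
    unsigned long *bitmap = calloc(1, bitmap_bytes);
    if (!bitmap) {
        return NULL;
    }

    struct kvm_dirty_log log = {
        .slot = slot,
        .dirty_bitmap = bitmap,
    };
    if (ioctl(vm_fd, KVM_GET_DIRTY_LOG, &log) < 0) {
        free(bitmap);
        return NULL;
    }
    return bitmap;  /* caller rescans and re-sends the pages with set bits */
}
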