Re: [PATCH V1 00/32] Live Update


From: Steven Sistare
Subject: Re: [PATCH V1 00/32] Live Update
Date: Fri, 31 Jul 2020 13:20:23 -0400
User-agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:68.0) Gecko/20100101 Thunderbird/68.11.0

On 7/31/2020 11:52 AM, Daniel P. Berrangé wrote:
> On Fri, Jul 31, 2020 at 11:27:45AM -0400, Steven Sistare wrote:
>> On 7/31/2020 4:53 AM, Daniel P. Berrangé wrote:
>>> On Thu, Jul 30, 2020 at 02:48:44PM -0400, Steven Sistare wrote:
>>>> On 7/30/2020 12:52 PM, Daniel P. Berrangé wrote:
>>>>> On Thu, Jul 30, 2020 at 08:14:04AM -0700, Steve Sistare wrote:
>>>>>> Improve and extend the qemu functions that save and restore VM state so a
>>>>>> guest may be suspended and resumed with minimal pause time.  qemu may be
>>>>>> updated to a new version in between.
>>>>>>
>>>>>> The first set of patches adds the cprsave and cprload commands to save
>>>>>> and restore VM state, and allows the host kernel to be updated and
>>>>>> rebooted in between.  The VM must create guest RAM in a persistent
>>>>>> shared memory file, such as /dev/dax0.0 or persistent /dev/shm PKRAM as
>>>>>> proposed in
>>>>>> https://lore.kernel.org/lkml/1588812129-8596-1-git-send-email-anthony.yznaga@oracle.com/
>>>>>>
>>>>>> cprsave stops the VCPUs and saves VM device state in a simple file, and
>>>>>> thus supports any type of guest image and block device.  The caller must
>>>>>> not modify the VM's block devices between cprsave and cprload.
>>>>>>
>>>>>> cprsave and cprload support guests with vfio devices if the caller first
>>>>>> suspends the guest by issuing guest-suspend-ram to the qemu guest agent.
>>>>>> The guest drivers' suspend methods flush outstanding requests and
>>>>>> re-initialize the devices, and thus there is no device state to save
>>>>>> and restore.
>>>>>>
>>>>>>    1 savevm: add vmstate handler iterators
>>>>>>    2 savevm: VM handlers mode mask
>>>>>>    3 savevm: QMP command for cprsave
>>>>>>    4 savevm: HMP Command for cprsave
>>>>>>    5 savevm: QMP command for cprload
>>>>>>    6 savevm: HMP Command for cprload
>>>>>>    7 savevm: QMP command for cprinfo
>>>>>>    8 savevm: HMP command for cprinfo
>>>>>>    9 savevm: prevent cprsave if memory is volatile
>>>>>>   10 kvmclock: restore paused KVM clock
>>>>>>   11 cpu: disable ticks when suspended
>>>>>>   12 vl: pause option
>>>>>>   13 gdbstub: gdb support for suspended state
>>>>>>
>>>>>> The next patches add a restart method that eliminates the persistent
>>>>>> memory constraint, and allows qemu to be updated across the restart,
>>>>>> but does not allow host reboot.  Anonymous memory segments used by the
>>>>>> guest are preserved across a re-exec of qemu, mapped at the same VA, via
>>>>>> a proposed madvise(MADV_DOEXEC) option in the Linux kernel.  See
>>>>>> https://lore.kernel.org/lkml/1595869887-23307-1-git-send-email-anthony.yznaga@oracle.com/
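(Aside: a minimal sketch of how the proposed flag might be used to mark guest
RAM for preservation across the re-exec.  MADV_DOEXEC is not in upstream
kernel headers, so the numeric value below is only a placeholder; the real
definition comes from the kernel series imported by patch 18.)

    #include <stdio.h>
    #include <sys/mman.h>
    #include <unistd.h>

    #ifndef MADV_DOEXEC
    #define MADV_DOEXEC 22              /* placeholder value, not upstream */
    #endif

    int main(int argc, char **argv)
    {
        size_t len = 64 * 1024 * 1024;
        void *ram = mmap(NULL, len, PROT_READ | PROT_WRITE,
                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

        if (ram == MAP_FAILED) {
            perror("mmap");
            return 1;
        }

        /* Ask the patched kernel to keep this anonymous mapping, at the
         * same VA, across a subsequent exec of the updated binary. */
        if (madvise(ram, len, MADV_DOEXEC) < 0) {
            perror("madvise(MADV_DOEXEC)");
            return 1;
        }

        /* Where qemu would exec the updated binary (argv[1] here). */
        if (argc > 1) {
            execv(argv[1], &argv[1]);
            perror("execv");
            return 1;
        }
        return 0;
    }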
>>>>>>
>>>>>>   14 savevm: VMS_RESTART and cprsave restart
>>>>>>   15 vl: QEMU_START_FREEZE env var
>>>>>>   16 oslib: add qemu_clr_cloexec
>>>>>>   17 util: env var helpers
>>>>>>   18 osdep: import MADV_DOEXEC
>>>>>>   19 memory: ram_block_add cosmetic changes
>>>>>>   20 vl: add helper to request re-exec
>>>>>>   21 exec, memory: exec(3) to restart
>>>>>>   22 char: qio_channel_socket_accept reuse fd
>>>>>>   23 char: save/restore chardev socket fds
>>>>>>   24 ui: save/restore vnc socket fds
>>>>>>   25 char: save/restore chardev pty fds
>>>>>
>>>>> Keeping FDs open across re-exec is a nice trick, but how are you dealing
>>>>> with the state associated with them, most especially the TLS encryption
>>>>> state ? AFAIK, there's no way to serialize/deserialize the TLS state that
>>>>> GNUTLS maintains, and the patches don't show any sign of dealing with
>>>>> this. IOW it looks like while the FD will be preserved, any TLS session
>>>>> running on it will fail.
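(Aside: the mechanism behind "keeping FDs open across re-exec" is simply
clearing the close-on-exec flag, presumably what the qemu_clr_cloexec helper
added in patch 16 does.  The signature below is an assumption based on the
patch title, not the actual code.)

    #include <fcntl.h>

    /* Clear FD_CLOEXEC so the descriptor survives the exec(3) that
     * restarts qemu; the fd number is then handed to the new process. */
    int qemu_clr_cloexec(int fd)
    {
        int flags = fcntl(fd, F_GETFD);

        if (flags < 0) {
            return -1;
        }
        return fcntl(fd, F_SETFD, flags & ~FD_CLOEXEC);
    }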
>>>>
>>>> I had not considered TLS.  If a non-qemu library maintains connection
>>>> state, then we won't be able to support it for live update until the
>>>> library provides interfaces to serialize the state.
>>>>
>>>> For qemu objects, so far vmstate has been adequate to represent the
>>>> devices with descriptors that we preserve.
>>>
>>> My main concern about this series is that there is an implicit assumption
>>> that QEMU is *not* configured with certain features that are not handled.
>>> If QEMU is using one of the unsupported features, I don't see anything in
>>> the series which attempts to prevent those actions.
>>>
>>> IOW, users can have an arbitrary QEMU config, attempt to use these new
>>> features, and the commands may well succeed, but the user is silently left
>>> with a broken QEMU.  Such silent failure modes are really undesirable, as
>>> they'll lead to a never-ending stream of hard-to-diagnose bug reports for
>>> QEMU maintainers.
>>>
>>> TLS is one example of this: the live upgrade will "succeed", but the TLS
>>> connections will be totally non-functional.
>>
>> I agree with all your points and would like to do better in this area.
>> Other than hunting for every use of a descriptor and either supporting it
>> or blocking cpr, do you have any suggestions?  Thinking out loud, maybe we
>> can gather all the fds that we support, then look for all fds in the
>> process, and block the cpr if we find an unrecognized fd.
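(Aside: a rough sketch of that "scan for unrecognized fds" idea, walking
/proc/self/fd and refusing cpr if any open descriptor is not in the set qemu
knows how to preserve.  fd_is_preserved() is purely illustrative; it stands in
for whatever registry of preservable descriptors qemu would maintain.)

    #include <dirent.h>
    #include <stdbool.h>
    #include <stdlib.h>

    /* Illustrative registry lookup: chardev, vnc, vfio, ... descriptors
     * that the cpr code explicitly knows how to save and restore. */
    bool fd_is_preserved(int fd);

    /* Return true if every open descriptor is recognized, i.e. cprsave
     * may proceed without silently losing backend state. */
    bool all_fds_recognized(void)
    {
        DIR *dir = opendir("/proc/self/fd");
        struct dirent *ent;
        bool ok = true;

        if (!dir) {
            return false;
        }
        while ((ent = readdir(dir)) != NULL) {
            if (ent->d_name[0] == '.') {
                continue;                   /* "." and ".." */
            }
            int fd = atoi(ent->d_name);
            if (fd == dirfd(dir)) {
                continue;                   /* the scan's own fd */
            }
            if (!fd_is_preserved(fd)) {
                ok = false;
                break;
            }
        }
        closedir(dir);
        return ok;
    }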
> 
> There's no magic easy answer to this problem. Conceptually it is similar to
> the problem of reliably migrating guest device state, but in this case we're
> primarily concerned about the backends instead.
> 
> For migration we've got standardized interfaces that devices must implement
> in order to correctly support migration serialization. There is also support
> for devices to register migration "blockers" which prevent any use of the
> migration feature when the device is present.
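(Aside: the existing blocker pattern being referred to, roughly as devices use
it today.  migrate_add_blocker() and error_setg() are QEMU's existing APIs;
the surrounding device names are just a sketch.)

    #include "qapi/error.h"
    #include "migration/blocker.h"

    static Error *mydev_migration_blocker;

    /* Called while realizing a (hypothetical) device that cannot migrate.
     * Once registered, any later migrate command fails with this message
     * instead of silently producing a broken guest. */
    static int mydev_block_migration(Error **errp)
    {
        error_setg(&mydev_migration_blocker,
                   "mydev: device state cannot be migrated");
        return migrate_add_blocker(mydev_migration_blocker, errp);
    }

Something analogous, but defaulting to "blocked", is what the backends would
need for the re-exec case.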
> 
> We lack this kind of concept for the backend, and that's what I think needs
> to be tackled in a more thorough way.  There are quite a lot of backends,
> but they're grouped into a reasonably small number of sets (UIs, chardevs,
> blockdevs, net devs, etc).  We need some standard interface that we can
> plumb into all the backends, along with providing backends the ability to
> block the re-exec.  We should plumb the generic infrastructure into each of
> the different types of backend and make the default behaviour be to reject
> the re-exec.  Then we need to carefully consider specific backend impls
> and allow the re-exec only in the very precise cases we can demonstrate
> to be safe.
> 
> IOW, have a presumption that re-exec will *not* be permitted. Over time
> we can make it work for an ever expanding set of use cases. 

Actually, we could use the vmstate mode_mask field added in patch 2, and only
allow the restart mode for vmstate objects that have been vetted.  Currently
an uninitialized mask (value 0) enables the object for all modes, but we could
change that.
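(Aside: roughly what that vetting check could look like.  The flag names and
the zero-mask default below are assumptions based on the patch titles, not the
actual code from patch 2.)

    /* Hypothetical mode bits; the real names come from patch 2. */
    #define VMS_MODE_REBOOT   (1u << 0)   /* cprsave across kernel reboot */
    #define VMS_MODE_RESTART  (1u << 1)   /* cprsave across qemu re-exec  */

    static bool vmstate_vetted_for(unsigned mode_mask, unsigned requested_mode)
    {
        /*
         * Today a zero (uninitialized) mask means "allowed in every mode".
         * The idea above is to flip that default, so only vmstate objects
         * whose mask explicitly includes the requested mode are saved in
         * restart mode.
         */
        return (mode_mask & requested_mode) != 0;
    }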

- Steve


