Re: [PATCH for 9.0 08/12] vdpa: add vhost_vdpa_load_setup


From: Eugenio Perez Martin
Subject: Re: [PATCH for 9.0 08/12] vdpa: add vhost_vdpa_load_setup
Date: Thu, 21 Dec 2023 09:20:40 +0100

On Thu, Dec 21, 2023 at 3:17 AM Jason Wang <jasowang@redhat.com> wrote:
>
> On Wed, Dec 20, 2023 at 3:07 PM Eugenio Perez Martin
> <eperezma@redhat.com> wrote:
> >
> > On Wed, Dec 20, 2023 at 6:22 AM Jason Wang <jasowang@redhat.com> wrote:
> > >
> > > On Sat, Dec 16, 2023 at 1:28 AM Eugenio Pérez <eperezma@redhat.com> wrote:
> > > >
> > > > Callers can use this function to set up the incoming migration thread.
> > > >
> > > > This thread is able to map the guest memory while the migration is
> > > > ongoing, without blocking QMP or other important tasks. While this
> > > > allows the destination QEMU not to block, it spreads the mapping time
> > > > across the migration instead of doing it pre-migration.
> > >
> > > If it's just QMP, can we simply use bh with a quota here?
> > >
> >
> > Because QEMU cannot guarantee the quota at write(fd,
> > VHOST_IOTLB_UPDATE, ...).
>
> So you mean the delay may be caused by a single syscall?
>

Mostly yes, the iotlb write() that maps all of the guest memory.
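
For context, that write() boils down to something like the following
(a simplified sketch of what vhost_vdpa_dma_map in
hw/virtio/vhost-vdpa.c does, not the actual code; error handling and
the ASID feature check are trimmed):

#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <unistd.h>
#include <linux/vhost.h>   /* struct vhost_msg_v2, VHOST_IOTLB_*, VHOST_ACCESS_* */

/*
 * Simplified sketch: one VHOST_IOTLB_UPDATE message per memory section.
 * The kernel pins every page of the region before the write() returns,
 * so the whole cost sits inside a single syscall and cannot be bounded
 * by a bh quota.
 */
static int iotlb_map_sketch(int device_fd, uint32_t asid, uint64_t iova,
                            uint64_t size, void *vaddr, bool readonly)
{
    struct vhost_msg_v2 msg = {
        .type = VHOST_IOTLB_MSG_V2,
        .asid = asid,
        .iotlb.iova = iova,
        .iotlb.size = size,
        .iotlb.uaddr = (uint64_t)(uintptr_t)vaddr,
        .iotlb.perm = readonly ? VHOST_ACCESS_RO : VHOST_ACCESS_RW,
        .iotlb.type = VHOST_IOTLB_UPDATE,
    };

    return write(device_fd, &msg, sizeof(msg)) == sizeof(msg) ? 0 : -errno;
}

For something like the 128G guest case, the duration of that single
write() is dominated by the pinning of all those pages.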

> > Also, synchronization with
> > vhost_vdpa_dev_start would get complicated, as it would need to be
> > re-scheduled too.
>
> Just a flush of the bh, or not?
>

Let me put it differently: to map the guest memory, vhost_vdpa_dma_map
is called because the guest starts the device with a PCI write to the
device status:
#0  vhost_vdpa_dma_map (s=0x5555570e0e60, asid=0, iova=0, size=786432,
vaddr=0x7fff40000000, readonly=false)
    at ../hw/virtio/vhost-vdpa.c:93
#1  0x0000555555979451 in vhost_vdpa_listener_region_add
(listener=0x5555570e0e68, section=0x7fffee5bc0d0) at
../hw/virtio/vhost-vdpa.c:415
#2  0x0000555555b3c543 in listener_add_address_space
(listener=0x5555570e0e68, as=0x555556db72e0 <address_space_memory>)
    at ../system/memory.c:3011
#3  0x0000555555b3c996 in memory_listener_register
(listener=0x5555570e0e68, as=0x555556db72e0 <address_space_memory>)
    at ../system/memory.c:3081
#4  0x000055555597be03 in vhost_vdpa_dev_start (dev=0x5555570e1310,
started=true) at ../hw/virtio/vhost-vdpa.c:1460
#5  0x00005555559734c2 in vhost_dev_start (hdev=0x5555570e1310,
vdev=0x5555584b2c80, vrings=false) at ../hw/virtio/vhost.c:2058
#6  0x0000555555854ec8 in vhost_net_start_one (net=0x5555570e1310,
dev=0x5555584b2c80) at ../hw/net/vhost_net.c:274
#7  0x00005555558554ca in vhost_net_start (dev=0x5555584b2c80,
ncs=0x5555584c8278, data_queue_pairs=1, cvq=1) at
../hw/net/vhost_net.c:415
#8  0x0000555555ace7a5 in virtio_net_vhost_status (n=0x5555584b2c80,
status=15 '\017') at ../hw/net/virtio-net.c:310
#9  0x0000555555acea50 in virtio_net_set_status (vdev=0x5555584b2c80,
status=15 '\017') at ../hw/net/virtio-net.c:391
#10 0x0000555555b06fee in virtio_set_status (vdev=0x5555584b2c80,
val=15 '\017') at ../hw/virtio/virtio.c:2048
#11 0x000055555595d667 in virtio_pci_common_write
(opaque=0x5555584aa8b0, addr=20, val=15, size=1) at
../hw/virtio/virtio-pci.c:1580
#12 0x0000555555b351c1 in memory_region_write_accessor
(mr=0x5555584ab3f0, addr=20, value=0x7fffee5bc4c8, size=1, shift=0,
mask=255,
    attrs=...) at ../system/memory.c:497
#13 0x0000555555b354c5 in access_with_adjusted_size (addr=20,
value=0x7fffee5bc4c8, size=1, access_size_min=1, access_size_max=4,
    access_fn=0x555555b350cf <memory_region_write_accessor>,
mr=0x5555584ab3f0, attrs=...) at ../system/memory.c:573
#14 0x0000555555b3856f in memory_region_dispatch_write
(mr=0x5555584ab3f0, addr=20, data=15, op=MO_8, attrs=...) at
../system/memory.c:1521
#15 0x0000555555b45885 in flatview_write_continue (fv=0x7fffd8122b80,
addr=4227858452, attrs=..., ptr=0x7ffff7ff0028, len=1, addr1=20,
    l=1, mr=0x5555584ab3f0) at ../system/physmem.c:2714
#16 0x0000555555b459e8 in flatview_write (fv=0x7fffd8122b80,
addr=4227858452, attrs=..., buf=0x7ffff7ff0028, len=1)
    at ../system/physmem.c:2756
#17 0x0000555555b45d9a in address_space_write (as=0x555556db72e0
<address_space_memory>, addr=4227858452, attrs=...,
buf=0x7ffff7ff0028,
    len=1) at ../system/physmem.c:2863
#18 0x0000555555b45e07 in address_space_rw (as=0x555556db72e0
<address_space_memory>, addr=4227858452, attrs=...,
buf=0x7ffff7ff0028,
    len=1, is_write=true) at ../system/physmem.c:2873
#19 0x0000555555b5eb30 in kvm_cpu_exec (cpu=0x5555571258f0) at
../accel/kvm/kvm-all.c:2915
#20 0x0000555555b61798 in kvm_vcpu_thread_fn (arg=0x5555571258f0) at
../accel/kvm/kvm-accel-ops.c:51
#21 0x0000555555d384b7 in qemu_thread_start (args=0x55555712c390) at
../util/qemu-thread-posix.c:541
#22 0x00007ffff580814a in start_thread () from /lib64/libpthread.so.0
#23 0x00007ffff54fcf23 in clone () from /lib64/libc.so.6

Can we reschedule that map to a bh without returning control to the vCPU?

> But another question. How to synchronize with the memory API in this
> case. Currently the updating (without vIOMMU) is done under the
> listener callback.
>
> Usually after the commit, Qemu may think the memory topology has been
> updated. If it is done asynchronously, would we have any problem?
>

The function vhost_vdpa_process_iotlb_msg in the kernel has its own
lock, so two QEMU threads can map memory independently and the kernel
serializes them.

For the write() caller it just looks like the call takes longer, but
there are no deadlocks or anything similar.
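
To make that concrete, here is a minimal sketch of the kind of mapping
thread this series is about (not the actual patch code:
map_all_guest_memory() is a hypothetical helper that walks the memory
sections and calls vhost_vdpa_dma_map on each, and VhostVDPAShared
stands for whatever shared state carries the device fd):

#include "qemu/osdep.h"
#include "qemu/thread.h"

/*
 * Minimal sketch, not the actual patch: do the expensive maps in a
 * dedicated thread so the main loop and QMP keep running.  Concurrent
 * maps issued from other threads (e.g. the vCPU-triggered path above)
 * are serialized by the kernel's vhost_vdpa_process_iotlb_msg lock.
 */
static void *vdpa_map_thread(void *opaque)
{
    VhostVDPAShared *s = opaque;   /* assumed shared state with the device fd */

    map_all_guest_memory(s);       /* hypothetical: vhost_vdpa_dma_map each section */
    return NULL;
}

static void vdpa_load_setup_sketch(VhostVDPAShared *s, QemuThread *thread)
{
    qemu_thread_create(thread, "vdpa-map", vdpa_map_thread, s,
                       QEMU_THREAD_JOINABLE);
    /*
     * vhost_vdpa_dev_start would later qemu_thread_join() this thread
     * before starting the device, as the commit message describes.
     */
}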

> >
> > As a half-baked idea, we could split the mapping into chunks of
> > manageable size, but I don't like that idea a lot.
> >
> > > Btw, have you measured the hotspot that causes such slowness? Is it
> > > pinning or vendor-specific mapping that slows down the progress? Or
> > > does VFIO have a similar issue?
> > >
> >
> > Si-Wei did the actual profiling as he is the one with the 128G guests,
> > but most of the time was spent in the memory pinning. Si-Wei, please
> > correct me if I'm wrong.
> >
> > I didn't check VFIO, but I think it just maps at realize phase with
> > vfio_realize -> vfio_attach_device -> vfio_connect_container(). In
> > previous testing, this delayed the VM initialization by a lot, as
> > it moves that 20s of blocking to every VM start.
> >
> > Looking for a way to do it only when QEMU is the destination of a live
> > migration, I think the right place is the .load_setup migration
> > handler. But I'm ok with moving it, for sure.
>
> Adding Peter for more ideas.
>

Appreciated :).

Thanks!

> >
> > > >
> > > > This thread is joined at vdpa backend device start, so if the guest
> > > > memory is large enough it could happen that there is still guest
> > > > memory left to map at that point.
> > >
> > > So we would still hit the QMP stall in this case?
> > >
> >
> > This paragraph is kind of outdated, sorry. I can only trigger this if I
> > don't enable the switchover_ack migration capability and if I make
> > memory pinning in the kernel artificially slow. But I didn't
> > check QMP, to be honest, so I can try to test it, yes.
> >
> > If QMP is not responsive, that means QMP is already not responsive in
> > QEMU master during that period. So we're only improving things anyway.
> >
> > Thanks!
> >
>
> Thanks
>



