qemu-discuss


From: Waldek Kozaczuk
Subject: Re: microvm machine: interrupts not received after initial virtqueue kick as expected
Date: Thu, 14 Nov 2019 12:49:59 -0500

Hi,

Thanks for your analysis. It has definitely helped. With the right fixes, OSv can now properly boot on microvm with both block and networking devices.

qemu-system-x86_64 -M microvm,x-option-roms=off,pit=off,pic=off,rtc=off \
   -enable-kvm -cpu host,+x2apic -m 64m -smp 1 \
   -nodefaults -no-user-config -nographic -no-reboot \
   -kernel loader.elf -append "--nopci /hello" \
   -gdb tcp::1234,server,nowait \
   -serial stdio \
   -global virtio-mmio.force-legacy=off \
   -device virtio-blk-device,id=blk0,drive=hd0,scsi=off \
   -drive file=usr.img,if=none,id=hd0,cache=none

OSv v0.54.0-34-g55403bc8
Booted up in 3.56 ms
Cmdline: /hello
Hello from C code

real 0m0.051s
user 0m0.039s
sys 0m0.008s

On average it takes around 4 ms to boot OSv with a single block device, and the same example boots roughly 20% faster than on firecracker, which is great!

On the other hand, if I boot an OSv image with a ZFS filesystem on the disk, it takes 350-550 ms to boot (most of that time is spent mounting the filesystem, which apparently requires far more interactions/exits than the trivial ROFS). The lower number applies when the image is raw and the higher one when it is qcow2.

For comparison, on firecracker the exact same OSv raw image boots in ~60 ms, which is 6-9 times faster than on microvm. What could be the culprit?

On Wed, Nov 13, 2019 at 5:53 AM Sergio Lopez <address@hidden> wrote:

Waldek Kozaczuk <address@hidden> writes:

> Hi,
>
> In the last couple of days, I have been trying to run OSv (
> https://github.com/cloudius-systems/osv) on the new microvm machine and I
> came across a problem I need some advice to solve.
>
> In short, when I try to boot OSv with a single block device, it properly
> detects the device and completes the 8-step initialization phase per the
> virtio spec. However, it gets stuck after the first virtqueue kick when
> trying to read the partition block from disk. It appears that OSv never
> receives the relevant interrupt and ends up waiting forever. At least this
> is my understanding.
>
> Here is the QEMU command line with all parameters and some diagnostic
> output from OSv (I added some extra debug statements to OSv):
>
> qemu-system-x86_64 -M microvm,x-option-roms=off,pit=off,pic=off,rtc=off \
>    -enable-kvm -cpu host,+x2apic -m 64m -smp 1 \
>    -kernel loader.elf -append "--nopci --verbose /hello" \
>    -nodefaults -no-user-config -nographic -no-reboot \
>    -gdb tcp::1234,server,nowait \
>    -serial stdio \
>    -device virtio-blk-device,id=blk0,drive=hd0,scsi=off \
>    -drive file=usr.img,if=none,id=hd0,cache=none
>
> OSv v0.54.0-29-g3357ae07
> CmdLine: --nopci --verbose /hello virtio_mmio.device=512@0xc0000e00:12
> 1 CPUs detected
> Firmware vendor: Unknown
> bsd: initializing - done
> VFS: mounting ramfs at /
> VFS: mounting devfs at /dev
> net: initializing - done
> Detected virtio-mmio device: (2,17745)
> -> virtio_driver: device initialized
> -> virtio_driver: device reset
> -> virtio_driver: device acknowledged
> VIRTIO BLK INSTANCE 0
> -> virtio_driver: device is modern
> blk config: seg_max: 126
> blk config: geometry: 1058013192
> blk config: block size: 512
> blk config: wce: 0
> Queue[0] -> size 1024, paddr d23000
> -> virtio_driver: detected 1 queues
> blk: device setup with interrupt irq: 12!
> !! Kicking queue: 0 at 0xffff8000c0000e50
>
> And also part of the stack trace on OSv side that may help illustrate the
> problem:
> (gdb) osv thread 26
> (gdb) bt
> #0  sched::thread::switch_to (this=this@entry=0xffff8000001a7040) at
> arch/x64/arch-switch.hh:108
> #1  0x00000000403f7164 in sched::cpu::reschedule_from_interrupt
> (this=0xffff80000000a040,
>     called_from_yield=called_from_yield@entry=false, preempt_after=...,
> preempt_after@entry=...) at core/sched.cc:339
> #2  0x00000000403f765c in sched::cpu::schedule () at
> include/osv/sched.hh:1310
> #3  0x00000000403f7d42 in sched::thread::wait
> (this=this@entry=0xffff800000d1c040)
> at core/sched.cc:1214
> #4  0x000000004041fa92 in sched::thread::do_wait_for<lockfree::mutex,
> sched::wait_object<waitqueue> > (mtx=...)
>     at include/osv/mutex.h:41
> #5  sched::thread::wait_for<waitqueue&> (mtx=...) at
> include/osv/sched.hh:1220
> #6  waitqueue::wait (this=this@entry=0xffffa000008f9ea0, mtx=...) at
> core/waitqueue.cc:56
> #7  0x0000000040435ca3 in bio_wait (bio=bio@entry=0xffffa000008f9e00) at
> fs/vfs/kern_physio.cc:42
> #8  0x0000000040436449 in rw_buf (bp=bp@entry=0x408a71c0 <buf_table>,
> rw=rw@entry=0) at fs/vfs/vfs_bio.cc:133
> #9  0x00000000404368a2 in bread (dev=dev@entry=0xffffa00000cba100,
> blkno=blkno@entry=0, bpp=bpp@entry=0x200000100638)
>     at fs/vfs/vfs_bio.cc:236
> #10 0x000000004043e96f in read_partition_table
> (dev=dev@entry=0xffffa00000cba100)
> at fs/devfs/device.cc:110
> #11 0x000000004037c2db in virtio::blk::blk (this=0xffffa000008e5c00,
> virtio_dev=...) at drivers/virtio-blk.cc:174
> #12 0x000000004037c439 in virtio::probe<virtio::blk, (unsigned short)2>
> (dev=<optimized out>) at ./drivers/device.hh:33
> #13 0x00000000403657ef in std::function<hw::hw_driver*
> (hw::hw_device*)>::operator()(hw::hw_device*) const (
>     __args#0=<optimized out>, this=0x200000100820) at
> /usr/include/c++/8/bits/std_function.h:682
> #14 hw::driver_manager::<lambda(hw::hw_device*)>::operator()
> (__closure=0x2000001008c0, dev=0xffffa000008eb980)
>     at drivers/driver.cc:40
> #15 std::_Function_handler<void(hw::hw_device*),
> hw::driver_manager::load_all()::<lambda(hw::hw_device*)> >::_M_invoke(const
> std::_Any_data &, hw::hw_device *&&) (__functor=..., __args#0=<optimized
> out>)
>     at /usr/include/c++/8/bits/std_function.h:297
> #16 0x0000000040362ae3 in std::function<void
> (hw::hw_device*)>::operator()(hw::hw_device*) const (
>     __args#0=<optimized out>, this=0x2000001008c0) at
> /usr/include/c++/8/bits/std_function.h:682
> #17 hw::device_manager::for_each_device(std::function<void
> (hw::hw_device*)>) (this=this@entry=0xffffa000008eb940,
>     func=...) at drivers/device.cc:67
> #18 0x00000000403655a3 in hw::driver_manager::load_all
> (this=this@entry=0xffffa000008eb900)
> at /usr/include/c++/8/new:169
> #19 0x000000004038bffd in arch_init_drivers () at arch/x64/arch-setup.cc:293
> #20 0x000000004022b06b in do_main_thread (_main_args=0xffffa000008c40b2) at
> loader.cc:365
> #21 0x000000004045b7c6 in pthread_private::pthread::<lambda()>::operator()
> (__closure=0xffffa000008f9d00)
>     at libc/pthread.cc:114
> #22 std::_Function_handler<void(), pthread_private::pthread::pthread(void*
> (*)(void*), void*, sigset_t, const
> pthread_private::thread_attr*)::<lambda()> >::_M_invoke(const
> std::_Any_data &) (__functor=...)
>     at /usr/include/c++/8/bits/std_function.h:297
> #23 0x00000000403f8ae7 in sched::thread_main_c (t=0xffff800000d1c040) at
> arch/x64/arch-switch.hh:321
> #24 0x000000004039e7d3 in thread_main () at arch/x64/entry.S:113
> (gdb) osv thread 27
> (gdb) bt
> #0  sched::thread::switch_to (this=0xffff8000001a7040,
> this@entry=0xffff800000d1c040)
> at arch/x64/arch-switch.hh:108
> #1  0x00000000403f7164 in sched::cpu::reschedule_from_interrupt
> (this=0xffff80000000a040,
>     called_from_yield=called_from_yield@entry=false, preempt_after=...,
> preempt_after@entry=...) at core/sched.cc:339
> #2  0x00000000403f765c in sched::cpu::schedule () at
> include/osv/sched.hh:1310
> #3  0x00000000403f7d42 in sched::thread::wait
> (this=this@entry=0xffff800000d2e040)
> at core/sched.cc:1214
> #4  0x000000004036c1c1 in
> sched::thread::do_wait_until<sched::noninterruptible,
> sched::thread::dummy_lock,
> virtio::virtio_driver::wait_for_queue(virtio::vring*, bool
> (virtio::vring::*)() const)::<lambda()> > (mtx=<synthetic pointer>...,
>     pred=...) at drivers/virtio.cc:184
> #5
>  sched::thread::wait_until<virtio::virtio_driver::wait_for_queue(virtio::vring*,
> bool (virtio::vring::*)() const)::<lambda()> > (pred=...) at
> include/osv/sched.hh:1077
> #6  virtio::virtio_driver::wait_for_queue (this=this@entry=0xffffa000008e5c00,
> queue=queue@entry=0xffffa000008f9c00,
>     pred=<optimized out>) at drivers/virtio.cc:168
> #7  0x000000004037b7c5 in virtio::blk::req_done (this=<optimized out>) at
> drivers/virtio-blk.cc:238
> #8  0x000000004037ba6c in virtio::blk::<lambda()>::operator()
> (__closure=<optimized out>) at drivers/virtio-blk.cc:132
> #9  std::_Function_handler<void(),
> virtio::blk::blk(virtio::virtio_device&)::<lambda()> >::_M_invoke(const
> std::_Any_data &) (__functor=...) at
> /usr/include/c++/8/bits/std_function.h:297
> #10 0x00000000403f8ae7 in sched::thread_main_c (t=0xffff800000d2e040) at
> arch/x64/arch-switch.hh:321
> #11 0x000000004039e7d3 in thread_main () at arch/x64/entry.S:113
>
> For comparison, when I run the same image with a single block device on
> firecracker it works fine and the OSv output looks like this:
>
> OSv v0.54.0-29-g3357ae07
> CmdLine: --verbose --nopci /hello virtio_mmio.device=4K@0xd0000000:5
> 1 CPUs detected
> Firmware vendor: Unknown
> bsd: initializing - done
> VFS: mounting ramfs at /
> VFS: mounting devfs at /dev
> net: initializing - done
> Detected virtio-mmio device: (2,0)
> -> virtio_driver: device initialized
> -> virtio_driver: device reset
> -> virtio_driver: device acknowledged
> VIRTIO BLK INSTANCE 0
> setup_features: found feature intersec of bit 9
> setup_features: found feature intersec of bit 41
> -> virtio_driver: device is modern
> Queue[0] -> size 256, paddr d23000
> -> virtio_driver: detected 1 queues
> blk: device setup with interrupt irq: 5!
> !! Kicking queue: 0 at 0xffff8000d0000050
> ------> Acked
> virtio-blk: Add blk device instances 0 as vblk0, devsize=4360192
> random: intel drng, rdrand registered as a source.
> random: <Software, Yarrow> initialized
> VFS: unmounting /dev
> VFS: mounting rofs at /rofs
>     virtio features:  0  0  0  0  0  0  0  0  0  1  0  0  0  0  0  0  0  0
>  0  0  0  0  0  0  0  0  0  0  0  0  0  1  0  0  0  0  0  0  0  0  0  1  0
>  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  1
> !! Kicking queue: 0 at 0xffff8000d0000050
> ------> Acked
> !! Kicking queue: 0 at 0xffff8000d0000050
> ------> Acked
> !! Kicking queue: 0 at 0xffff8000d0000050
> ------> Acked
> VFS: mounting devfs at /dev
> VFS: mounting procfs at /proc
> VFS: mounting sysfs at /sys
> VFS: mounting ramfs at /tmp
> Booted up in 27.11 ms
> Cmdline: /hello
> !! Kicking queue: 0 at 0xffff8000d0000050
> ------> Acked
> !! Kicking queue: 0 at 0xffff8000d0000050
> ------> Acked
> Hello from C code
> VFS: unmounting /dev
> VFS: unmounting /proc
> VFS: unmounting /
> ROFS: spent 6.52 ms reading from disk
> ROFS: read 64 512-byte blocks from disk
> ROFS: allocated 61 512-byte blocks of cache memory
> ROFS: hit ratio is 93.02%
> Powering off.
>
> In the end, it makes me think there is some issue related to interrupts.
> Either the wrong ones are used or they are misconfigured on the OSv side,
> or it is something else altogether.
>
> As on firecracker, OSv sets up GSI edge-triggered interrupts for QEMU
> microvm virtio-mmio devices, but maybe it needs to use GSI level-triggered
> ones? Also, OSv works just fine with a traditional QEMU machine and PCI
> devices.
>
> Lastly, I tried enabling the microvm pit and pic options - it did not make
> any difference.
>
> Any suggestions on what might be wrong? What would be the best way to debug
> it on the QEMU side?

I gave OSv a quick try (it's been a while since the last time I played
with it), and noticed this:

OSv v0.54.0-26-gcccc7d40
1 CPUs detected
Firmware vendor: Unknown
bsd: initializing - done
VFS: mounting ramfs at /
VFS: mounting devfs at /dev
net: initializing - done
Detected virtio-mmio device: (2,17745)

Thread 1 "qemu-system-x86" hit Breakpoint 1, virtio_queue_notify_aio_vq (vq=0x555556ea4260)
    at /root/qemu/hw/virtio/virtio.c:2232
2232        bool ret = false;
(gdb) n
2234        if (vq->vring.desc && vq->handle_aio_output) {
(gdb)
2245        return ret;
(gdb) p vq->vring.desc
$1 = 0
(gdb) p vq->vring
$2 = {num = 1024, num_default = 128, align = 4096, desc = 0, avail = 0, used = 0, caches = 0x0}

As you can see, QEMU receives the kick, but the VQ wasn't properly
initialized, so it will bail out instead of attempting to process the
request.

To get to this point, I've run QEMU with gdb and put a breakpoint on
virtio_queue_notify_aio_vq (you may need to rebuild with CFLAGS="-O0 -g"
to be able to follow the execution flow easily).
As you hinted, the issue was in the configuration/initialization phase of the virtqueues. Let me explain what went wrong in the context of this virtio spec snippet:

"4.2.3.2 Virtqueue Configuration

The driver will typically initialize the virtual queue in the following way:

  1. Select the queue writing its index (first queue is 0) to QueueSel.
  2. Check if the queue is not already in use: read QueueReady, and expect a returned value of zero (0x0).
  3. Read maximum queue size (number of elements) from QueueNumMax. If the returned value is zero (0x0) the queue is not available.
  4. Allocate and zero the queue pages, making sure the memory is physically contiguous. It is recommended to align the Used Ring to an optimal boundary (usually the page size).
  5. Notify the device about the queue size by writing the size to QueueNum.
  6. Write physical addresses of the queue’s Descriptor Table, Available Ring and Used Ring to (respectively) the QueueDescLow/QueueDescHigh, QueueAvailLow/QueueAvailHigh and QueueUsedLow/QueueUsedHigh register pairs.
  7. Write 0x1 to QueueReady."  
So, in essence, OSv would iterate over all virtqueues until step 3 yielded 0, BUT in each iteration it would ONLY perform steps 1 to 6. After it had discovered all the queues, OSv would iterate over them again and execute steps 1, 2 and 7 to activate them. This is apparently tolerated by firecracker, and as a matter of fact the spec above says the driver "will typically initialize" the queue this way, which seems to indicate that initializing the virtqueues the way OSv did is acceptable. But apparently QEMU does not like it. So I have changed this logic in OSv to follow the "typical" sequence (steps 1-7 completed for each queue in turn) and now it all works. Firecracker also accepts it.
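
For reference, here is a minimal sketch of the per-queue order that now works for me on both QEMU microvm and firecracker. This is not the actual OSv code: reg_read()/reg_write() and alloc_ring() are assumed helpers over the device's MMIO window and a contiguous-memory allocator, and the offsets are from the virtio 1.0 MMIO register layout. Each queue is taken through steps 1-7 before moving on to the next one:

#include <cstdint>

// Virtio 1.0 MMIO register offsets used below.
constexpr uint32_t QUEUE_SEL        = 0x030;
constexpr uint32_t QUEUE_NUM_MAX    = 0x034;
constexpr uint32_t QUEUE_NUM        = 0x038;
constexpr uint32_t QUEUE_READY      = 0x044;
constexpr uint32_t QUEUE_DESC_LOW   = 0x080;
constexpr uint32_t QUEUE_DESC_HIGH  = 0x084;
constexpr uint32_t QUEUE_AVAIL_LOW  = 0x090;
constexpr uint32_t QUEUE_AVAIL_HIGH = 0x094;
constexpr uint32_t QUEUE_USED_LOW   = 0x0a0;
constexpr uint32_t QUEUE_USED_HIGH  = 0x0a4;

// Assumed helpers: reg_read()/reg_write() access the device's MMIO window,
// alloc_ring() returns physically contiguous, zeroed ring memory.
uint32_t reg_read(uint32_t offset);
void     reg_write(uint32_t offset, uint32_t value);
struct ring_layout { uint64_t desc, avail, used; };
ring_layout alloc_ring(uint32_t size);

bool setup_queue(uint32_t index)
{
    reg_write(QUEUE_SEL, index);                        // 1. select the queue
    if (reg_read(QUEUE_READY) != 0) {                   // 2. must not already be in use
        return false;
    }
    uint32_t size = reg_read(QUEUE_NUM_MAX);            // 3. zero means "queue not available"
    if (size == 0) {
        return false;
    }
    ring_layout q = alloc_ring(size);                   // 4. contiguous, zeroed queue pages
    reg_write(QUEUE_NUM, size);                         // 5. tell the device the queue size
    reg_write(QUEUE_DESC_LOW,   (uint32_t)q.desc);      // 6. program the ring addresses
    reg_write(QUEUE_DESC_HIGH,  (uint32_t)(q.desc >> 32));
    reg_write(QUEUE_AVAIL_LOW,  (uint32_t)q.avail);
    reg_write(QUEUE_AVAIL_HIGH, (uint32_t)(q.avail >> 32));
    reg_write(QUEUE_USED_LOW,   (uint32_t)q.used);
    reg_write(QUEUE_USED_HIGH,  (uint32_t)(q.used >> 32));
    reg_write(QUEUE_READY, 1);                          // 7. activate before moving to the next queue
    return true;
}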

In the end, the issue is that OSv doesn't support non-legacy virtio-mmio
devices (even though the check at "drivers/virtio-mmio.cc:121" allows for
"version == 2" ;-), as described in version 1.0 of the VirtIO spec, and
microvm defaults to that model. You can work around the issue by adding
"-global virtio-mmio.force-legacy=off" to the QEMU command line, but please
consider implementing support for the non-legacy personality (it's quite
straightforward).
 
I think it is the other way around - when I start QEMU without '-global virtio-mmio.force-legacy=off' and add a debug statement to print the virtio version from the MMIO configuration register, it shows "1", which indicates the legacy personality.
On the other hand, if I add '-global virtio-mmio.force-legacy=off', it prints "2", which indicates the modern personality.
The OSv mmio implementation only handles the modern personality, just like firecracker does. QEMU microvm seems to be capable of presenting both (see the sketch below for how a driver can tell them apart).
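To illustrate, here is a minimal sketch of that check, assuming a read_reg() helper over the device's MMIO window; offset 0x004 is the Version register in the virtio-mmio layout, where 1 means the legacy personality and 2 means the modern (virtio 1.0) one:

#include <cstdint>

constexpr uint32_t MAGIC_VALUE = 0x000;  // should read "virt" (0x74726976)
constexpr uint32_t VERSION     = 0x004;  // 1 = legacy, 2 = modern (virtio 1.0)

uint32_t read_reg(uint32_t offset);      // assumed MMIO read helper

bool is_modern_device()
{
    if (read_reg(MAGIC_VALUE) != 0x74726976) {
        return false;                    // not a virtio-mmio device at all
    }
    // QEMU's microvm reports 1 here unless started with
    // -global virtio-mmio.force-legacy=off (then it reports 2);
    // firecracker always reports 2.
    return read_reg(VERSION) == 2;
}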
But you are right that OSv should not happily accept both 1 and 2 when in fact it only supports 2. Support for the legacy personality could be added to OSv fairly easily, but is it worth it given that it is the legacy one?
On a related note, I know there are parameters that can be added to PCI devices to make them behave as legacy, modern or transitional, for example 'disable-legacy=off,disable-modern=on'. Do the same parameters work with mmio devices on the microvm machine? Or is '-global virtio-mmio.force-legacy=off' the only way to select the legacy or modern personality?

Sergio.

