qemu-devel

Re: [Qemu-devel] block device fd consumption


From: Stefan Hajnoczi
Subject: Re: [Qemu-devel] block device fd consumption
Date: Fri, 11 Jul 2014 11:23:34 +0200
User-agent: Mutt/1.5.23 (2014-03-12)

On Fri, Jul 11, 2014 at 10:56:12AM +0200, Christian Borntraeger wrote:
> Stefan,
> 
> I traced the creation of eventfds with gdb in the case of virtio-blk.

Great, thanks for posting this!

Most of these eventfds are "justified": they are actively used and are
not leaked.  Avoiding them might be possible with some work, but doing
so is likely to make the code messier or notification more expensive
(e.g. we would have to scan more request structs to check for
completion).

But see the thread pool case below where I think we can eliminate the
eventfd.

> With the following setup
> qemu-system-s390x -enable-kvm -m 1000 -nographic -kernel 
> /boot/vmlinux-3.15.0+ -initrd ramdisk -smp 2 -append "root=/dev/ram0" -M 
> s390-ccw -drive 
> file=/dev/sdc,if=none,id=d0,format=raw,serial=d0,cache=none,aio=native 
> -device virtio-blk-ccw,drive=d0,x-data-plane=on,config-wce=off,scsi=off
> 
> In addition to the file descriptor for the device itself I have the following 
> eventfd:
> 
> 
> Breakpoint 1, event_notifier_init (address@hidden, address@hidden) at 
> /home/cborntra/REPOS/qemu/util/event_notifier-posix.c:29
> #0  event_notifier_init (address@hidden, address@hidden) at 
> /home/cborntra/REPOS/qemu/util/event_notifier-posix.c:29
> #1  0x000000008019e766 in aio_context_new () at 
> /home/cborntra/REPOS/qemu/async.c:274
> #2  0x00000000801ae628 in qemu_init_main_loop () at 
> /home/cborntra/REPOS/qemu/main-loop.c:142
> #3  0x000000008001598c in main (argc=<optimized out>, argv=0x3fffffff2c8, 
> envp=<optimized out>) at /home/cborntra/REPOS/qemu/vl.c:3972
> --> main loop: this is ok and not related to virtio-blk.

Yes.

> Breakpoint 1, event_notifier_init (address@hidden, address@hidden) at 
> /home/cborntra/REPOS/qemu/util/event_notifier-posix.c:29
> #0  event_notifier_init (address@hidden, address@hidden) at 
> /home/cborntra/REPOS/qemu/util/event_notifier-posix.c:29
> #1  0x00000000801ebb58 in laio_init () at 
> /home/cborntra/REPOS/qemu/block/linux-aio.c:289
> #2  0x00000000801ea17a in raw_set_aio (aio_ctx=0x807fa0a8, 
> use_aio=0x807fa0a0, bdrv_flags=<optimized out>) at 
> /home/cborntra/REPOS/qemu/block/raw-posix.c:351
> #3  0x00000000801ea2a2 in raw_open_common (address@hidden, address@hidden, 
> address@hidden, address@hidden, address@hidden)
>     at /home/cborntra/REPOS/qemu/block/raw-posix.c:433
> #4  0x00000000801ea6b4 in hdev_open (bs=0x807fd1a0, options=0x807fdce0, 
> flags=<optimized out>, errp=0x3ffffffe830) at 
> /home/cborntra/REPOS/qemu/block/raw-posix.c:1760
> #5  0x00000000801aba9e in bdrv_open_common (errp=0x3ffffffe818, 
> drv=0x80316088 <bdrv_host_device>, flags=57570, options=0x807fdce0, file=0x0, 
> bs=0x807fd1a0) at /home/cborntra/REPOS/qemu/block.c:967
> #6  bdrv_open (address@hidden, filename=<optimized out>, address@hidden 
> "/dev/disk/by-id/scsi-36005076305ffc1ae", '0' <repeats 12 times>, "2580", 
> address@hidden, 
>     options=0x807fdce0, flags=57570, drv=0x80316088 <bdrv_host_device>, 
> errp=0x3ffffffe9e8) at /home/cborntra/REPOS/qemu/block.c:1472
> #7  0x00000000801ac460 in bdrv_open_image (address@hidden, address@hidden 
> "/dev/disk/by-id/scsi-36005076305ffc1ae", '0' <repeats 12 times>, "2580", 
>     address@hidden, address@hidden "file", address@hidden, allow_none=true, 
> errp=0x3ffffffe9e8) at /home/cborntra/REPOS/qemu/block.c:1274
> #8  0x00000000801ab74a in bdrv_open (address@hidden, address@hidden 
> "/dev/disk/by-id/scsi-36005076305ffc1ae", '0' <repeats 12 times>, "2580", 
> address@hidden
>     0x0, options=0x807fb160, address@hidden, flags=8418, address@hidden, 
> drv=0x80312908 <bdrv_raw>, errp=0x3ffffffead8) at 
> /home/cborntra/REPOS/qemu/block.c:1451
> #9  0x00000000800ba11e in blockdev_init (address@hidden 
> "/dev/disk/by-id/scsi-36005076305ffc1ae", '0' <repeats 12 times>, "2580", 
> address@hidden, address@hidden
>     0x3ffffffec58) at /home/cborntra/REPOS/qemu/blockdev.c:523
> #10 0x00000000800bb530 in drive_new (all_opts=0x807e7cf0, 
> block_default_type=<optimized out>) at 
> /home/cborntra/REPOS/qemu/blockdev.c:930
> #11 0x00000000800d11d4 in drive_init_func (opts=<optimized out>, 
> opaque=<optimized out>) at /home/cborntra/REPOS/qemu/vl.c:1144
> #12 0x00000000802110b0 in qemu_opts_foreach (list=<optimized out>, 
> address@hidden <drive_init_func>, address@hidden, address@hidden)
>     at /home/cborntra/REPOS/qemu/util/qemu-option.c:1072
> #13 0x0000000080016438 in main (argc=<optimized out>, argv=<optimized out>, 
> envp=<optimized out>) at /home/cborntra/REPOS/qemu/vl.c:4352
> --> No idea

Ah, I forgot about this one.  This is the Linux AIO completion eventfd.

It gets signalled when a Linux AIO request completes and we need to call
io_getevents(2).
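
For readers less familiar with eventfd(2), here is a minimal sketch of
the signal-and-drain pattern such a completion fd relies on.  It is not
the code in block/linux-aio.c; notifier_set() and
notifier_test_and_clear() are hypothetical helper names, and the sketch
assumes a non-blocking eventfd created without EFD_SEMAPHORE.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <sys/types.h>
#include <unistd.h>

/* Signal the eventfd by adding 1 to its 64-bit kernel counter.
 * Returns 0 on success, -1 on failure. */
int notifier_set(int efd)
{
    uint64_t one = 1;

    assert(efd >= 0);
    return write(efd, &one, sizeof(one)) == (ssize_t)sizeof(one) ? 0 : -1;
}

/* Drain the eventfd.  A successful read returns the accumulated counter
 * and resets it to zero; on a non-blocking fd with nothing pending the
 * read fails with EAGAIN, which is reported here as 0. */
uint64_t notifier_test_and_clear(int efd)
{
    uint64_t count = 0;

    if (read(efd, &count, sizeof(count)) != (ssize_t)sizeof(count)) {
        return 0;
    }
    return count;
}
```

An event loop would poll the fd for readability, call
notifier_test_and_clear() when it fires, and only then go and collect
the completed requests.  Several signals that arrive before the loop
wakes up collapse into a single wakeup.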

You can avoid it by using aio=threads instead of aio=native, but then
you give up Linux AIO.  I am not aware of a good way to keep Linux AIO
without this fd.

> Breakpoint 1, event_notifier_init (address@hidden, address@hidden) at 
> /home/cborntra/REPOS/qemu/util/event_notifier-posix.c:29
> #0  event_notifier_init (address@hidden, address@hidden) at 
> /home/cborntra/REPOS/qemu/util/event_notifier-posix.c:29
> #1  0x000000008019f04c in thread_pool_init_one (ctx=0x807e8e00, 
> pool=0x807f4eb0) at /home/cborntra/REPOS/qemu/thread-pool.c:296
> #2  thread_pool_new (ctx=<optimized out>) at 
> /home/cborntra/REPOS/qemu/thread-pool.c:314
> #3  0x000000008019e590 in aio_get_thread_pool (ctx=0x807e8e00) at 
> /home/cborntra/REPOS/qemu/async.c:245
> #4  0x00000000801e8f6c in paio_submit (address@hidden, fd=<optimized out>, 
> address@hidden, address@hidden, address@hidden, cb=
>     0x801a1a4c <bdrv_co_io_em_complete>, opaque=0x3ffba3fe8d0, type=4097) at 
> /home/cborntra/REPOS/qemu/block/raw-posix.c:1027
> #5  0x00000000801e9b5c in raw_aio_submit (bs=0x807fd1a0, sector_num=0, 
> qiov=0x3ffffffda98, nb_sectors=<optimized out>, address@hidden 
> <bdrv_co_io_em_complete>, opaque=0x3ffba3fe8d0, type=4097)
>     at /home/cborntra/REPOS/qemu/block/raw-posix.c:1056
> #6  0x00000000801e9c84 in raw_aio_readv (bs=<optimized out>, 
> sector_num=<optimized out>, qiov=<optimized out>, nb_sectors=<optimized out>, 
> cb=0x801a1a4c <bdrv_co_io_em_complete>, opaque=0x3ffba3fe8d0)
>     at /home/cborntra/REPOS/qemu/block/raw-posix.c:1094
> #7  0x00000000801a2594 in bdrv_co_io_em (is_write=false, iov=0x3ffffffda98, 
> nb_sectors=<optimized out>, sector_num=0, bs=0x807fd1a0) at 
> /home/cborntra/REPOS/qemu/block.c:4835
> #8  bdrv_co_readv_em (address@hidden, address@hidden, nb_sectors=<optimized 
> out>, address@hidden) at /home/cborntra/REPOS/qemu/block.c:4852
> #9  0x00000000801a7d7c in bdrv_aligned_preadv (address@hidden, 
> address@hidden, offset=<optimized out>, address@hidden, align=<optimized 
> out>, qiov=0x3ffffffda98, flags=0)
>     at /home/cborntra/REPOS/qemu/block.c:3057
> #10 0x00000000801a82da in bdrv_co_do_preadv (bs=0x807fd1a0, offset=<optimized 
> out>, bytes=512, qiov=0x3ffffffda98, flags=<optimized out>, 
> address@hidden(unknown: 0))
>     at /home/cborntra/REPOS/qemu/block.c:3136
> #11 0x00000000801a83e4 in bdrv_co_do_readv (flags=(unknown: 0), 
> qiov=<optimized out>, nb_sectors=<optimized out>, sector_num=<optimized out>, 
> bs=<optimized out>)
>     at /home/cborntra/REPOS/qemu/block.c:3158
> #12 bdrv_co_readv (bs=<optimized out>, sector_num=<optimized out>, 
> nb_sectors=<optimized out>, qiov=<optimized out>) at 
> /home/cborntra/REPOS/qemu/block.c:3167
> #13 0x00000000801a7ce2 in bdrv_aligned_preadv (address@hidden, 
> address@hidden, offset=<optimized out>, address@hidden, align=512, 
> qiov=0x3ffffffda98, flags=0)
>     at /home/cborntra/REPOS/qemu/block.c:3042
> #14 0x00000000801a82da in bdrv_co_do_preadv (bs=0x807fa620, offset=<optimized 
> out>, bytes=512, qiov=0x3ffffffda98, flags=<optimized out>) at 
> /home/cborntra/REPOS/qemu/block.c:3136
> #15 0x00000000801a94d8 in bdrv_rw_co_entry (opaque=0x3ffffffd9b8) at 
> /home/cborntra/REPOS/qemu/block.c:2693
> #16 bdrv_rw_co_entry (opaque=0x3ffffffd9b8) at 
> /home/cborntra/REPOS/qemu/block.c:2688
> #17 0x00000000801ba140 in coroutine_trampoline (i0=<optimized out>, i1=<error 
> reading variable: value has been optimized out>) at 
> /home/cborntra/REPOS/qemu/coroutine-ucontext.c:118
> #18 0x000003fffc935892 in __makecontext_ret () from /lib64/libc.so.6
> --> No idea

Similar deal to the Linux AIO event notifier.  It's the fd used to
signal thread pool work item completion.  The thread pool is
per-AioContext, so the fd overhead is per-iothread.

However, we can use a BH (bottom half) instead, since qemu_bh_schedule()
has now been made thread-safe.  Previously we used an EventNotifier
because qemu_bh_schedule() was not thread-safe.
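
To illustrate why a thread-safe BH removes the need for a dedicated fd,
here is a minimal sketch of the general idea, assuming a loop context
that owns one shared eventfd.  It is not QEMU's implementation: MiniCtx,
MiniBH and all mini_* functions are hypothetical names invented for this
example.  Any number of bottom halves share the single wakeup fd, so a
completion source only needs its own MiniBH, not its own eventfd.

```c
#include <assert.h>
#include <poll.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <sys/types.h>
#include <unistd.h>

typedef struct MiniCtx MiniCtx;
typedef struct MiniBH MiniBH;

struct MiniBH {
    MiniCtx *ctx;
    void (*cb)(void *opaque);
    void *opaque;
    atomic_int scheduled;       /* 1 while a run is pending */
    MiniBH *next;
};

struct MiniCtx {
    int notify_fd;              /* the single eventfd that wakes the loop */
    MiniBH *bhs;                /* list built before other threads start */
};

int mini_ctx_init(MiniCtx *ctx)
{
    ctx->bhs = NULL;
    ctx->notify_fd = eventfd(0, EFD_NONBLOCK | EFD_CLOEXEC);
    return ctx->notify_fd < 0 ? -1 : 0;
}

/* Not thread-safe: register every BH before other threads may schedule. */
void mini_bh_init(MiniCtx *ctx, MiniBH *bh, void (*cb)(void *), void *opaque)
{
    bh->ctx = ctx;
    bh->cb = cb;
    bh->opaque = opaque;
    atomic_init(&bh->scheduled, 0);
    bh->next = ctx->bhs;
    ctx->bhs = bh;
}

/* Safe from any thread: mark the BH pending, then kick the shared fd.
 * The kick is skipped if the BH was already pending. */
void mini_bh_schedule(MiniBH *bh)
{
    assert(bh->ctx != NULL);
    if (atomic_exchange(&bh->scheduled, 1) == 0) {
        uint64_t one = 1;
        ssize_t r = write(bh->ctx->notify_fd, &one, sizeof(one));
        (void)r;                /* a failed kick is ignored in this sketch */
    }
}

/* One loop iteration: wait for a kick, drain it, run the pending BHs.
 * Returns the number of callbacks that ran. */
int mini_ctx_run_once(MiniCtx *ctx, int timeout_ms)
{
    struct pollfd pfd = { .fd = ctx->notify_fd, .events = POLLIN };
    uint64_t drained;
    int ran = 0;

    if (poll(&pfd, 1, timeout_ms) <= 0) {
        return 0;               /* timeout or error: nothing to run */
    }
    if (read(ctx->notify_fd, &drained, sizeof(drained)) < 0) {
        return 0;
    }
    for (MiniBH *bh = ctx->bhs; bh != NULL; bh = bh->next) {
        if (atomic_exchange(&bh->scheduled, 0)) {
            bh->cb(bh->opaque);
            ran++;
        }
    }
    return ran;
}

/* Example callback: counts completions in an int. */
void mini_count_cb(void *opaque)
{
    ++*(int *)opaque;
}
```

In this sketch a worker thread would call mini_bh_schedule() when its
work item finishes, and the loop thread would pick the completion up in
mini_ctx_run_once().  A schedule that races with a running iteration
costs at most one spurious wakeup, never a lost one, because the flag is
set before the fd is written.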

I will send a patch but I'm not sure it's critical enough for QEMU 2.1.
Do you have a bug report or justification for pushing this into QEMU
2.1?

> Breakpoint 1, event_notifier_init (address@hidden, address@hidden) at 
> /home/cborntra/REPOS/qemu/util/event_notifier-posix.c:29
> #0  event_notifier_init (address@hidden, address@hidden) at 
> /home/cborntra/REPOS/qemu/util/event_notifier-posix.c:29
> #1  0x000000008019e766 in aio_context_new () at 
> /home/cborntra/REPOS/qemu/async.c:274
> #2  0x00000000800be410 in iothread_complete (obj=<optimized out>, 
> errp=<optimized out>) at /home/cborntra/REPOS/qemu/iothread.c:69
> #3  0x000000008006d3bc in virtio_blk_data_plane_create (address@hidden, 
> address@hidden, address@hidden, address@hidden)
>     at /home/cborntra/REPOS/qemu/hw/block/dataplane/virtio-blk.c:172
> #4  0x000000008006ba22 in virtio_blk_device_realize (dev=0x807f29e0, 
> errp=0x3ffffffe198) at /home/cborntra/REPOS/qemu/hw/block/virtio-blk.c:751
> #5  0x0000000080077ee2 in virtio_device_realize (dev=0x807f29e0, 
> errp=0x3ffffffe250) at /home/cborntra/REPOS/qemu/hw/virtio/virtio.c:1294
> #6  0x0000000080131760 in device_set_realized (obj=0x807f29e0, 
> value=<optimized out>, errp=0x3ffffffe5b0) at 
> /home/cborntra/REPOS/qemu/hw/core/qdev.c:834
> #7  0x0000000080162dca in property_set_bool (address@hidden, address@hidden, 
> opaque=0x807f31f0, address@hidden "realized", address@hidden)
>     at /home/cborntra/REPOS/qemu/qom/object.c:1473
> #8  0x0000000080164bd6 in object_property_set (address@hidden, v=0x808a3a90, 
> address@hidden "realized", address@hidden) at 
> /home/cborntra/REPOS/qemu/qom/object.c:824
> #9  0x0000000080166a3a in object_property_set_qobject (obj=0x807f29e0, 
> value=<optimized out>, name=0x8025bdd2 "realized", errp=0x3ffffffe5b0) at 
> /home/cborntra/REPOS/qemu/qom/qom-qobject.c:24
> #10 0x0000000080164efc in object_property_set_bool (address@hidden, 
> address@hidden, address@hidden "realized", address@hidden)
>     at /home/cborntra/REPOS/qemu/qom/object.c:888
> #11 0x0000000080130312 in qdev_init (address@hidden) at 
> /home/cborntra/REPOS/qemu/hw/core/qdev.c:168
> #12 0x00000000800893bc in virtio_ccw_blk_init (ccw_dev=0x807f2780) at 
> /home/cborntra/REPOS/qemu/hw/s390x/virtio-ccw.c:804
> #13 0x000000008008a6d4 in virtio_ccw_busdev_init (dev=0x807f2780) at 
> /home/cborntra/REPOS/qemu/hw/s390x/virtio-ccw.c:1582
> #14 0x000000008012fb2a in device_realize (dev=0x807f2780, errp=0x3ffffffe870) 
> at /home/cborntra/REPOS/qemu/hw/core/qdev.c:183
> #15 0x0000000080131760 in device_set_realized (obj=0x807f2780, 
> value=<optimized out>, errp=0x3ffffffebe8) at 
> /home/cborntra/REPOS/qemu/hw/core/qdev.c:834
> #16 0x0000000080162dca in property_set_bool (address@hidden, address@hidden, 
> opaque=0x807f2bd0, address@hidden "realized", address@hidden)
>     at /home/cborntra/REPOS/qemu/qom/object.c:1473
> #17 0x0000000080164bd6 in object_property_set (address@hidden, v=0x8089d9c0, 
> address@hidden "realized", address@hidden) at 
> /home/cborntra/REPOS/qemu/qom/object.c:824
> #18 0x0000000080166a3a in object_property_set_qobject (obj=0x807f2780, 
> value=<optimized out>, name=0x8025bdd2 "realized", errp=0x3ffffffebe8) at 
> /home/cborntra/REPOS/qemu/qom/qom-qobject.c:24
> #19 0x0000000080164efc in object_property_set_bool (address@hidden, 
> address@hidden, address@hidden "realized", address@hidden)
>     at /home/cborntra/REPOS/qemu/qom/object.c:888
> #20 0x00000000800bf622 in qdev_device_add (opts=0x807e8080) at 
> /home/cborntra/REPOS/qemu/qdev-monitor.c:560
> #21 0x00000000800d1616 in device_init_func (opts=<optimized out>, 
> opaque=<optimized out>) at /home/cborntra/REPOS/qemu/vl.c:2357
> #22 0x00000000802110b0 in qemu_opts_foreach (list=<optimized out>, 
> address@hidden <device_init_func>, address@hidden, address@hidden)
>     at /home/cborntra/REPOS/qemu/util/qemu-option.c:1072
> #23 0x000000008001673a in main (argc=<optimized out>, argv=<optimized out>, 
> envp=<optimized out>) at /home/cborntra/REPOS/qemu/vl.c:4431
> 
> --> data plane create, eventfd for iothread. I guess we need one per iothread?

Yes, this is the per-iothread aio_notify() event notifier.

> System now boots, and then the first disk access (partition detection):
> 
> Breakpoint 1, event_notifier_init (address@hidden, address@hidden) at 
> /home/cborntra/REPOS/qemu/util/event_notifier-posix.c:29
> #0  event_notifier_init (address@hidden, address@hidden) at 
> /home/cborntra/REPOS/qemu/util/event_notifier-posix.c:29
> #1  0x000000008008839a in virtio_ccw_set_guest_notifier (address@hidden, 
> address@hidden, address@hidden, address@hidden)
>     at /home/cborntra/REPOS/qemu/hw/s390x/virtio-ccw.c:1201
> #2  0x000000008008a7bc in virtio_ccw_set_guest_notifiers (d=<optimized out>, 
> nvqs=<optimized out>, assigned=<optimized out>) at 
> /home/cborntra/REPOS/qemu/hw/s390x/virtio-ccw.c:1259
> #3  0x000000008006d4fa in virtio_blk_data_plane_start (s=0x808b7e10) at 
> /home/cborntra/REPOS/qemu/hw/block/dataplane/virtio-blk.c:222
> #4  0x000000008006cf10 in virtio_blk_handle_output (vdev=<optimized out>, 
> vq=<optimized out>) at /home/cborntra/REPOS/qemu/hw/block/virtio-blk.c:427
> #5  0x0000000080086da0 in virtio_ccw_hcall_notify (args=<optimized out>) at 
> /home/cborntra/REPOS/qemu/hw/s390x/s390-virtio-ccw.c:60
> #6  0x0000000080082084 in s390_virtio_hypercall (address@hidden) at 
> /home/cborntra/REPOS/qemu/hw/s390x/s390-virtio-hcall.c:34
> #7  0x00000000800b83a4 in handle_hypercall (run=<optimized out>, 
> cpu=0x8087d100) at /home/cborntra/REPOS/qemu/target-s390x/kvm.c:911
> #8  handle_diag (ipb=<optimized out>, run=<optimized out>, cpu=0x8087d100) at 
> /home/cborntra/REPOS/qemu/target-s390x/kvm.c:963
> #9  handle_instruction (run=<optimized out>, cpu=0x8087d100) at 
> /home/cborntra/REPOS/qemu/target-s390x/kvm.c:1091
> #10 handle_intercept (cpu=0x8087d100) at 
> /home/cborntra/REPOS/qemu/target-s390x/kvm.c:1141
> #11 kvm_arch_handle_exit (address@hidden, address@hidden) at 
> /home/cborntra/REPOS/qemu/target-s390x/kvm.c:1259
> #12 0x000000008005b4d6 in kvm_cpu_exec (address@hidden) at 
> /home/cborntra/REPOS/qemu/kvm-all.c:1792
> #13 0x000000008004733e in qemu_kvm_cpu_thread_fn (arg=0x8087d100) at 
> /home/cborntra/REPOS/qemu/cpus.c:874
> #14 0x000003fffdd6b412 in start_thread () from /lib64/libpthread.so.0
> #15 0x000003fffc9f10ae in thread_start () from /lib64/libc.so.6
> ---> irqfd, ok

Yes.

> Breakpoint 1, event_notifier_init (address@hidden, address@hidden) at 
> /home/cborntra/REPOS/qemu/util/event_notifier-posix.c:29
> #0  event_notifier_init (address@hidden, address@hidden) at 
> /home/cborntra/REPOS/qemu/util/event_notifier-posix.c:29
> #1  0x000000008008785a in virtio_ccw_set_guest2host_notifier (dev=<optimized 
> out>, n=<optimized out>, assign=<optimized out>, set_handler=<optimized out>)
>     at /home/cborntra/REPOS/qemu/hw/s390x/virtio-ccw.c:141
> #2  0x000000008006d526 in virtio_blk_data_plane_start (s=0x808b7e10) at 
> /home/cborntra/REPOS/qemu/hw/block/dataplane/virtio-blk.c:230
> #3  0x000000008006cf10 in virtio_blk_handle_output (vdev=<optimized out>, 
> vq=<optimized out>) at /home/cborntra/REPOS/qemu/hw/block/virtio-blk.c:427
> #4  0x0000000080086da0 in virtio_ccw_hcall_notify (args=<optimized out>) at 
> /home/cborntra/REPOS/qemu/hw/s390x/s390-virtio-ccw.c:60
> #5  0x0000000080082084 in s390_virtio_hypercall (address@hidden) at 
> /home/cborntra/REPOS/qemu/hw/s390x/s390-virtio-hcall.c:34
> #6  0x00000000800b83a4 in handle_hypercall (run=<optimized out>, 
> cpu=0x8087d100) at /home/cborntra/REPOS/qemu/target-s390x/kvm.c:911
> #7  handle_diag (ipb=<optimized out>, run=<optimized out>, cpu=0x8087d100) at 
> /home/cborntra/REPOS/qemu/target-s390x/kvm.c:963
> #8  handle_instruction (run=<optimized out>, cpu=0x8087d100) at 
> /home/cborntra/REPOS/qemu/target-s390x/kvm.c:1091
> #9  handle_intercept (cpu=0x8087d100) at 
> /home/cborntra/REPOS/qemu/target-s390x/kvm.c:1141
> #10 kvm_arch_handle_exit (address@hidden, address@hidden) at 
> /home/cborntra/REPOS/qemu/target-s390x/kvm.c:1259
> #11 0x000000008005b4d6 in kvm_cpu_exec (address@hidden) at 
> /home/cborntra/REPOS/qemu/kvm-all.c:1792
> #12 0x000000008004733e in qemu_kvm_cpu_thread_fn (arg=0x8087d100) at 
> /home/cborntra/REPOS/qemu/cpus.c:874
> #13 0x000003fffdd6b412 in start_thread () from /lib64/libpthread.so.0
> #14 0x000003fffc9f10ae in thread_start () from /lib64/libc.so.6
> ---> ioeventfd, ok

Yes.

Stefan


