qemu-block

Re: Re: Re: [PATCH 0/1] introduce nvmf block driver


From: Stefan Hajnoczi
Subject: Re: Re: Re: [PATCH 0/1] introduce nvmf block driver
Date: Tue, 8 Jun 2021 13:59:23 +0100

On Tue, Jun 08, 2021 at 08:19:20PM +0800, zhenwei pi wrote:
> On 6/8/21 4:07 PM, Stefan Hajnoczi wrote:
> > On Tue, Jun 08, 2021 at 10:52:05AM +0800, zhenwei pi wrote:
> > > On 6/7/21 11:08 PM, Stefan Hajnoczi wrote:
> > > > On Mon, Jun 07, 2021 at 09:32:52PM +0800, zhenwei pi wrote:
> > > > > Since 2020 I have been developing a userspace NVMF initiator library:
> > > > > https://github.com/bytedance/libnvmf
> > > > > and I released v0.1 recently.
> > > > > 
> > > > > Also developed block driver for QEMU side:
> > > > > https://github.com/pizhenwei/qemu/tree/block-nvmf
> > > > > 
> > > > > Testing with the Linux kernel NVMF target (TCP), QEMU gets about 220K
> > > > > IOPS, which seems good.
> > > > 
> > > > How does the performance compare to the Linux kernel NVMeoF initiator?
> > > > 
> > > > In case you're interested, some Red Hat developers have started
> > > > working on a new library called libblkio. For now it supports io_uring
> > > > but PCI NVMe and virtio-blk are on the roadmap. The library supports
> > > > blocking, event-driven, and polling modes. There isn't a direct overlap
> > > > with libnvmf but maybe they can learn from each other.
> > > > https://gitlab.com/libblkio/libblkio/-/blob/main/docs/blkio.rst
> > > > 
> > > > Stefan
> > > > 
> > > 
> > > I'm sorry that I didn't provide enough information about the QEMU block
> > > nvmf driver and libnvmf.
> > > 
> > > Kernel initiator vs. userspace initiator
> > > Rather than the io_uring/libaio + kernel initiator solution (read 500K+
> > > IOPS & write 200K+ IOPS), I prefer QEMU block nvmf + libnvmf (RW 200K+
> > > IOPS):
> > > 1, I don't have to upgrade the host kernel; it also runs on older kernel
> > > versions.
> > > 2, If the target side hits a panic during re-connection, the initiator
> > > side does not end up in the 'D' state (uninterruptible sleep in the
> > > kernel), so QEMU can always be killed.
> > > 3, It's easier to troubleshoot a userspace application.
> > 
> > I see, thanks for sharing.
> > 
> > > Default NVMe-oF IO queues
> > > The mechanism of QEMU + libnvmf:
> > > 1, The QEMU iothread creates a request and dispatches it to an NVMe-oF
> > > IO queue thread via a lockless list.
> > > 2, The QEMU iothread kicks the NVMe-oF IO queue thread.
> > > 3, The NVMe-oF IO queue thread processes the request and returns the
> > > response to the QEMU iothread.
> > > 
> > > When the QEMU iothread reaches its limit, 4 NVMe-oF IO queues give
> > > better performance.
> > 
> > Can you explain this bottleneck? Even with 4 NVMe-oF IO queues there is
> > still just 1 IOThread submitting requests, so why are 4 IO queues faster
> > than 1?
> > 
> > Stefan
> > 
> 
> The QEMU + libiscsi solution uses the iothread to send/receive on the TCP
> socket and process iSCSI PDUs directly; it gets about 60K IOPS. Let's look at
> the perf report of the iothread:
> +   35.06%      [k] entry_SYSCALL_64_after_hwframe
> +   33.13%      [k] do_syscall_64
> +   19.70%      [.] 0x0000000100000000
> +   18.31%      [.] __libc_send
> +   18.02%      [.] iscsi_tcp_service
> +   16.30%      [k] __x64_sys_sendto
> +   16.24%      [k] __sys_sendto
> +   15.69%      [k] sock_sendmsg
> +   15.56%      [k] tcp_sendmsg
> +   14.25%      [k] __tcp_transmit_skb
> +   13.94%      [k] 0x0000000000001000
> +   13.78%      [k] tcp_sendmsg_locked
> +   13.67%      [k] __ip_queue_xmit
> +   13.00%      [k] tcp_write_xmit
> +   12.07%      [k] __tcp_push_pending_frames
> +   11.91%      [k] inet_recvmsg
> +   11.78%      [k] tcp_recvmsg
> +   11.73%      [k] ip_output
> 
> The bottleneck in this case is TCP, so libnvmf dispatches requests to other
> threads via a lockless list to reduce the TCP overhead on the iothread. This
> makes it more effective at processing requests from the guest.
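The consumer side can be sketched the same way: the IO queue thread takes the
whole pending list with a single atomic exchange and performs the TCP sends
itself, so the syscall-heavy work visible in the perf report above moves off
the iothread and is amortized over a batch of requests. Again, the names
(queue_drain, queue_thread, req_send_pdu) are assumptions for illustration,
not libnvmf code; the types match the previous sketch:

/* Sketch only -- not taken from libnvmf. */
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

struct nvmf_req {
    struct nvmf_req *next;             /* intrusive link, as before */
};

struct io_queue {
    _Atomic(struct nvmf_req *) head;   /* lockless LIFO filled by the iothread */
    int kick_fd;                       /* eventfd the iothread writes to */
};

/* Hypothetical helper: encapsulate one request as an NVMe/TCP PDU and send it. */
extern void req_send_pdu(struct io_queue *q, struct nvmf_req *req);

static void queue_drain(struct io_queue *q)
{
    /* Take everything queued so far in one atomic exchange; producers keep
     * pushing onto the now-empty list.  (A real implementation would also
     * restore submission order; the LIFO reversal is ignored here.) */
    struct nvmf_req *req = atomic_exchange_explicit(&q->head, NULL,
                                                    memory_order_acquire);
    while (req) {
        struct nvmf_req *next = req->next;
        req_send_pdu(q, req);          /* the TCP work happens on this thread */
        req = next;
    }
}

/* IO queue thread main loop: block on the eventfd until the iothread kicks. */
static void *queue_thread(void *opaque)
{
    struct io_queue *q = opaque;
    uint64_t cnt;
    while (read(q->kick_fd, &cnt, sizeof(cnt)) == sizeof(cnt)) {
        queue_drain(q);
        /* ... also poll the socket for responses and complete them back to
         * the iothread ... */
    }
    return NULL;
}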

Is the IOThread's combined %usr and %sys CPU utilization close to 100%?

Stefan


