Re: Re: Re: [PATCH 0/1] introduce nvmf block driver
From: Stefan Hajnoczi
Subject: Re: Re: Re: [PATCH 0/1] introduce nvmf block driver
Date: Tue, 8 Jun 2021 13:59:23 +0100
On Tue, Jun 08, 2021 at 08:19:20PM +0800, zhenwei pi wrote:
> On 6/8/21 4:07 PM, Stefan Hajnoczi wrote:
> > On Tue, Jun 08, 2021 at 10:52:05AM +0800, zhenwei pi wrote:
> > > On 6/7/21 11:08 PM, Stefan Hajnoczi wrote:
> > > > On Mon, Jun 07, 2021 at 09:32:52PM +0800, zhenwei pi wrote:
> > > > > Since 2020, I started to develop a userspace NVMF initiator library:
> > > > > https://github.com/bytedance/libnvmf
> > > > > and released v0.1 recently.
> > > > >
> > > > > Also developed block driver for QEMU side:
> > > > > https://github.com/pizhenwei/qemu/tree/block-nvmf
> > > > >
> > > > > Test with linux kernel NVMF target (TCP), QEMU gets about 220K IOPS,
> > > > > it seems good.
> > > >
> > > > How does the performance compare to the Linux kernel NVMeoF initiator?
> > > >
> > > > In case you're interested, some Red Hat developers have started to
> > > > working on a new library called libblkio. For now it supports io_uring
> > > > but PCI NVMe and virtio-blk are on the roadmap. The library supports
> > > > blocking, event-driven, and polling modes. There isn't a direct overlap
> > > > with libnvmf but maybe they can learn from each other.
> > > > https://gitlab.com/libblkio/libblkio/-/blob/main/docs/blkio.rst
> > > >
> > > > Stefan
> > > >
> > >
> > > I'm sorry I didn't give enough information about the QEMU block nvmf
> > > driver and libnvmf.
> > >
> > > Kernel initiator & userspace initiator
> > > Rather than the io_uring/libaio + kernel initiator solution (read 500K+
> > > IOPS & write 200K+ IOPS), I prefer QEMU block nvmf + libnvmf (RW 200K+
> > > IOPS) because:
> > > 1, I don't have to upgrade the host kernel; it also runs on older
> > > kernel versions.
> > > 2, If the target side panics during re-connection, the initiator side
> > > does not end up in the 'D' state (uninterruptible sleep in the kernel),
> > > so QEMU can always be killed.
> > > 3, It's easier to troubleshoot a userspace application.
> >
> > I see, thanks for sharing.
> >
> > > Default NVMe-oF IO queues
> > > The mechanism of QEMU + libnvmf:
> > > 1, The QEMU iothread creates a request and dispatches it to an NVMe-oF
> > > IO queue thread via a lockless list.
> > > 2, The QEMU iothread tries to kick the NVMe-oF IO queue thread.
> > > 3, The NVMe-oF IO queue thread processes the request and returns the
> > > response to the QEMU iothread.
> > >
> > > When the QEMU iothread reaches its limit, 4 NVMe-oF IO queues get
> > > better performance.
> >
> > Can you explain this bottleneck? Even with 4 NVMe-oF IO queues there is
> > still just 1 IOThread submitting requests, so why are 4 IO queues faster
> > than 1?
> >
> > Stefan
> >
>
> The QEMU + libiscsi solution uses the iothread to send/recv TCP and
> process iSCSI PDUs directly, and it gets about 60K IOPS. Let's look at
> the perf report of the iothread:
> + 35.06% [k] entry_SYSCALL_64_after_hwframe
> + 33.13% [k] do_syscall_64
> + 19.70% [.] 0x0000000100000000
> + 18.31% [.] __libc_send
> + 18.02% [.] iscsi_tcp_service
> + 16.30% [k] __x64_sys_sendto
> + 16.24% [k] __sys_sendto
> + 15.69% [k] sock_sendmsg
> + 15.56% [k] tcp_sendmsg
> + 14.25% [k] __tcp_transmit_skb
> + 13.94% [k] 0x0000000000001000
> + 13.78% [k] tcp_sendmsg_locked
> + 13.67% [k] __ip_queue_xmit
> + 13.00% [k] tcp_write_xmit
> + 12.07% [k] __tcp_push_pending_frames
> + 11.91% [k] inet_recvmsg
> + 11.78% [k] tcp_recvmsg
> + 11.73% [k] ip_output
>
> The bottleneck in this case is TCP, so libnvmf dispatches requests to
> other threads via the lockless list to move the TCP overhead off the
> iothread. That makes it more effective at processing requests from the
> guest.
Are IOThread %usr and %sys CPU utilization close to 100%?
Stefan