Re: [PATCH 2/2] 9pfs: fix init_in_iov_from_pdu truncating size


From: Christian Schoenebeck
Subject: Re: [PATCH 2/2] 9pfs: fix init_in_iov_from_pdu truncating size
Date: Thu, 14 May 2020 19:21:51 +0200

On Thursday, 14 May 2020 18:24:58 CEST Stefano Stabellini wrote:
> > > The Xen transport
> > > (https://xenbits.xenproject.org/docs/4.13-testing/misc/9pfs.html) has a
> > > shared memory area with 64 pages, so 256KB. It is split in half: 128KB
> > > for requests and 128KB for responses.
> > 
> > That's very little. It also means you will never be able to achieve
> > decent 9pfs performance with Xen, because that requires a much larger
> > msize of approximately msize >= 4MB, depending on the underlying I/O
> > hardware, with a tendency to grow further (due to rising I/O hardware
> > bandwidth).
> 
> That's great feedback. Fortunately the size of the transport is
> configurable, so it is a one-liner away from becoming much bigger (well,
> two one-liners: see net/9p/trans_xen.c:XEN_9PFS_RING_ORDER in Linux, and
> hw/9pfs/xen-9p-backend.c:MAX_RING_ORDER in QEMU.)
> 
> Would you recommend 4MB each way, so a total of 8MB?

It would be an improvement for Xen, sure. But what if a user picks e.g.
msize=100MB? Then you are back at square one, because if the guest does this
on a huge file:

        const size_t sz = 5*1024*1024;
        char* buf = malloc(sz);
        // read file in 5MiB chunks
        while (read(fd, buf, sz) > 0) {
                ...
        }

then you are still unable to handle this; the yield hack would loop forever.
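To illustrate (just a rough sketch; xen_ring_free_bytes() is a made-up helper
here, not an actual QEMU function): if a pending R_read reply is larger than
the entire response ring, a wait-for-free-space loop can never make progress,
no matter how often it yields:

        /* reply_size > total ring capacity => condition stays true forever */
        while (xen_ring_free_bytes(ring) < reply_size) {
                qemu_coroutine_yield();
        }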

> > As far as I understand you now, the problem with the Xen transport seems
> > to be that clients can submit requests more quickly than the 9pfs server
> > can process them. That would change the overall situation completely,
> > because the hack solution with delaying delivery (yield) that I suggested
> > previously was based on the assumption that this low-transport-buffer
> > scenario only happens once at boot, and not possibly again and again at
> > any later point.
> 
> Yes, I think it could happen at any time, not just at boot.
> 
> From my understanding, I don't think it is necessarily due to QEMU being
> slow. In fact, if QEMU were slow and the client fast, the reply ring
> would be empty because the client would be up to speed; therefore msize
> == transport_available_size and there is no problem.

Right, if msize == transport_max_size then Xen easily runs into this
situation, no matter how fast the client is at pulling responses, because a
single response still pending to be pulled by the client is already enough to
trigger this Xen transport buffer underrun.
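For example (the numbers are purely illustrative): with a 4 MiB response ring
and msize = 4 MiB, any part of an earlier reply that the client has not yet
pulled drops the free space below msize, so a maximum-sized reply no longer
fits in one piece:

        const size_t ring_capacity = 4 * 1024 * 1024;   /* == msize */
        size_t pending    = 8192;                        /* reply not yet pulled */
        size_t free_space = ring_capacity - pending;     /* now < msize */
        /* free_space < msize => a full-sized reply cannot be delivered */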

> So maybe it is OK to delay delivery, because it seems to me that is
> the issue.
> 
> Also I am thinking that if we are going to increase the ring size to
> 4MB, maybe we should take the opportunity to set msize=transport_size/2
> just to reduce the likelihood of this kind of issue, in addition to a
> proper fix.

To prevent the yield hack from looping forever, yes, you would at least need
to limit 'msize' according to Xen's maximum transport buffer size. This must
be done when the client opens a 9P session, i.e. when the client sends
Tversion:

http://ericvh.github.io/9p-rfc/rfc9p2000.html#version
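Roughly like this (just a sketch; transport_max_msize() and s->msize are
illustrative names, not the actual hw/9pfs/9p.c code):

        /* on Tversion: clamp the client's requested msize to what the
         * transport buffer can actually deliver */
        uint32_t limit = transport_max_msize(s);

        if (s->msize > limit) {
                s->msize = limit;   /* Rversion then returns the reduced msize */
        }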

However, I would at least make the Xen transport buffer size configurable for
administrators. You never know how large an 'msize' people will want to pick
for performance reasons.

> Yes, I think we want to do three things:
> - increase the transport size for performance
> - set msize=transport_size/2 to decrease the likelihood of this kind of
>   issue

Sounds like a plan to me!

> - introduce a slow but safe fix when the issue happens, and
>   qemu_coroutine_yield() or yield_until_fd_readable() could be OK if it
>   is very rare

Yeah, and better test that this would not hang. ;-)

I would also call warn_report_once() whenever this slow yield hack is
triggered, to let the admin know that Xen's transport buffer should be
increased to avoid this performance issue.
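Something along these lines (a rough sketch assuming a hypothetical
xen_ring_free_bytes() helper, not actual QEMU code):

        if (xen_ring_free_bytes(ring) < reply_size) {
                warn_report_once("9pfs: Xen transport buffer is smaller than "
                                 "msize, consider increasing the ring size");
                /* slow but safe: wait for the client to pull responses;
                 * terminates because msize was clamped to the ring size */
                while (xen_ring_free_bytes(ring) < reply_size) {
                        qemu_coroutine_yield();
                }
        }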

Likewise, call warn_report_once() if the server's Tversion request handler
had to reduce msize.

Best regards,
Christian Schoenebeck




