
Re: 9p: requests efficiency


From: Christian Schoenebeck
Subject: Re: 9p: requests efficiency
Date: Thu, 21 Nov 2019 00:37:36 +0100

On Friday, 15 November 2019 14:26:56 CET Greg Kurz wrote:
> > However, when there is a large number of (i.e. small) 9p requests, no
> > matter what the actual request type is, I am encountering severe
> > performance issues with 9pfs, and I am trying to understand whether this
> > could be improved with reasonable effort.
> 
> Thanks for doing that. This is typically the kind of effort I never
> dared starting on my own.

If you don't mind, I'll still ask some more questions, just in case you can 
answer them off the top of your head.

> > If I understand it correctly, each incoming request (T message) is
> > dispatched to its own qemu coroutine queue. So individual requests should
> > already be processed in parallel, right?
> 
> Sort of but not exactly. The real parallelization, ie. doing parallel
> processing with concurrent threads, doesn't take place on a per-request
> basis. 

Ok, I see. I was just reading that each request causes this call sequence:

        handle_9p_output() -> pdu_submit() -> qemu_co_queue_init(&pdu->complete)

and I was misinterpreting specifically that latter call as an implied thread 
creation, because that's what happens with other, somewhat similar cooperative 
dispatch frameworks like "Grand Central Dispatch" or std::async.

But now I realize that the entire QEMU coroutine framework really just manages 
execution stacks, not threads per se. The QEMU docs often use the term 
"threads", which is IMO misleading for what it really does.
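
Just to illustrate my current understanding with a minimal, hypothetical 
snippet (not taken from the 9p code, and the details may well be off):

    #include "qemu/osdep.h"
    #include "qemu/coroutine.h"

    static void coroutine_fn my_co_entry(void *opaque)
    {
        /* Runs on its own, separate stack, but still inside the thread
         * that entered the coroutine; no new thread is spawned here. */
        qemu_coroutine_yield();  /* suspend, hand control back to caller */
        /* Execution resumes here once somebody re-enters the coroutine,
         * on whatever thread that re-entering happens to run in. */
    }

    static void dispatch_example(void)
    {
        Coroutine *co = qemu_coroutine_create(my_co_entry, NULL);
        qemu_coroutine_enter(co);  /* runs my_co_entry() synchronously on
                                      the current thread until its first
                                      yield */
    }

So, as far as I can tell, qemu_coroutine_enter() is just a stack-switching 
function call from the caller's point of view, not a request to run something 
on another thread.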

> A typical request is broken down into several calls to the backend
> which may block because the backend itself calls a syscall that may block
> in the kernel. Each backend call is thus handled by its own thread from the
> mainloop thread pool (see hw/9pfs/coth.[ch] for details). The rest of the
> 9p code, basically everything in 9p.c, is serialized in the mainloop thread.

So the precise parallelism fork points in 9pfs (i.e. where tasks are dispatched 
to other threads) are the *_co_*() functions, and more precisely the places 
where they use v9fs_co_run_in_worker( X ), correct? Or are there more fork 
points than those?
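
For instance, I am looking at backend call wrappers like this one (paraphrased 
from memory, so probably not matching e.g. hw/9pfs/cofile.c verbatim):

    int coroutine_fn v9fs_co_lstat(V9fsPDU *pdu, V9fsPath *path,
                                   struct stat *stbuf)
    {
        int err;
        V9fsState *s = pdu->s;

        if (v9fs_request_cancelled(pdu)) {
            return -EINTR;
        }
        v9fs_co_run_in_worker(
            {
                /* only this block runs on a worker thread */
                err = s->ops->lstat(&s->ctx, path, stbuf);
                if (err < 0) {
                    err = -errno;
                }
            });
        return err;
    }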

If so, I haven't yet understood how exactly v9fs_co_run_in_worker() works. I 
mean, I understand now how QEMU coroutines work, and that the idea of 
v9fs_co_run_in_worker() is to dispatch the passed code block to a worker 
thread while immediately returning to the main thread and continuing there 
with other coroutines until the worker thread has finished the dispatched 
block. But how exactly that happens inside v9fs_co_run_in_worker() is not yet 
clear to me.
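
For reference, this is the part I am staring at (quoting hw/9pfs/coth.h from 
memory, so possibly not verbatim):

    #define v9fs_co_run_in_worker(code_block)                               \
        do {                                                                \
            QEMUBH *co_bh;                                                  \
            co_bh = qemu_bh_new(co_run_in_worker_bh,                        \
                                qemu_coroutine_self());                     \
            qemu_bh_schedule(co_bh);                                        \
            /* yield in the QEMU main thread; the bottom half arranges      \
             * (via the thread pool, I assume) for this coroutine to be     \
             * re-entered on a worker thread */                             \
            qemu_coroutine_yield();                                         \
            qemu_bh_delete(co_bh);                                          \
            code_block;                                                     \
            /* second yield: hand control back to the main thread */        \
            qemu_coroutine_yield();                                         \
        } while (0)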

Also, where are the worker threads actually spawned?

Best regards,
Christian Schoenebeck




