From: Avi Kivity
Subject: Re: [Qemu-devel] [PATCH -V3 09/32] virtio-9p: Implement P9_TWRITE/ Thread model in QEMU
Date: Tue, 30 Mar 2010 13:24:03 +0300
User-agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.1.8) Gecko/20100301 Fedora/3.0.3-1.fc12 Thunderbird/3.0.3

On 03/30/2010 12:23 AM, Anthony Liguori wrote:
It's not sufficient. If you have a single thread that runs both live migration and timers, then timers will be backlogged behind live migration, or you'll have to yield often. This is regardless of the locking model (and of course having threads without fixing the locking is insufficient as well; live migration accesses guest memory, so it needs the big qemu lock).


But what's the solution? Running every timer in a separate thread? We'll hit the same problem if we impose an arbitrary limit on the number of threads.

A completion that's expected to take a couple of microseconds at most can live in the iothread. A completion that's expected to take a couple of milliseconds wants its own thread. We'll have to think about anything in between.

vnc and migration can perform large amounts of work in a single completion; they're limited only by the socket send rate and our internal rate-limiting which are both outside our control. Most device timers are O(1). virtio completions probably fall into the annoying "have to think about it" department.
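To illustrate the microseconds-vs-milliseconds split above, a sketch (untested, plain pthreads rather than any existing qemu helper; every name below is invented):

#include <pthread.h>
#include <stdbool.h>

/* A completion we expect to finish in microseconds runs inline in the
 * iothread; one we expect to take milliseconds gets its own thread. */
struct completion {
    void (*run)(void *opaque);
    void *opaque;
    bool expected_slow;            /* e.g. migration or a vnc update */
};

static void *completion_thread(void *arg)
{
    struct completion *c = arg;

    c->run(c->opaque);             /* long-running work, off the iothread */
    return NULL;
}

static void dispatch_completion(struct completion *c)
{
    if (!c->expected_slow) {
        c->run(c->opaque);         /* cheap: stay in the iothread */
    } else {
        pthread_t tid;

        pthread_create(&tid, NULL, completion_thread, c);
        pthread_detach(tid);
    }
}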

What I'm skeptical of is whether converting virtio-9p or qcow2 to handle each request in a separate thread is really going to improve things.

Currently qcow2 isn't even fully asynchronous, so it can't fail to improve things.

Unless it introduces more data corruption, which is my concern with any significant change to qcow2.

It's possible to move qcow2 to a thread without any significant change to it (simply run the current code in its own thread, protected by a mutex). Further changes would be very incremental.
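Purely as an illustration, the "run the current code in its own thread, under one mutex" step could be as small as this sketch (the request type and the two helpers are invented, not existing qcow2 functions):

#include <pthread.h>

struct qcow2_request;                                        /* stands in for the real request */
void process_request_as_today(struct qcow2_request *req);    /* the existing synchronous code */
void signal_completion(struct qcow2_request *req);           /* wake the iothread, e.g. via a pipe */

static pthread_mutex_t qcow2_lock = PTHREAD_MUTEX_INITIALIZER;

static void *qcow2_request_thread(void *opaque)
{
    struct qcow2_request *req = opaque;

    pthread_mutex_lock(&qcow2_lock);
    process_request_as_today(req);             /* unchanged, still synchronous */
    pthread_mutex_unlock(&qcow2_lock);

    signal_completion(req);
    return NULL;
}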

The VNC server is another area where I think multithreading would be a bad idea.

If the vnc server is stuffing a few megabytes of screen into a socket, then timers will be delayed behind it, unless you litter the code with calls to bottom halves. Even worse if it does complicated compression and encryption.

Sticking the VNC server in its own thread would be fine. Trying to make the VNC server itself multithreaded, though, would be problematic.

Why would it be problematic? Each client gets its own threads; they don't interact at all, do they?

I don't see a need to do it though (beyond dropping it into a thread).

Basically, sticking isolated components in a single thread should be pretty reasonable.

Now you're doomed. It's easy to declare things "isolated components" one by one; pretty soon the main loop will be gone.


But if those system calls are blocking, you need a thread?

You can dispatch just the system call to a thread pool. The advantage of doing that is that you don't need to worry about locking since the system calls are not (usually) handling shared state.
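The shape of that idea, minus the pooling itself, might look like this untested sketch (invented names, no qemu internals; a real pool would keep a queue of these requests and reuse its threads). The worker runs only the blocking syscall and hands the result back over a pipe the main loop already polls, so it touches no device state and needs no lock:

#include <pthread.h>
#include <unistd.h>
#include <sys/types.h>

struct io_request {
    int fd;
    void *buf;
    size_t len;
    off_t offset;
    ssize_t result;
};

static int done_fd[2];             /* pipe(done_fd) at startup; done_fd[0] is
                                    * watched by the main loop's select() */

static void *io_worker(void *arg)
{
    struct io_request *req = arg;

    req->result = pwrite(req->fd, req->buf, req->len, req->offset);
    write(done_fd[1], &req, sizeof(req));      /* completion notification only */
    return NULL;
}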

There is always implied shared state. If you're doing direct guest memory access, you need to lock memory against hotunplug, or the syscall will end up writing into freed memory. If the device can be hotunplugged, you need to make sure all threads have returned before unplugging it.

There are other ways to handle hot unplug (like reference counting) that avoid this problem.
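For concreteness, a reference-counting scheme along those lines could look like the sketch below (invented names, C11 atomics): each in-flight request pins the device, and unplug only frees it once the last reference is dropped.

#include <stdatomic.h>
#include <stdlib.h>

struct refcounted_dev {
    atomic_int refcount;           /* starts at 1 for the "plugged in" reference */
    /* ... device state ... */
};

static void dev_ref(struct refcounted_dev *d)
{
    atomic_fetch_add(&d->refcount, 1);
}

static void dev_unref(struct refcounted_dev *d)
{
    if (atomic_fetch_sub(&d->refcount, 1) == 1) {
        free(d);                   /* last user gone, safe to free */
    }
}

/* A worker thread holds a reference for the duration of its request. */
static void handle_request(struct refcounted_dev *d)
{
    dev_ref(d);
    /* ... blocking syscall, guest memory access, etc. ... */
    dev_unref(d);
}

/* Hot unplug just drops the initial reference; the memory survives until
 * every in-flight request has finished. */
static void dev_unplug(struct refcounted_dev *d)
{
    dev_unref(d);
}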

That's just more clever locking.

Ultimately, this comes down to a question of lock granularity and thread granularity. I don't think it's a good idea to start with the assumption that we want extremely fine granularity. There's certainly very low hanging fruit with respect to threading.

Sure. Currently the hotspots are block devices (except raw) and hpet (seen with large Windows guests). The latter includes the bus lookup and hpet itself; hpet reads can be performed locklessly if we're clever.
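One way such a lockless read path could look, purely as a sketch: a seqlock around the two fields a counter read needs (field names and units are invented, C11 atomics for illustration). Guest counter reads never take the device lock; they retry only if a writer was active or the sequence number moved under them.

#include <stdatomic.h>
#include <stdint.h>

struct hpet_like {
    atomic_uint seq;                   /* even = stable, odd = writer updating */
    _Atomic uint64_t offset_ns;        /* when the counter last (re)started */
    _Atomic uint64_t period_fs;        /* tick period in femtoseconds */
};

static uint64_t hpet_read_counter(struct hpet_like *s, uint64_t now_ns)
{
    unsigned start;
    uint64_t offset, period;

    do {
        start = atomic_load_explicit(&s->seq, memory_order_acquire);
        offset = atomic_load_explicit(&s->offset_ns, memory_order_relaxed);
        period = atomic_load_explicit(&s->period_fs, memory_order_relaxed);
        atomic_thread_fence(memory_order_acquire);
    } while ((start & 1) ||
             start != atomic_load_explicit(&s->seq, memory_order_relaxed));

    return (now_ns - offset) * 1000000ULL / period;    /* ns -> fs -> ticks */
}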

On a philosophical note, threads may make it easier to model complex hardware that includes a processor, for example our scsi card (and how about using tcg as a jit to boost it :)

Yeah, it's hard to argue that script evaluation shouldn't be done in a thread. But that doesn't prevent me from being very cautious about how and where we use threading :-)

Caution where threads are involved is a good thing. They are inevitable, however, IMO.

We are already using threads, so they aren't just inevitable, they're reality. I still don't think using threads would significantly simplify virtio-9p.


I meant exposing the qemu core to the threads instead of pretending they aren't there. I'm not familiar with 9p, so I don't hold much of an opinion, but didn't you say you need threads in order to handle async syscalls? That may not be the deep threading we're discussing here.

btw, IIUC currently disk hotunplug will stall a guest, no? We need async aio_flush().

--
Do not meddle in the internals of kernels, for they are subtle and quick to panic.




