Re: [Qemu-devel] VFIO based vGPU(was Re: [Announcement] 2015-Q3 release
From: Jike Song
Subject: Re: [Qemu-devel] VFIO based vGPU(was Re: [Announcement] 2015-Q3 release of XenGT - a Mediated ...)
Date: Wed, 27 Jan 2016 09:47:25 +0800
User-agent: Mozilla/5.0 (X11; Linux i686 on x86_64; rv:17.0) Gecko/20130801 Thunderbird/17.0.8

On 01/27/2016 06:56 AM, Alex Williamson wrote:
> On Tue, 2016-01-26 at 22:39 +0000, Tian, Kevin wrote:
>>> From: Alex Williamson [mailto:address@hidden
>>> Sent: Wednesday, January 27, 2016 6:27 AM
>>>
>>> On Tue, 2016-01-26 at 22:15 +0000, Tian, Kevin wrote:
>>>>> From: Alex Williamson [mailto:address@hidden
>>>>> Sent: Wednesday, January 27, 2016 6:08 AM
>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>> Today KVMGT (not using VFIO yet) registers I/O emulation callbacks to
>>>>>>>> KVM, so VM MMIO access will be forwarded to KVMGT directly for
>>>>>>>> emulation in kernel. If we reuse above R/W flags, the whole emulation
>>>>>>>> path would be unnecessarily long with obvious performance impact. We
>>>>>>>> either need a new flag here to indicate in-kernel emulation (bias from
>>>>>>>> passthrough support), or just hide the region alternatively (let KVMGT
>>>>>>>> to handle I/O emulation itself like today).
>>>>>>>
>>>>>>> That sounds like a future optimization TBH. There's very strict
>>>>>>> layering between vfio and kvm. Physical device assignment could make
>>>>>>> use of it as well, avoiding a round trip through userspace when an
>>>>>>> ioread/write would do. Userspace also needs to orchestrate those kinds
>>>>>>> of accelerators, there might be cases where userspace wants to see those
>>>>>>> transactions for debugging or manipulating the device. We can't simply
>>>>>>> take shortcuts to provide such direct access. Thanks,
>>>>>>>
>>>>>>
>>>>>> But we have to balance such debugging flexibility and acceptable
>>>>>> performance. To me the latter one is more important otherwise there'd
>>>>>> be no real usage around this technique, while for debugging there are
>>>>>> other alternative (e.g. ftrace) Consider some extreme case with 100k
>>>>>> traps/second and then see how much impact a 2-3x longer emulation path
>>>>>> can bring...
>>>>>
>>>>> Are you jumping to the conclusion that it cannot be done with proper
>>>>> layering in place? Performance is important, but it's not an excuse to
>>>>> abandon designing interfaces between independent components. Thanks,
>>>>>
>>>>
>>>> Two are not controversial. My point is to remove unnecessary long trip
>>>> as possible. After another thought, yes we can reuse existing read/write
>>>> flags:
>>>> - KVMGT will expose a private control variable whether in-kernel
>>>> delivery is required;
>>>
>>> But in-kernel delivery is never *required*. Wouldn't userspace want to
>>> deliver in-kernel any time it possibly could?
>>>
>>>> - when the variable is true, KVMGT will register in-kernel MMIO
>>>> emulation callbacks then VM MMIO request will be delivered to KVMGT
>>>> directly;
>>>> - when the variable is false, KVMGT will not register anything.
>>>> VM MMIO request will then be delivered to Qemu and then ioread/write
>>>> will be used to finally reach KVMGT emulation logic;
>>>
>>> No, that means the interface is entirely dependent on a backdoor through
>>> KVM. Why can't userspace (QEMU) do something like register an MMIO
>>> region with KVM handled via a provided file descriptor and offset,
>>> couldn't KVM then call the file ops without a kernel exit? Thanks,
>>>
>>
>> Could you elaborate this thought? If it can achieve the purpose w/o
>> a kernel exit definitely we can adapt to it. :-)
>
> I only thought of it when replying to the last email and have been doing
> some research, but we already do quite a bit of synchronization through
> file descriptors. The kvm-vfio pseudo device uses a group file
> descriptor to ensure a user has access to a group, allowing some degree
> of interaction between modules. Eventfds and irqfds already make use of
> f_ops on file descriptors to poke data. So, if KVM had information that
> an MMIO region was backed by a file descriptor for which it already has
> a reference via fdget() (and verified access rights and whatnot), then
> it ought to be a simple matter to get to f_ops->read/write knowing the
> base offset of that MMIO region. Perhaps it could even simply use
> __vfs_read/write(). Then we've got a proper reference to the file
> descriptor for ownership purposes and we've transparently jumped across
> modules without any implicit knowledge of the other end. Could it work?

This is OK for KVMGT: getting from the fops to the vgpu device model would
always be simple. The only question is, how is the KVM hypervisor supposed
to get the fd on a VM exit?

Looking at the current implementation of vcpu_mmio_write(), it seems that
nothing but the GPA and len are provided:

static int vcpu_mmio_write(struct kvm_vcpu *vcpu, gpa_t addr, int len,
			   const void *v)
{
	int handled = 0;
	int n;

	do {
		n = min(len, 8);
		if (!(vcpu->arch.apic &&
		      !kvm_iodevice_write(vcpu, &vcpu->arch.apic->dev,
					  addr, n, v))
		    && kvm_io_bus_write(vcpu, KVM_MMIO_BUS, addr, n, v))
			break;
		handled += n;
		addr += n;
		len -= n;
		v += n;
	} while (len);

	return handled;
}

If we back a GPA range with an fd, wouldn't that also be a 'backdoor'?
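
To make the question concrete, here is a rough user-space model (not kernel code) of how the proposal could work, assuming userspace registers the fd-to-range binding ahead of time so that the exit path needs only the GPA and length. Every name in it (fake_file, fd_mmio_register, fd_mmio_write, the toy_* backend) is made up for illustration; a real implementation would sit behind the KVM I/O bus and call the f_ops of a struct file obtained via fdget().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the read/write f_ops of a kernel struct file. */
struct fake_file {
	long (*read)(struct fake_file *f, void *buf, size_t len, uint64_t off);
	long (*write)(struct fake_file *f, const void *buf, size_t len, uint64_t off);
};

/* One guest-physical range backed by a file at a base offset (assumed layout). */
struct fd_mmio_region {
	uint64_t gpa, size;
	struct fake_file *file;
	uint64_t file_off;
};

#define MAX_REGIONS 8
static struct fd_mmio_region regions[MAX_REGIONS];
static int nr_regions;

/* Called once by "userspace" at setup time, long before any VM exit. */
static int fd_mmio_register(uint64_t gpa, uint64_t size,
			    struct fake_file *file, uint64_t file_off)
{
	assert(file && size);
	if (nr_regions == MAX_REGIONS)
		return -1;
	regions[nr_regions++] = (struct fd_mmio_region){ gpa, size, file, file_off };
	return 0;
}

/* Find the registered region that fully contains [addr, addr + len). */
static struct fd_mmio_region *fd_mmio_find(uint64_t addr, size_t len)
{
	for (int i = 0; i < nr_regions; i++) {
		struct fd_mmio_region *r = &regions[i];

		if (addr >= r->gpa && len <= r->size &&
		    addr - r->gpa <= r->size - len)
			return r;
	}
	return NULL;
}

/* Exit path: only (addr, len, data) are known, as in vcpu_mmio_write(). */
static int fd_mmio_write(uint64_t addr, size_t len, const void *data)
{
	struct fd_mmio_region *r = fd_mmio_find(addr, len);

	if (!r)
		return -1;	/* not handled here: fall back to userspace */
	return r->file->write(r->file, data, len,
			      r->file_off + (addr - r->gpa)) == (long)len ? 0 : -1;
}

static int fd_mmio_read(uint64_t addr, size_t len, void *data)
{
	struct fd_mmio_region *r = fd_mmio_find(addr, len);

	if (!r)
		return -1;
	return r->file->read(r->file, data, len,
			     r->file_off + (addr - r->gpa)) == (long)len ? 0 : -1;
}

/* Toy backend standing in for the vGPU device model: 64 bytes of registers. */
static uint8_t toy_regs[64];

static long toy_write(struct fake_file *f, const void *buf, size_t len, uint64_t off)
{
	(void)f;
	if (off > sizeof(toy_regs) || len > sizeof(toy_regs) - off)
		return -1;
	memcpy(toy_regs + off, buf, len);
	return (long)len;
}

static long toy_read(struct fake_file *f, void *buf, size_t len, uint64_t off)
{
	(void)f;
	if (off > sizeof(toy_regs) || len > sizeof(toy_regs) - off)
		return -1;
	memcpy(buf, toy_regs + off, len);
	return (long)len;
}
```

In this model the fd is never looked up at exit time: the binding is established at registration, and the exit handler only translates the GPA into a file offset. Whether that still counts as a 'backdoor' depends on who performs the registration, which is exactly what I am asking.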
> Thanks,
>
> Alex
>
--
Thanks,
Jike