qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH v3 1/2] intel-iommu: differentiate host address


From: Eduardo Habkost
Subject: Re: [Qemu-devel] [PATCH v3 1/2] intel-iommu: differentiate host address width from IOVA address width.
Date: Thu, 27 Dec 2018 12:54:02 -0200
User-agent: Mutt/1.10.1 (2018-07-13)

On Fri, Dec 21, 2018 at 03:13:25PM +0100, Igor Mammedov wrote:
> On Thu, 20 Dec 2018 19:18:01 -0200
> Eduardo Habkost <address@hidden> wrote:
> 
> > On Wed, Dec 19, 2018 at 11:40:37AM +0100, Igor Mammedov wrote:
> > > On Wed, 19 Dec 2018 10:57:17 +0800
> > > Yu Zhang <address@hidden> wrote:
> > >   
> > > > On Tue, Dec 18, 2018 at 03:55:36PM +0100, Igor Mammedov wrote:  
> > > > > On Tue, 18 Dec 2018 17:27:23 +0800
> > > > > Yu Zhang <address@hidden> wrote:
> > > > >     
> > > > > > On Mon, Dec 17, 2018 at 02:17:40PM +0100, Igor Mammedov wrote:    
> > > > > > > On Wed, 12 Dec 2018 21:05:38 +0800
> > > > > > > Yu Zhang <address@hidden> wrote:
> > > > > > >     
> > > > > > > > Currently, vIOMMU is using the value of IOVA address width, 
> > > > > > > > instead of
> > > > > > > > the host address width(HAW) to calculate the number of reserved 
> > > > > > > > bits in
> > > > > > > > data structures such as root entries, context entries, and 
> > > > > > > > entries of
> > > > > > > > DMA paging structures etc.
> > > > > > > > 
> > > > > > > > However values of IOVA address width and of the HAW may not 
> > > > > > > > equal. For
> > > > > > > > example, a 48-bit IOVA can only be mapped to host addresses no 
> > > > > > > > wider than
> > > > > > > > 46 bits. Using 48, instead of 46 to calculate the reserved bit 
> > > > > > > > may result
> > > > > > > > in an invalid IOVA being accepted.
> > > > > > > > 
> > > > > > > > To fix this, a new field - haw_bits is introduced in struct 
> > > > > > > > IntelIOMMUState,
> > > > > > > > whose value is initialized based on the maximum physical 
> > > > > > > > address set to
> > > > > > > > guest CPU.    
> > > > > > >     
> > > > > > > > Also, definitions such as VTD_HOST_AW_39/48BIT etc. are renamed
> > > > > > > > to clarify.
> > > > > > > > 
> > > > > > > > Signed-off-by: Yu Zhang <address@hidden>
> > > > > > > > Reviewed-by: Peter Xu <address@hidden>
> > > > > > > > ---    
> > > > > > > [...]
> > > > > > >     
> > > > > > > > @@ -3100,6 +3104,8 @@ static void 
> > > > > > > > vtd_iommu_replay(IOMMUMemoryRegion *iommu_mr, IOMMUNotifier *n)
> > > > > > > >  static void vtd_init(IntelIOMMUState *s)
> > > > > > > >  {
> > > > > > > >      X86IOMMUState *x86_iommu = X86_IOMMU_DEVICE(s);
> > > > > > > > +    CPUState *cs = first_cpu;
> > > > > > > > +    X86CPU *cpu = X86_CPU(cs);
> > > > > > > >  
> > > > > > > >      memset(s->csr, 0, DMAR_REG_SIZE);
> > > > > > > >      memset(s->wmask, 0, DMAR_REG_SIZE);
> > > > > > > > @@ -3119,23 +3125,24 @@ static void vtd_init(IntelIOMMUState *s)
> > > > > > > >      s->cap = VTD_CAP_FRO | VTD_CAP_NFR | VTD_CAP_ND |
> > > > > > > >               VTD_CAP_MAMV | VTD_CAP_PSI | VTD_CAP_SLLPS |
> > > > > > > >               VTD_CAP_SAGAW_39bit | VTD_CAP_MGAW(s->aw_bits);
> > > > > > > > -    if (s->aw_bits == VTD_HOST_AW_48BIT) {
> > > > > > > > +    if (s->aw_bits == VTD_AW_48BIT) {
> > > > > > > >          s->cap |= VTD_CAP_SAGAW_48bit;
> > > > > > > >      }
> > > > > > > >      s->ecap = VTD_ECAP_QI | VTD_ECAP_IRO;
> > > > > > > > +    s->haw_bits = cpu->phys_bits;    
> > > > > > > Is it possible to avoid accessing CPU fields directly or cpu 
> > > > > > > altogether
> > > > > > > and set phys_bits when iommu is created?    
> > > > > > 
> > > > > > Thanks for your comments, Igor.
> > > > > > 
> > > > > > Well, I guess you prefer not to query the CPU capabilities while 
> > > > > > deciding
> > > > > > the vIOMMU features. But to me, they are not that irrelevant.:)
> > > > > > 
> > > > > > Here the hardware address width in vt-d, and the one in 
> > > > > > cpuid.MAXPHYSADDR
> > > > > > are referring to the same concept. In VM, both are the maximum 
> > > > > > guest physical
> > > > > > address width. If we do not check the CPU field here, we will still 
> > > > > > have to
> > > > > > check the CPU field in other places such as build_dmar_q35(), and 
> > > > > > reset the
> > > > > > s->haw_bits again.
> > > > > > 
> > > > > > Is this explanation convincing enough? :)    
> > > > > current build_dmar_q35() doesn't do it, it's all new code in this 
> > > > > series that
> > > > > contains not acceptable direct access from one device (iommu) to 
> > > > > another (cpu).   
> > > > > Proper way would be for the owner of iommu to fish limits from 
> > > > > somewhere and set
> > > > > values during iommu creation.    
> > > > 
> > > > Well, current build_dmar_q35() doesn't do it, because it is using the 
> > > > incorrect value. :)
> > > > According to the spec, the host address width is the maximum physical 
> > > > address width,
> > > > yet current implementation is using the DMA address width. For me, this 
> > > > is not only
> > > > wrong, but also unsecure. For this point, I think we all agree this 
> > > > need to be fixed.
> > > > 
> > > > As to how to fix it - should we query the cpu fields, I still do not 
> > > > understand why
> > > > this is not acceptable. :)
> > > > 
> > > > I had thought of other approaches before, yet I did not choose:
> > > >   
> > > > 1> Introduce a new parameter, say, "x-haw-bits" which is used for iommu 
> > > > to limit its    
> > > > physical address width(similar to the "x-aw-bits" for IOVA). But what 
> > > > should we check
> > > > this parameter or not? What if this parameter is set to sth. different 
> > > > than the "phys-bits"
> > > > or not?
> > > >   
> > > > 2> Another choice I had thought of is, to query the physical iommu. I 
> > > > abandoned this    
> > > > idea because my understanding is that vIOMMU is not a passthrued 
> > > > device, it is emulated.  
> > >   
> > > > So Igor, may I ask why you think checking against the cpu fields so not 
> > > > acceptable? :)  
> > > Because accessing private fields of device from another random device is 
> > > not robust
> > > and a subject to breaking in unpredictable manner when field meaning or 
> > > initialization
> > > order changes. (analogy to baremetal: one does not solder wire to a CPU 
> > > die to let
> > > access some piece of data from random device).
> > >   
> > 
> > With either the solution below or the one I proposed, we still
> > have a ordering problem: if we want "-cpu ...,phys-bits=..." to
> As Michael said, it's questionable if iommu should rely on guest's
> phys-bits at all,

Agreed, this is not clear.  I don't know yet if we really want to
make "-cpu" affect other devices.  Probably not.

>                   but that aside we should use proper interfaces
> and hierarchy to initialize devices, see below why I dislike
> simplistic pc_max_phys_bits().

What do you mean by proper interfaces and hierarchy?

pc_max_phys_bits() is simple, and that's supposed to be a good
thing.

> 
> > affect the IOMMU device, we will need the CPU objects to be
> > created before IOMMU realize.
> > 
> > At least both proposals make the initialization ordering
> > explicitly a responsibility of the machine code.  In either case,
> > I don't think we will start creating all CPU objects after device
> > realize any time soon.
> > 
> > 
> > > I've looked at intel-iommu code and how it's created so here is a way to 
> > > do the thing
> > > you need using proper interfaces:
> > > 
> > > 1. add x-haw_bits property
> > > 2. include in your series patch
> > >     '[Qemu-devel] [PATCH] qdev: let machine hotplug handler to override  
> > > bus hotplug handler'
> > > 3. add your iommu to pc_get_hotpug_handler() to redirect plug flow to
> > >    machine and let _pre_plug handler to check and set x-haw_bits for 
> > > machine level  
> > 
> > Wow, that's a very complex way to pass a single integer from
> > machine code to device code.  If this is the only way to do that,
> > we really need to take a step back and rethink our API design.
> > 
> > What's wrong with having a simple
> >   uint32_t pc_max_phys_bits(PCMachineState*)
> > function?
> As suggested, it would be only aesthetic change for accessing first_cpu from
> random device at random time. IOMMU would still access cpu instance directly
> no matter how much wrappers one would use so it's still the same hack.
> If phys_bits were changing during VM lifecycle and iommu needed to use
> updated value then using pc_max_phys_bits() might be justified as
> we don't have interfaces to handle that but that's not the case here.

I don't understand what you mean here.  Which "interfaces to
handle that" you are talking about?

> 
> I suggested a typical way (albeit a bit complex) to handle device
> initialization in cases where bus plug handler is not sufficient.
> It follows proper hierarchy without any layer violations and can fail
> gracefully even if we start creating CPUs later using only '-device cpufoo'
> without need to fix iommu code to handle that (it would fail creating
> iommu with clear error that CPU isn't available and all user have to
> do is to fix CLI to make sure that CPU is created before iommu).

What do you mean by "proper hierarchy" and "layer violations"?
What exactly is wrong with having device code talking to the
machine object?

You do have a point about "-device cpufoo": making "-cpu" affect
iommu phys-bits is probably not a good idea after all.

> 
> So I'd prefer if we used exiting pattern for device initialization
> instead of hacks whenever it is possible.

Why do you describe it as a hack?  It's just C code calling a C
function.  I don't see any problem in having device code talking
to the machine code to get a bit of information.


> 
> > 
> > > 4. you probably can use phys-bits/host-phys-bits properties to get data 
> > > that you need
> > >    also see how ms->possible_cpus, that's how you can get access to CPU 
> > > from machine
> > >    layer.
> > >   
> > [...]
> > 
> PS:
> Another thing I'd like to draw your attention to (since you recently looked at
> phys-bits) is about host/guest phys_bits and if it's safe from migration pov
> between hosts with different limits.
> 
> 

-- 
Eduardo



reply via email to

[Prev in Thread] Current Thread [Next in Thread]