
Re: [Qemu-devel] How to reserve guest physical region for ACPI


From: Igor Mammedov
Subject: Re: [Qemu-devel] How to reserve guest physical region for ACPI
Date: Wed, 6 Jan 2016 14:39:16 +0100

On Tue, 5 Jan 2016 18:22:33 +0100
Laszlo Ersek <address@hidden> wrote:

> On 01/05/16 18:08, Igor Mammedov wrote:
> > On Mon, 4 Jan 2016 21:17:31 +0100
> > Laszlo Ersek <address@hidden> wrote:
> >   
> >> Michael CC'd me on the grandparent of the email below. I'll try to add
> >> my thoughts in a single go, with regard to OVMF.
> >>
> >> On 12/30/15 20:52, Michael S. Tsirkin wrote:  
> >>> On Wed, Dec 30, 2015 at 04:55:54PM +0100, Igor Mammedov wrote:    
> >>>> On Mon, 28 Dec 2015 14:50:15 +0200
> >>>> "Michael S. Tsirkin" <address@hidden> wrote:
> >>>>    
> >>>>> On Mon, Dec 28, 2015 at 10:39:04AM +0800, Xiao Guangrong wrote:    
> >>>>>>
> >>>>>> Hi Michael, Paolo,
> >>>>>>
> >>>>>> Now it is time to return to the challenge of how to reserve a guest
> >>>>>> physical region internally used by ACPI.
> >>>>>>
> >>>>>> Igor suggested that:
> >>>>>> | An alternative place to allocate reserve from could be high memory.
> >>>>>> | For pc we have "reserved-memory-end" which currently makes sure
> >>>>>> | that hotpluggable memory range isn't used by firmware
> >>>>>> (https://lists.nongnu.org/archive/html/qemu-devel/2015-11/msg00926.html)
> >>>>>>     
> >>
> >> OVMF has no support for the "reserved-memory-end" fw_cfg file. The
> >> reason is that nobody wrote that patch, nor asked for the patch to be
> >> written. (Not implying that just requesting the patch would be
> >> sufficient for the patch to be written.)
> >>  
> >>>>> I don't want to tie things to reserved-memory-end because this
> >>>>> does not scale: next time we need to reserve memory,
> >>>>> we'll need to find yet another way to figure out what is where.    
> >>>> Could you elaborate a bit more on the problem you're seeing?
> >>>>
> >>>> To me it looks like it scales rather well.
> >>>> For example, let's imagine that we are adding a device that has some
> >>>> on-device memory that should be mapped into GPA space; the code to do
> >>>> so would look like:
> >>>>
> >>>>   pc_machine_device_plug_cb(dev)
> >>>>   {
> >>>>       ...
> >>>>       if (dev == OUR_NEW_DEVICE_TYPE) {
> >>>>           memory_region_add_subregion(as, current_reserved_end, &dev->mr);
> >>>>           set_new_reserved_end(current_reserved_end +
> >>>>                                memory_region_size(&dev->mr));
> >>>>       }
> >>>>   }
> >>>>
> >>>> we can practically add any number of new devices that way.    
> >>>
> >>> Yes, but we'll have to build a host-side allocator for these, and that's
> >>> nasty. We'll also have to maintain these addresses indefinitely (at
> >>> least per machine version) as they are guest visible.
> >>> Not only that, there's no way for the guest to know if we move things
> >>> around, so basically we'll never be able to change addresses.
> >>>
> >>>     
> >>>>      
> >>>>> I would like the ./hw/acpi/bios-linker-loader.c interface to be
> >>>>> extended to support 64-bit RAM instead
> >>
> >> This looks quite doable in OVMF, as long as the blob to allocate from
> >> high memory contains *zero* ACPI tables.
> >>
> >> (
> >> Namely, each ACPI table is installed from the containing fw_cfg blob
> >> with EFI_ACPI_TABLE_PROTOCOL.InstallAcpiTable(), and the latter has its
> >> own allocation policy for the *copies* of ACPI tables it installs.
> >>
> >> This allocation policy is left unspecified in the section of the UEFI
> >> spec that governs EFI_ACPI_TABLE_PROTOCOL.
> >>
> >> The current policy in edk2 (= the reference implementation) seems to be
> >> "allocate from under 4GB". It is currently being changed to "try to
> >> allocate from under 4GB, and if that fails, retry from high memory". (It
> >> is motivated by AArch64 machines that may have no DRAM at all under 4GB.)
> >> )
> >>  
> >>>>> (and maybe a way to allocate and
> >>>>> zero-initialize a buffer without loading it through fw_cfg),
> >>
> >> Sounds reasonable.
> >>  
> >>>>> this way the BIOS
> >>>>> does the allocation, and addresses can be patched into ACPI.
> >>>> and then the guest side needs to parse/execute some AML that would
> >>>> initialize the QEMU side so it would know where to write data.
> >>>
> >>> Well not really - we can put it in a data table, by itself
> >>> so it's easy to find.    
> >>
> >> Do you mean acpi_tb_find_table(), acpi_get_table_by_index() /
> >> acpi_get_table_with_size()?
> >>  
> >>>
> >>> AML is only needed if access from ACPI is desired.
> >>>
> >>>     
> >>>> bios-linker-loader is a great interface for initializing some
> >>>> guest-owned data and linking it together, but I think it adds
> >>>> unnecessary complexity and is misused if it's used to handle
> >>>> device-owned data/on-device memory, as in this and the VMGID cases.
> >>>
> >>> I want a generic interface for the guest to enumerate these things.
> >>> The linker seems quite reasonable, but if you see a reason why it
> >>> won't do, or want to propose a better interface, fine.
> >>
> >> * The guest could do the following:
> >> - while processing the ALLOCATE commands, it would make a note of where
> >> in GPA space each fw_cfg blob gets allocated
> >> - at the end, the guest would prepare a temporary array with a predefined
> >> record format that associates each fw_cfg blob's name with the concrete
> >> allocation address
> >> - it would create an FWCfgDmaAccess structure pointing at this array,
> >> with a new "control" bit set (or something similar)
> >> - the guest could write the address of the FWCfgDmaAccess struct to the
> >> appropriate register, as always.
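> >>
> >> (A minimal C sketch of the record format described above -- the struct,
> >> its field sizes, and the control bit are illustrative assumptions, not
> >> an existing fw_cfg interface:
> >>
> >>   #include <stdint.h>
> >>
> >>   /* hypothetical new FWCfgDmaAccess control bit */
> >>   #define FW_CFG_DMA_CTL_REPORT_ALLOC 0x40
> >>
> >>   typedef struct FwCfgAllocRecord {
> >>       char     name[56];   /* fw_cfg blob file name, NUL-terminated */
> >>       uint64_t address;    /* GPA the guest allocated for that blob */
> >>   } FwCfgAllocRecord;
> >>
> >> The guest would fill one record per processed ALLOCATE command, then
> >> point the FWCfgDmaAccess structure at the array with the new control
> >> bit set.)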
> >>
> >> * Another idea would be a GET_ALLOCATION_ADDRESS linker/loader command,
> >> specifying:
> >> - the fw_cfg blob's name, for which to retrieve the guest-allocated
> >>   address (this command could only follow the matching ALLOCATE
> >>   command, never precede it)
> >> - a flag whether the address should be written to IO or MMIO space
> >>   (would likely be IO on x86, MMIO on ARM)
> >> - a unique uint64_t key (could be the 16-bit fw_cfg selector value that
> >>   identifies the blob, actually!)
> >> - a uint64_t (IO or MMIO) address to write the unique key and then the
> >>   allocation address to.
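> >>
> >> (Sketched as a C command entry, modeled loosely on the existing entries
> >> in hw/acpi/bios-linker-loader.c; the command value, field names and
> >> flag encoding are assumptions for illustration only:
> >>
> >>   #include <stdint.h>
> >>
> >>   enum {
> >>       BIOS_LINKER_LOADER_COMMAND_GET_ALLOC_ADDR = 0x4, /* hypothetical */
> >>   };
> >>
> >>   typedef struct GetAllocationAddressEntry {
> >>       uint32_t command;   /* GET_ALLOC_ADDR */
> >>       char     file[56];  /* blob name; must follow its ALLOCATE */
> >>       uint8_t  flags;     /* bit 0: 0 = write via IO, 1 = via MMIO */
> >>       uint64_t key;       /* unique key, e.g. the fw_cfg selector */
> >>       uint64_t dest_addr; /* where the guest writes the key and then
> >>                              the allocation address */
> >>   } GetAllocationAddressEntry;
> >> )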
> >>
> >> Either way, QEMU could learn about all the relevant guest-side
> >> allocation addresses in a low number of traps. In addition, AML code
> >> wouldn't have to reflect any allocation addresses to QEMU, ever.  
> 
> > That would be a nice trick. I see 2 issues here:
> >  1. The ACPI tables blob is built atomically when the guest tries to
> >     read it from fw_cfg, so patched addresses have to be communicated
> >     to QEMU before that.  
> 
> I don't understand issue #1. I think it is okay if the allocation
> happens strictly after QEMU refreshes / regenerates the ACPI payload.
> Namely, the guest-allocated addresses have two uses:
> - references from within the ACPI payload
If references are from AML, then the AML should be patched by the linker,
which is tricky, and forces us to invent a duplicate AML API that would
be able to tell the linker where the AML object should be patched
(Michael's patch in this thread is an example).

It would be better if the linker communicated addresses to QEMU
before the AML is built, so that the AML would use the addresses
already present in QEMU and wouldn't have to be patched at all.

> - references from the QEMU side, for device operation.
> 
> The first purpose is covered by the linker/loader itself (that is,
> GET_ALLOCATION_ADDRESS would be used *in addition* to ADD_POINTER). The
> second purpose would be covered by GET_ALLOCATION_ADDRESS.
> 
> >  2. More importantly, I think we are misusing the linker-loader
> >     interface here, trying to allocate a buffer from guest RAM and
> >     so consuming it, while all we need is a window into device
> >     memory, mapped somewhere outside of the RAM-occupied address space.  
> 
> But, more importantly, I definitely see your point with issue #2. I'm
> neutral on the question whether this should be solved with the ACPI
> linker/loader or with something else. I'm perfectly fine with "something
> else", as long as it is generic enough. The above GET_ALLOCATION_ADDRESS
> idea is relevant *only if* the ACPI linker/loader is deemed the best
> solution here.
> 
> (Heck, if the linker/loader avenue is rejected here, that's the least
> work for me! :))
> 
> Thanks
> Laszlo
> 
> >   
> >>  
> >>>
> >>> PCI would do, too - though the Windows guys had concerns about
> >>> returning PCI BARs from ACPI.
> >>>
> >>>     
> >>>> There was an RFC on the list to make the BIOS boot from NVDIMM,
> >>>> already doing some ACPI table lookup/parsing. Now if those patches
> >>>> were forced to also parse and execute AML to initialize QEMU with a
> >>>> guest-allocated address, that would complicate them quite a bit.
> >>>
> >>> If they just need to find a table by name, it won't be
> >>> too bad, would it?
> >>>     
> >>>> While with the NVDIMM control memory region mapped directly by QEMU,
> >>>> the respective patches don't need to initialize QEMU in any way;
> >>>> all they would need to do is read the necessary data from the
> >>>> control region.
> >>>>
> >>>> Also, using bios-linker-loader takes away some usable RAM from the
> >>>> guest, and in the end that doesn't scale: the more devices I add,
> >>>> the less usable RAM is left for the guest OS, while all the device
> >>>> needs is a piece of GPA address space that would belong to it.
> >>>
> >>> I don't get this comment. I don't think it's MMIO that is wanted.
> >>> If it's backed by QEMU virtual memory, then it's RAM.
> >>>     
> >>>>>
> >>>>> See patch at the bottom that might be handy.    
> >>
> >> I've given up on Microsoft implementing DataTableRegion. (It's sad, 
> >> really.)
> >>
> >> From last year I have a WIP version of "docs/vmgenid.txt" that is based
> >> on Michael's build_append_named_dword() function. If
> >> GET_ALLOCATION_ADDRESS above looks good, then I could simplify the ACPI
> >> stuff in that text file (and hopefully post it soon after for comments?)
> >>  
> >>>>>    
> >>>>>> he also came up with a way to use 64-bit addresses in DSDT/SSDT.rev = 1:
> >>>>>> | when writing ASL one shall make sure that only XP supported
> >>>>>> | features are in global scope, which is evaluated when tables
> >>>>>> | are loaded and features of rev2 and higher are inside methods.
> >>>>>> | That way XP doesn't crash as far as it doesn't evaluate unsupported
> >>>>>> | opcodes, and one can guard those opcodes by checking the _REV object
> >>>>>> | if necessary.
> >>>>>> (https://lists.nongnu.org/archive/html/qemu-devel/2015-11/msg01010.html)
> >>>>>>     
> >>>>>
> >>>>> Yes, this technique works.    
> >>
> >> Agreed.
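> >>
> >> (For reference, a hedged sketch of that trick using QEMU's C AML API
> >> from hw/acpi/aml-build.c -- the method name is made up, and the exact
> >> helper signatures should be treated as assumptions:
> >>
> >>   /* Keep rev2+ opcodes out of global scope: wrap them in a method
> >>    * that is only evaluated on demand, guarded by _REV, so an
> >>    * ACPI 1.0 interpreter (XP) never parses them. */
> >>   Aml *method = aml_method("QCHK", 0, AML_NOTSERIALIZED);
> >>   Aml *ifctx = aml_if(aml_lgreater(aml_name("_REV"), aml_int(1)));
> >>   /* ... rev2-and-higher constructs go here ... */
> >>   aml_append(method, ifctx);
> >>   aml_append(ssdt, method);  /* 'ssdt' is the Aml table being built */
> >> )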
> >>  
> >>>>>
> >>>>> An alternative is to add an XSDT; XP ignores that.
> >>>>> XSDT at the moment breaks OVMF (because it loads both
> >>>>> the RSDT and the XSDT, which is wrong), but I think
> >>>>> Laszlo was working on a fix for that.    
> >>
> >> We have to distinguish two use cases here.
> >>
> >> * The first is the case when QEMU prepares both an XSDT and an RSDT, and
> >> links at least one common ACPI table from both. This would cause OVMF to
> >> pass the same source (= to-be-copied) table to
> >> EFI_ACPI_TABLE_PROTOCOL.InstallAcpiTable() twice, with one of the
> >> following outcomes:
> >>
> >> - there would be two instances of the same table (think e.g. SSDT)
> >> - the second attempt would be rejected (e.g. FADT) and that error would
> >>   terminate the linker-loader procedure.
> >>
> >> This issue would not be too hard to overcome, with a simple "memoization
> >> technique". After the initial loading & linking of the tables, OVMF
> >> could remember the addresses of the "source" ACPI tables, and could
> >> avoid passing already installed source tables to
> >> EFI_ACPI_TABLE_PROTOCOL.InstallAcpiTable() for a second time.
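> >>
> >> (A minimal C sketch of that memoization, with illustrative names -- not
> >> actual OVMF code:
> >>
> >>   #include <stdbool.h>
> >>   #include <stddef.h>
> >>
> >>   #define MAX_SOURCE_TABLES 128
> >>
> >>   static const void *mInstalledSources[MAX_SOURCE_TABLES];
> >>   static size_t      mInstalledCount;
> >>
> >>   /* Return true iff this source table was already passed to
> >>    * EFI_ACPI_TABLE_PROTOCOL.InstallAcpiTable(); remember it if not. */
> >>   static bool CheckAndRememberSource(const void *Source)
> >>   {
> >>       for (size_t Idx = 0; Idx < mInstalledCount; Idx++) {
> >>           if (mInstalledSources[Idx] == Source) {
> >>               return true;
> >>           }
> >>       }
> >>       if (mInstalledCount < MAX_SOURCE_TABLES) {
> >>           mInstalledSources[mInstalledCount++] = Source;
> >>       }
> >>       return false;
> >>   }
> >> )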
> >>
> >> * The second use case is when an ACPI table is linked *only* from QEMU's
> >> XSDT. This is much harder to fix, because
> >> EFI_ACPI_TABLE_PROTOCOL.InstallAcpiTable() in edk2 links the copy of the
> >> passed-in table into *both* RSDT and XSDT, automatically. And, again,
> >> the UEFI spec doesn't provide a way to control this from the caller
> >> (i.e. from within OVMF).
> >>
> >> I have tried earlier to effect a change in the specification of
> >> EFI_ACPI_TABLE_PROTOCOL.InstallAcpiTable(), on the ASWG and USWG mailing
> >> lists. (At that time I was trying to expose the UEFI memory *type* to
> >> the caller, from which the copy of the ACPI table being installed
> >> should be allocated.) Alas, I received no answers at all.
> >>
> >> All in all I strongly recommend the "place rev2+ objects in method
> >> scope" trick, over the "link it from the XSDT only" trick.
> >>  
> >>>> Using the XSDT would increase the RAM occupied by ACPI tables,
> >>>> as it would duplicate the DSDT + non-XP-supported AML
> >>>> at global namespace.
> >>>
> >>> Not at all - I posted patches linking to the same
> >>> tables from both RSDT and XSDT at some point.
> >>
> >> Yes, at <http://thread.gmane.org/gmane.comp.emulators.qemu/342559>. This
> >> could be made to work in OVMF with the above-mentioned memoization stuff.
> >>  
> >>> Only the list of pointers would be different.    
> >>
> >> I don't recommend that, see the second case above.
> >>
> >> Thanks
> >> Laszlo
> >>  
> >>>> So far we've managed to keep the DSDT compatible with XP while
> >>>> introducing features from v2 and higher ACPI revisions as
> >>>> AML that is only evaluated on demand.
> >>>> We can continue doing so unless we have to unconditionally
> >>>> add incompatible AML at global scope.
> >>>>    
> >>>
> >>> Yes.
> >>>     
> >>>>>    
> >>>>>> Michael, Paolo, what do you think about these ideas?
> >>>>>>
> >>>>>> Thanks!    
> >>>>>
> >>>>>
> >>>>>
> >>>>> So using the patch below, we can add Name(PQRS, 0x0) at the top of the
> >>>>> SSDT (or at the bottom, or add a separate SSDT just for that). It
> >>>>> returns the current offset, so we can add that to the linker.
> >>>>>
> >>>>> This won't work if you append the Name to the Aml structure (these can
> >>>>> be nested to arbitrary depth using aml_append), so using a plain GArray
> >>>>> for this API makes sense to me.
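> >>>>>
> >>>>> For example, usage could look like this (variable names illustrative):
> >>>>>
> >>>>>   /* 'ssdt' is the GArray being filled with the SSDT's contents */
> >>>>>   int pqrs_offset = build_append_named_dword(ssdt, "PQRS");
> >>>>>   /* ... later, emit a linker command that patches the dword at
> >>>>>    * 'pqrs_offset' with a guest-allocated address at load time ... */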
> >>>>>    
> >>>>> --->    
> >>>>>
> >>>>> acpi: add build_append_named_dword, returning an offset in buffer
> >>>>>
> >>>>> This is a very limited form of support for runtime patching -
> >>>>> similar in functionality to what we can do with ACPI_EXTRACT
> >>>>> macros in Python, but implemented in C.
> >>>>>
> >>>>> This is to allow ACPI code direct access to data tables -
> >>>>> which is exactly what DataTableRegion is there for, except
> >>>>> no known Windows release so far implements DataTableRegion.
> >>>> Unsupported means Windows will BSOD, so it's practically
> >>>> unusable unless MS patches currently existing Windows
> >>>> versions.
> >>>
> >>> Yes. That's why my patch allows patching SSDT without using
> >>> DataTableRegion.
> >>>     
> >>>> Another thing about DataTableRegion is that ACPI tables are
> >>>> supposed to have static content which matches the checksum in
> >>>> the table header, while you are trying to use it for dynamic
> >>>> data. It would be cleaner/more compatible to teach
> >>>> bios-linker-loader to just allocate memory and patch the AML
> >>>> with the allocated address.
> >>>
> >>> Yes - if the address is static, you need to put it outside
> >>> the table. It can come right before or right after it.
> >>>     
> >>>> Also, if OperationRegion() is used, then one has to patch
> >>>> DefOpRegion directly, as RegionOffset must be an Integer;
> >>>> using variable names is not permitted there.
> >>>
> >>> I am not sure the comment was understood correctly.
> >>> The comment really says "we can't use DataTableRegion,
> >>> so here is an alternative".
> >>>     
> >>>>    
> >>>>>
> >>>>> Signed-off-by: Michael S. Tsirkin <address@hidden>
> >>>>>
> >>>>> ---
> >>>>>
> >>>>> diff --git a/include/hw/acpi/aml-build.h b/include/hw/acpi/aml-build.h
> >>>>> index 1b632dc..f8998ea 100644
> >>>>> --- a/include/hw/acpi/aml-build.h
> >>>>> +++ b/include/hw/acpi/aml-build.h
> >>>>> @@ -286,4 +286,7 @@ void acpi_build_tables_cleanup(AcpiBuildTables *tables, bool mfre);
> >>>>>  void
> >>>>>  build_rsdt(GArray *table_data, GArray *linker, GArray *table_offsets);
> >>>>>  
> >>>>> +int
> >>>>> +build_append_named_dword(GArray *array, const char *name_format, ...);
> >>>>> +
> >>>>>  #endif
> >>>>> diff --git a/hw/acpi/aml-build.c b/hw/acpi/aml-build.c
> >>>>> index 0d4b324..7f9fa65 100644
> >>>>> --- a/hw/acpi/aml-build.c
> >>>>> +++ b/hw/acpi/aml-build.c
> >>>>> @@ -262,6 +262,32 @@ static void build_append_int(GArray *table, uint64_t value)
> >>>>>      }
> >>>>>  }
> >>>>>  
> >>>>> +/* Build NAME(XXXX, 0x0) where 0x0 is encoded as a dword,
> >>>>> + * and return the offset to 0x0 for runtime patching.
> >>>>> + *
> >>>>> + * Warning: runtime patching is best avoided. Only use this as
> >>>>> + * a replacement for DataTableRegion (for guests that don't
> >>>>> + * support it).
> >>>>> + */
> >>>>> +int
> >>>>> +build_append_named_dword(GArray *array, const char *name_format, ...)
> >>>>> +{
> >>>>> +    int offset;
> >>>>> +    va_list ap;
> >>>>> +
> >>>>> +    va_start(ap, name_format);
> >>>>> +    build_append_namestringv(array, name_format, ap);
> >>>>> +    va_end(ap);
> >>>>> +
> >>>>> +    build_append_byte(array, 0x0C); /* DWordPrefix */
> >>>>> +
> >>>>> +    offset = array->len;
> >>>>> +    build_append_int_noprefix(array, 0x0, 4);
> >>>>> +    assert(array->len == offset + 4);
> >>>>> +
> >>>>> +    return offset;
> >>>>> +}
> >>>>> +
> >>>>>  static GPtrArray *alloc_list;
> >>>>>  
> >>>>>  static Aml *aml_alloc(void)
> >>>>>
> >>>>>    
> >>  
> >   
> 