Re: [Qemu-devel] [RFC PATCH] hw/arm/virt: use variable size of flash dev
From: Laszlo Ersek
Subject: Re: [Qemu-devel] [RFC PATCH] hw/arm/virt: use variable size of flash device to save memory
Date: Tue, 26 Mar 2019 12:03:19 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.9.1

On 03/26/19 07:17, Markus Armbruster wrote:
> Zheng Xiang <address@hidden> writes:
>
>> Hi Peter,
>>
>> Thanks for your reply!
>>
>> On 2019/3/25 21:11, Peter Maydell wrote:
>>> On Mon, 25 Mar 2019 at 12:53, Xiang Zheng <address@hidden> wrote:
>>>>
>>>> Currently we fill the VIRT_FLASH space with two 64MB NOR images when
>>>> using persistent UEFI variables on QEMU. Actually we only use a very
>>>> small part of the memory while the rest significant large part of
>>>> memory is wasted.
>>>>
>>>> This patch creates and maps a variable size of flash device instead of
>>>> a mandatory 64MB one to save memory.
>>>>
>>>> Signed-off-by: Xiang Zheng <address@hidden>
>>>> ---
>>>>
>>>> This patch might be insufficient since it also needs to modify the flash
>>>> size
>>>> in ACPI and DTB.
>>>>
>>>> BTW, I don't understand why it requires the two NOR images to be exactly
>>>> 64MB
>>>> in size when using -pflash.
>>>
>>> I don't think we should do this. The board should in general
>>> create the same hardware visible to the guest, not change
>>> it based on subtle things like the size of the image files.
>
> Concur.
>
>>> The reason why the flash images must be 64MB in size
>>> when using -pflash is that they are the backing store
>>> for a writable device. Suppose you have 1MB of data in your
>>> backing image that you pass to QEMU and then the guest writes
>>> to the last block of the flash device. The new data
>>> written by the guest to the end of the device has to be
>>> stored somewhere, so the file has to be large enough
>>> to cover the whole of the flash area.
>>>
>>
>> Is there any way to support config or limit the size that both
>> guest and QEMU are visible?
>>
>> The original QEMU_EFI.fd has only 2M, but we need to stuff it
>> to 64M with 62M unused data. It will consume a large amount of
>> memory when running multiple VM simultaneously.
>
> Here's a number of ideas.
>
> The first one is of course making the flash memory size configurable, to
> eliminate the "unused" part. Our PC machines use the backing image
> sizes as configuration. I consider that a bad idea that should not be
> allowed to spread to other machines. Peter seems to agree.

For the full picture, we have to realize that flash size is a trade-off
for virtual machines too, not just for physical machines. UEFI firmware
tends to grow like an OS -- whether that's good or bad belongs on
another page :) --, and so you tend to need more and more space for it,
even after Link Time Optimization, and image compression, over time.

So you are left with a choice.

* You can let board logic size the flash dynamically at startup (and
then call it a "bad idea" :) ).

* Alternatively, you can hardcode the pflash size in the board setup,
perhaps dependent on machine type version, and/or dependent on some
properties. Even assuming that this diversity doesn't break migration at
once, you'll create a dependency between firmware releases (sizes) and
machine types. 'For the next release of ArmVirtQemu/edk2, you'll need
"virt-4.1" or later'.

In addition, in most cases, the firmware, when it runs from flash,
cannot dynamically adapt itself to random flash sizes and/or base
addresses, so not only will new (larger) firmware not fit on old machine
types, but old (small) firmware may also not feel "at home" on new
machine types. (Note: this is not a theoretical limitation, but it is a
*very* practical one.)

That's a kind of mapping that is taken for "obvious" in the physical
world (you get the board, your firmware comes with it, that's why it's
called *firm*), but it used to be frowned upon in the virtual world.

* Or else, you pad the pflash chips as broadly as you dare, in order to
never run into the above mess -- and then someone complains "it consumes
too many resources". :)

For some perspective, OVMF started out with a 1MB cumulative size
(executable + varstore). We ran out of 1MB with executable code, so we
introduced the 2MB build (with unchanged varstore size). Then, under
Microsoft's SVVP checks, the varstore size proved insufficient, and we
introduced the 4MB build, with enough space in both the executable part
and the varstore part for them to last for "the foreseeable future".
And, the dynamic logic in the PC board code allows up to an 8MB
cumulative size (and that's also not a machine type property, but a cold
hard constant).

With the dynamic sizing in QEMU (which, IIRC, I had originally
introduced still in the 1MB times, due to the split between the
executable and varstore parts), both the 1MB->2MB switch, and the
2MB->4MB switch in the firmware caused zero pain in QEMU. And right now,
4MB looks like a "sweet spot", with some elbow room left.

The "virt" board took a different approach, with different benefits (no
dynamic sizing at startup, hence identical guest view) and different
problems (it's more hungry for resources). "Pick your poison."
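
To make the contrast concrete, here is a minimal sketch of the two
policies in Python. It is an illustration only, not QEMU code: the
function names are hypothetical, and the constants are just the figures
quoted in this thread (a fixed 64MB per chip on "virt", an 8MB
cumulative cap on the PC boards).

```python
MiB = 1024 * 1024

# Figures quoted in this thread; assumptions for illustration,
# not values read from the QEMU source tree.
VIRT_FLASH_SIZE = 64 * MiB   # fixed size of each "virt" flash chip
PC_FLASH_LIMIT = 8 * MiB     # cumulative cap in the PC board logic


def fixed_flash_sizes(image_sizes):
    """'virt' style: each backing image must match the fixed chip size."""
    for size in image_sizes:
        if size != VIRT_FLASH_SIZE:
            raise ValueError(
                f"image is {size} bytes, expected exactly {VIRT_FLASH_SIZE}"
            )
    # The guest always sees the same hardware, whatever the images hold.
    return [VIRT_FLASH_SIZE] * len(image_sizes)


def dynamic_flash_sizes(image_sizes):
    """PC style: each chip takes its size from its backing image,
    as long as the cumulative size stays under the cap."""
    total = sum(image_sizes)
    if total > PC_FLASH_LIMIT:
        raise ValueError(
            f"cumulative size {total} exceeds the {PC_FLASH_LIMIT} byte cap"
        )
    # The guest-visible flash layout now depends on the image files.
    return list(image_sizes)
```

With the dynamic policy a 2MB or 4MB firmware build simply works; with
the fixed policy the same images are rejected until they are padded to
64MB each.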
A hopefully more constructive comment below:
> The accepted way to create minor variations of a machine type is machine
> properties. Whether using them to vary flash chip size would be
> acceptable is for the board maintainer to decide.
>
> Now let's think about mapping these flash images more efficiently.
>
> We could avoid backing their "unused" part with pages. Unless the
> "unused" part is read-only, this leaves you at your guests' mercy: they
> can make the host allocate pages by writing to them.

First, the flash backends' on-disk demand shouldn't be catastrophic,
even if we use "raw" -- the executable should be shared between all VMs
running on the host, and the varstore files (which must be private to
VMs) could be created as sparse files. For example, libvirt could be
improved to create sparse copies of the varstore *template* files.

Second, regarding physical RAM consumption: disable memory overcommit,
and set a large swap. The unused parts of the large pflash chips should
be swapped out at some point (after their initial population from disk),
and should never be swapped back in again.
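
As a rough sketch of the sparse-copy idea from the first point (this is
not what libvirt does today; the function name, the 4096-byte block size
and the default 64MB flash size are assumptions for illustration), a
per-VM varstore could be created from a template like this in Python:

```python
import os

MiB = 1024 * 1024


def sparse_copy(template_path, dest_path, flash_size=64 * MiB, block=4096):
    """Copy a varstore template into a file of exactly flash_size bytes,
    skipping all-zero blocks so they can stay unallocated on disk."""
    if os.path.getsize(template_path) > flash_size:
        raise ValueError("template is larger than the flash device")

    zero_block = bytes(block)
    offset = 0
    with open(template_path, "rb") as src, open(dest_path, "wb") as dst:
        while True:
            chunk = src.read(block)
            if not chunk:
                break
            # Only write blocks that contain data; zero blocks become holes.
            if chunk != zero_block[:len(chunk)]:
                dst.seek(offset)
                dst.write(chunk)
            offset += len(chunk)
        # Extend the file to the full device size without writing data.
        dst.truncate(flash_size)
    return offset  # number of template bytes examined
```

Whether the skipped ranges really stay unallocated depends on the host
filesystem supporting sparse files; either way the resulting file reads
back as the template followed by zeros up to the flash size.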
Thanks
Laszlo
>
> We could share the pflash memory among VMs running the same firmware.
> If writes are permitted, we'd have to unshare on write (COW). Again,
> you're at your guests' mercy unless read-only: they can make the host
> unshare pages by writing to them.
>
> I figure the "share" idea would be easier to implement[*].
>
> Both ideas require either trusted guests or read-only flash. EFI
> generally wants some read/write flash for its var store. Can we make
> everything else read-only?
>
> We can improve the device models to let us set up a suitable part of the
> pflash memory read-only. This is how real hardware works. Our PC
> machines currently approximate this with *two* flash chips, one
> read-only, one read/write, which I consider a mistake that should not be
> allowed to spread to other machines.
>
> Prior discussions
>
> Message-ID: <address@hidden>
> https://lists.nongnu.org/archive/html/qemu-devel/2019-02/msg05056.html
>
> and
>
> Message-ID: <address@hidden>
> https://lists.nongnu.org/archive/html/qemu-devel/2019-01/msg06606.html
>
>
>
> [*] If you run KSM (kernel same-page merging), the kernel should set up
> the sharing for you automatically. But not everybody wants to run KSM.
>
Re: [Qemu-devel] [RFC PATCH] hw/arm/virt: use variable size of flash device to save memory, Laszlo Ersek, 2019/03/25