From: Markus Armbruster
Subject: Re: [Qemu-arm] [Qemu-devel] [RFC PATCH] hw/arm/virt: use variable size of flash device to save memory
Date: Tue, 26 Mar 2019 17:39:22 +0100
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/26.1 (gnu/linux)

Laszlo Ersek <address@hidden> writes:

> On 03/26/19 07:17, Markus Armbruster wrote:
>> Zheng Xiang <address@hidden> writes:
>>> Hi Peter,
>>> Thanks for your reply!
>>> On 2019/3/25 21:11, Peter Maydell wrote:
>>>> On Mon, 25 Mar 2019 at 12:53, Xiang Zheng <address@hidden> wrote:
>>>>> Currently we fill the VIRT_FLASH space with two 64MB NOR images when
>>>>> using persistent UEFI variables on QEMU. In practice we use only a very
>>>>> small part of that memory, while the much larger remainder is
>>>>> wasted.
>>>>> This patch creates and maps a variable-size flash device instead of
>>>>> a mandatory 64MB one to save memory.
>>>>> Signed-off-by: Xiang Zheng <address@hidden>
>>>>> ---
>>>>> This patch might be insufficient since it also needs to modify the flash
>>>>> size in ACPI and DTB.
>>>>> BTW, I don't understand why it requires the two NOR images to be exactly
>>>>> 64MB in size when using -pflash.
>>>> I don't think we should do this. The board should in general
>>>> create the same hardware visible to the guest, not change
>>>> it based on subtle things like the size of the image files.
>> Concur.
>>>> The reason why the flash images must be 64MB in size
>>>> when using -pflash is that they are the backing store
>>>> for a writable device. Suppose you have 1MB of data in your
>>>> backing image that you pass to QEMU and then the guest writes
>>>> to the last block of the flash device. The new data
>>>> written by the guest to the end of the device has to be
>>>> stored somewhere, so the file has to be large enough
>>>> to cover the whole of the flash area.
>>> Is there any way to configure or limit the size that is visible to
>>> both the guest and QEMU?
>>> The original QEMU_EFI.fd is only 2M, but we need to pad it
>>> to 64M with 62M of unused data. That consumes a large amount of
>>> memory when running multiple VMs simultaneously.
>> Here's a number of ideas.
>> The first one is of course making the flash memory size configurable, to
>> eliminate the "unused" part.  Our PC machines use the backing image
>> sizes as configuration.  I consider that a bad idea that should not be
>> allowed to spread to other machines.  Peter seems to agree.
> For the full picture, we have to realize that flash size is a trade-off
> for virtual machines too, not just for physical machines. UEFI firmware
> tends to grow like an OS -- whether that's good or bad belongs on
> another page :) --, and so you tend to need more and more space for it,
> even after Link Time Optimization, and image compression, over time.

Same for physical and virtual machines.

> So you are left with a choice.
> * You can let board logic size the flash dynamically at startup (and
> then call it a "bad idea" :) ).
> * Alternatively, you can hardcode the pflash size in the board setup,
> perhaps dependent on machine type version, and/or dependent on some
> properties.

The difference between getting flash memory size from the backend
vs. getting it from a machine type or property is implicit vs. explicit.

Implicit vs. explicit can have ramifications beyond the user interface.
For instance, if we ever get around to transferring device configuration
in the migration stream, the machine property would surely be part of
that, but the size of the backend won't.

>             Even assuming that this diversity doesn't break migration at
> once,

I doubt explicit could break anything that implicit couldn't :)

>       you'll create a dependency between firmware releases (sizes) and
> machine types. 'For the next release of ArmVirtQemu/edk2, you'll need
> "virt-4.1" or later'.

Yes if you tie the size to the machine type.  No if you get it from a
machine property.

> In addition, in most cases, the firmware, when it runs from flash,
> cannot dynamically adapt itself to random flash sizes and/or base
> addresses, so not only will new (larger) firmware not fit on old machine
> types, but old (small) firmware may also not feel "at home" on new
> machine types. (Note: this is not a theoretical limitation, but it is a
> *very* practical one.)

The exact same problem exists for physical machines.  You can revise
your firmware only within limits set by the board.

I don't mean to say this problem isn't worth avoiding for virtual
machines.  Only that it is neither new nor intractable.

> That's a kind of mapping that is taken for "obvious" in the physical
> world (you get the board, your firmware comes with it, that's why it's
> called *firm*), but it used to be frowned upon in the virtual world.

I'm willing to give developers of virtual firmware more flexibility than
they get in the physical world.  I just happen to dislike "implicit" and
"any multiple of 4KiB up to a limit" (because physical flash chips with
sizes like 64140KiB do not exist, and virtual ones should not).
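
As an illustration of the two size policies contrasted above, here is a
minimal sketch in Python.  It is not QEMU code: the names
`is_power_of_two` and `check_pflash_size` and the 64MiB default limit are
assumptions made up for this example.

```python
KIB = 1024
MIB = 1024 * KIB


def is_power_of_two(n: int) -> bool:
    """Return True if n is a positive power of two."""
    return n > 0 and (n & (n - 1)) == 0


def check_pflash_size(size: int, limit: int = 64 * MIB) -> None:
    """Hypothetical check for an explicitly configured flash size.

    Accepts only power-of-two sizes up to `limit` (an assumed value),
    and raises ValueError otherwise.
    """
    if size > limit:
        raise ValueError(f"flash size {size} exceeds limit {limit}")
    if not is_power_of_two(size):
        raise ValueError(f"flash size {size} is not a power of two")


# 64140KiB is a multiple of 4KiB, so an "any multiple of 4KiB" policy
# would accept it, but it is not a power of two.
odd_size = 64140 * KIB
print(odd_size % (4 * KIB) == 0, is_power_of_two(odd_size))
```

Under this sketch, sizes such as 2MiB, 4MiB or 64MiB pass, while 64140KiB
is rejected.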

> * Or else, you pad the pflash chips as broadly as you dare, in order to
> never run into the above mess -- and then someone complains "it consumes
> too many resources". :)

I think that complaint would be exactly as valid for unpadded firmware!
Once you get to the point where you care whether each guest's firmware
eats up 2MiB or 64MiB, what you *really* want is probably 128KiB per
guest plus $whatever shared among all guests.  You probably won't care
all that much whether $whatever is 2MiB - 128KiB or 64MiB - 128KiB.
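
The arithmetic behind this can be sketched as follows.  The guest count
of 100 is an assumption picked purely for illustration, and
`host_memory` is a hypothetical helper; the 128KiB, 2MiB and 64MiB
figures are the ones from the paragraph above.

```python
KIB = 1024
MIB = 1024 * KIB


def host_memory(guests: int, private_per_guest: int, shared: int = 0) -> int:
    """Total host memory: one private part per guest plus one shared part."""
    return guests * private_per_guest + shared


guests = 100  # assumed number of guests, for illustration only

# Nothing shared: every guest pays for the whole image.
unshared_padded = host_memory(guests, 64 * MIB)
unshared_unpadded = host_memory(guests, 2 * MIB)

# 128KiB private per guest, the rest ($whatever) shared among all guests.
shared_padded = host_memory(guests, 128 * KIB, 64 * MIB - 128 * KIB)
shared_unpadded = host_memory(guests, 128 * KIB, 2 * MIB - 128 * KIB)

for name, value in [
    ("unshared, 64MiB images", unshared_padded),
    ("unshared, 2MiB images", unshared_unpadded),
    ("shared, 64MiB images", shared_padded),
    ("shared, 2MiB images", shared_unpadded),
]:
    print(f"{name}: {value / MIB} MiB")
```

With 100 guests this comes to 6400MiB vs. 200MiB without sharing, but
only 76.375MiB vs. 14.375MiB with sharing: once the bulk is shared, the
padding costs a fixed 62MiB per host rather than 62MiB per guest.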

> For some perspective, OVMF started out with a 1MB cumulative size
> (executable + varstore). We ran out of 1MB with executable code, so we
> introduced the 2MB build (with unchanged varstore size). Then, under
> Microsoft's SVVP checks, the varstore size proved insufficient, and we
> introduced the 4MB build, with enough space in both the executable part
> and the varstore part for them to last for "the foreseeable future".
> And, the dynamic logic in the PC board code allows up to an 8MB
> cumulative size (and that's also not a machine type property, but a cold
> hard constant).
> With the dynamic sizing in QEMU (which, IIRC, I had originally
> introduced still in the 1MB times, due to the split between the
> executable and varstore parts), both the 1MB->2MB switch, and the
> 2MB->4MB switch in the firmware caused zero pain in QEMU. And right now,
> 4MB looks like a "sweet spot", with some elbow room left.

Explicit configuration would've been exactly as painless.  Even with
pflash sizes restricted to powers of two.

> The "virt" board took a different approach, with different benefits (no
> dynamic sizing at startup, hence identical guest view) and different
> problems (it's more hungry for resources). "Pick your poison."

Here's the one I'd like to be able to pick: have a single pflash chip
with a read-only part and a read/write part.  Map the read-only part so
it is shared among guests.  Map the read/write part normally.  Default
the sizes to something that makes sense now, with reasonable elbow room.
Make sure there's a way to grow.

> A hopefully more constructive comment below:
>> The accepted way to create minor variations of a machine type is machine
>> properties.  Whether using them to vary flash chip size would be
>> acceptable is for the board maintainer to decide.
>> Now let's think about mapping these flash images more efficiently.
>> We could avoid backing their "unused" part with pages.  Unless the
>> "unused" part is read-only, this leaves you at your guests' mercy: they
>> can make the host allocate pages by writing to them.
> First, the flash backends' on-disk demand shouldn't be catastrophic,
> even if we use "raw" -- the executable should be shared between all VMs
> running on the host, and the varstore files (which must be private to
> VMs) could be created as sparse files. For example, libvirt could be
> improved to create sparse copies of the varstore *template* files.

Same story as in memory, really: share the read-only part (i.e. the
executable) among the guests, keep the read/write part small.
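
A sparse copy of a varstore template, as suggested in the quoted text,
could look roughly like the sketch below.  This is not what libvirt does
today; `sparse_copy` and the 4KiB chunk size are assumptions for
illustration.  Whether the skipped ranges really end up as holes on disk
depends on the host filesystem.

```python
import os


def sparse_copy(src_path: str, dst_path: str, chunk: int = 4096) -> None:
    """Copy src to dst, seeking over all-zero chunks instead of writing them."""
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            block = src.read(chunk)
            if not block:
                break
            if not any(block):
                # All zeros: skip ahead so the filesystem can leave a hole.
                dst.seek(len(block), os.SEEK_CUR)
            else:
                dst.write(block)
        # Seeking past the end does not grow the file by itself, so set
        # the final size explicitly in case the copy ends with zeros.
        dst.truncate(dst.tell())
```

The copy has the same length and contents as the template; only the
on-disk allocation can differ.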

> Second, regarding physical RAM consumption: disable memory overcommit,
> and set a large swap. The unused parts of the large pflash chips should
> be swapped out at some point (after their initial population from disk),
> and should never be swapped back in again.

Just providing swap should do the trick, shouldn't it?

> Thanks
> Laszlo
>> We could share the pflash memory among VMs running the same firmware.
>> If writes are permitted, we'd have to unshare on write (COW).  Again,
>> you're at your guests' mercy unless read-only: they can make the host
>> unshare pages by writing to them.
>> I figure the "share" idea would be easier to implement[*].
>> Both ideas require either trusted guests or read-only flash.  EFI
>> generally wants some read/write flash for its var store.  Can we make
>> everything else read-only?
>> We can improve the device models to let us set up a suitable part of the
>> pflash memory read-only.  This is how real hardware works.  Our PC
>> machines currently approximate this with *two* flash chips, one
>> read-only, one read/write, which I consider a mistake that should not be
>> allowed to spread to other machines.
>> Prior discussions
>>     Message-ID: <address@hidden>
>>     https://lists.nongnu.org/archive/html/qemu-devel/2019-02/msg05056.html
>> and
>>     Message-ID: <address@hidden>
>>     https://lists.nongnu.org/archive/html/qemu-devel/2019-01/msg06606.html
>> [*] If you run KSM (kernel same-page merging), the kernel should set up
>> the sharing for you automatically.  But not everybody wants to run KSM.
