qemu-devel

Re: [Qemu-devel] QEMU/NEMU boot time with several x86 firmwares


From: Maran Wilson
Subject: Re: [Qemu-devel] QEMU/NEMU boot time with several x86 firmwares
Date: Thu, 6 Dec 2018 06:47:54 -0800
User-agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:52.0) Gecko/20100101 Thunderbird/52.9.1

On 12/6/2018 2:38 AM, Stefan Hajnoczi wrote:
On Wed, Dec 05, 2018 at 10:04:36AM -0800, Maran Wilson wrote:
On 12/5/2018 5:20 AM, Stefan Hajnoczi wrote:
On Tue, Dec 04, 2018 at 02:44:33PM -0800, Maran Wilson wrote:
On 12/3/2018 8:35 AM, Stefano Garzarella wrote:
On Mon, Dec 3, 2018 at 4:44 PM Rob Bradford <address@hidden> wrote:
Hi Stefano, thanks for capturing all these numbers,

On Mon, 2018-12-03 at 15:27 +0100, Stefano Garzarella wrote:
Hi Rob,
I continued to investigate the boot time, and as you suggested I
looked also at qemu-lite 2.11.2
(https://github.com/kata-containers/qemu) and NEMU "virt" machine. I
did the following tests using the Kata kernel configuration
(
https://github.com/kata-containers/packaging/blob/master/kernel/configs/x86_64_kata_kvm_4.14.x
)

To compare the results with qemu-lite direct kernel load, I added
another tracepoint:
- linux_start_kernel: first entry of the Linux kernel
(start_kernel())

Great, do you have a set of patches available that add all these
tracepoints? It would be great for reproduction.
For sure! I'm attaching a set of patches for qboot, seabios, ovmf,
nemu/qemu/qemu-lite and linux 4.14 with the tracepoints.
I'm also sharing a python script that I'm using with perf to extract
the numbers in this way:

$ perf record -a -e kvm:kvm_entry -e kvm:kvm_pio \
    -e sched:sched_process_exec -o /tmp/qemu_perf.data &
$ # start qemu/nemu multiple times
$ killall perf
$ perf script -s qemu-perf-script.py -i /tmp/qemu_perf.data

As you can see, NEMU is faster to jump to the kernel
(linux_start_kernel) than qemu-lite when it uses qboot or seabios with
virt support, but the time to reach user space is strangely high. Maybe
the kernel configuration that I used is not the best one.
Do you suggest another kernel configuration?

This looks very bad. This isn't the kernel configuration we normally
test with in our automated test system but is definitely one we support
as part of our partnership with the Kata team. It's a high priority
for me to try and investigate that. Have you saved the kernel messages
as they might be helpful?
Yes, I'm attaching the dmesg output with nemu and qemu.

Anyway, I obtained the best boot time with qemu-lite and direct kernel
load (vmlinux ELF image), I think because the kernel was not
compressed. Indeed, looking at the other tests, the kernel
decompression (bzImage) takes about 80 ms (linux_start_kernel -
linux_start_boot). (I'll investigate further.)
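
The subtraction above (linux_start_kernel - linux_start_boot) generalizes to any pair of tracepoints. A minimal sketch follows, assuming the timestamps have already been extracted into a dictionary keyed by tracepoint name. The function name phase_durations and the timestamp values are hypothetical placeholders, not the attached script and not measurements from this thread.

```python
def phase_durations(timestamps_ms, phases):
    """Return {phase_name: end - start} for each (name, start, end) triple.

    timestamps_ms maps a tracepoint name to the time (in ms) it fired.
    """
    return {
        name: timestamps_ms[end] - timestamps_ms[start]
        for name, start, end in phases
    }


# Placeholder timestamps in milliseconds, NOT measurements from this thread.
example = {"linux_start_boot": 100.0, "linux_start_kernel": 180.0}

durations = phase_durations(
    example,
    [("decompression", "linux_start_boot", "linux_start_kernel")],
)
print(durations)
```

Adding more (name, start, end) triples to the list gives a per-phase breakdown of one boot from the same set of tracepoints.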

Yup, being able to load an uncompressed kernel is one of the big
advantages of qemu-lite. I wonder if we could bring that feature into
qemu itself to supplement the existing firmware based kernel loading.
I think so, I'll try to understand if we can merge the qemu-lite
direct kernel loading in qemu.
An attempt was made a long time ago to push the qemu-lite stuff (from the
Intel Clear Containers project) upstream. As I understand it, the main
stumbling block that seemed to derail the effort was that it involved adding
Linux OS specific code to Qemu so that Qemu could do things like create and
populate the zero page that Linux expects when entering startup_64().

That ends up being a lot of very low-level, OS-specific knowledge about
Linux that gets baked into Qemu code. Understandably, a number of folks
saw problems with going down a path like that.

Since then, we have put together an alternative solution that would allow
Qemu to boot an uncompressed Linux binary via the x86/HVM direct boot ABI
(https://xenbits.xen.org/docs/unstable/misc/pvh.html). The solution involves
first making changes to both the ABI and Linux, and then updating
Qemu to take advantage of the updated ABI which is already supported by both
Linux and FreeBSD for booting VMs. As such, Qemu can remain OS agnostic,
and just be programmed to the published ABI.

The canonical definition for the HVM direct boot ABI is in the Xen tree and
we needed to make some minor changes to the ABI definition to allow KVM
guests to also use the same structure and entry point. Those changes were
accepted to the Xen tree already:
https://lists.xenproject.org/archives/html/xen-devel/2018-04/msg00057.html

The corresponding Linux changes that would allow KVM guests to be booted via
this PVH entry point have already been posted and reviewed:
https://lkml.org/lkml/2018/4/16/1002

The final part is the set of Qemu changes to take advantage of the above and
boot a KVM guest via an uncompressed kernel binary using the entry point
defined by the ABI. Liam Merwick will be posting some RFC patches very soon
to allow this.
Cool, thanks for doing this work!

How do the boot times compare to qemu-lite and Firecracker's
(https://github.com/firecracker-microvm/firecracker/) direct vmlinux ELF
boot?
Boot times compare very favorably to qemu-lite, since the end result is
basically doing a very similar thing. For now, we are going with a QEMU +
qboot solution to introduce the PVH entry support in Qemu (meaning we will
be posting Qemu and qboot patches and you will need both to boot an
uncompressed kernel binary). As such we have numbers that Liam will include
in the cover letter showing significant boot time improvement over existing
QEMU + qboot approaches involving a compressed kernel binary. And as we all
know, the existing qboot approach already gets boot times down pretty low.
The first email in this thread contains benchmark results showing that
optimized SeaBIOS is comparable to qboot, so it does not offer anything
unique with respect to boot time.
To be fair, what I'm saying is that the qboot + PVH approach saves a 
significant percentage of boot time as compared to qboot only. So it 
does provide an important improvement over both existing qboot as well 
as optimized SeaBIOS from what I can tell. Please see:
http://lists.nongnu.org/archive/html/qemu-devel/2018-12/msg00957.html
and
http://lists.nongnu.org/archive/html/qemu-devel/2018-12/msg00953.html

We're trying to focus on SeaBIOS because it's actively maintained and
already shipped by distros.  Relying on qboot will make it harder to get
PVH into the hands of users because distros have to package and ship
qboot first.  This might also require users to change their QEMU
command-line syntax to benefit from fast kernel booting.
But you do make a good point here about distribution and usability. 
Using qboot is just one way to take advantage of the PVH entry -- and 
the quickest way for us to get something usable out there for the 
community to look at and play with.
There are other ways to take advantage of the PVH entry for KVM guests, 
once the Linux changes are in place. So qboot is definitely not a hard 
requirement in the long run.
Thanks,
-Maran

I would strongly recommend looking at the SeaBIOS -kernel approach and
avoiding QEMU command-line changes.  That way -kernel becomes fast for
users as soon as they upgrade their QEMU without requiring configuration
changes.

If you have questions about how the -kernel boot works with SeaBIOS,
Stefano can help explain it and share details of his
development/benchmarking environment (see also earlier mails in this
email thread).

Once the patches have been posted (soon) it would be great if some other
folks could pick them up and run your own numbers on various test setups and
comparisons you already have.

I haven't tried Firecracker, specifically. It would be good to see a
comparison just so we know where we stand, but it's not terribly relevant to
folks who want to continue using Qemu, right? Meaning Qemu (and all solutions
built on it, like Kata) still needs a solution for improving boot time
regardless of what NEMU and Firecracker are doing.
Right.  Collaboration with Firecracker is more in the interest of
avoiding duplication and making it easy for users to fast boot a single
kernel (thanks to a common ABI like PVH) on any hypervisor.

Stefan


