


From: Xiaodong Yi
Subject: [Qemu-devel] Re: Luvalley project: enable Qemu to utilize hardware virtualization extensions on arbitrary operating system
Date: Fri, 17 Apr 2009 17:12:41 +0800

Hi,

I've tested a guest Linux with UnixBench 5.1.2. Here, "guest Linux"
means the Linux running on the virtual machine provided by Luvalley
and Qemu. The testing platform is:
 * Intel Core Duo CPU with 2 cores, 2GB RAM
 * CentOS 5.2 as the dom0 Linux, i.e., the host Linux for KVM
 * CentOS 5.2 as the guest Linux, i.e., the Linux running on the
virtual machine provided by Qemu

The first set of results is for Luvalley, and the second one is for
KVM. By the overall index score, Luvalley's guest Linux comes out
about 26% (one parallel copy: 631.2 vs. 502.8) to 32% (two parallel
copies: 735.5 vs. 558.9) ahead of KVM's guest! This surprised me: I
had thought Luvalley's guest would perform the same as KVM's.
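
For readers who want to check the headline figure, the relative
difference can be recomputed from the "System Benchmarks Index Score"
lines in the logs below. This is only a small Python sketch; the
function and variable names are made up for illustration and are not
part of UnixBench or Luvalley.

```python
# Index scores copied from the UnixBench logs in this message.
scores = {
    "1 parallel copy": {"luvalley": 631.2, "kvm": 502.8},
    "2 parallel copies": {"luvalley": 735.5, "kvm": 558.9},
}


def speedup_percent(luvalley: float, kvm: float) -> float:
    """Return how much higher the Luvalley score is than the KVM score, in percent."""
    return (luvalley / kvm - 1.0) * 100.0


# Relative difference per run, keyed by the run description.
results = {run: speedup_percent(s["luvalley"], s["kvm"]) for run, s in scores.items()}

for run, pct in results.items():
    print(f"{run}: Luvalley index score is {pct:.1f}% higher than KVM")
```

Note that this compares only the aggregate scores; individual tests
differ by very different amounts.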




Luvalley's results:

========================================================================
  BYTE UNIX Benchmarks (Version 5.1.2)

  System: localhost.localdomain: GNU/Linux
  OS: GNU/Linux -- 2.6.18-53.el5 -- #1 SMP Mon Nov 12 02:22:48 EST 2007
  Machine: i686 (i386)
  Language: en_US.utf8 (charmap="UTF-8", collate="UTF-8")
  CPU 0: QEMU Virtual CPU version 0.9.1 (5326.4 bogomips)
         x86-64, MMX, Physical Address Ext
  CPU 1: QEMU Virtual CPU version 0.9.1 (5319.9 bogomips)
         x86-64, MMX, Physical Address Ext
  11:32:22 up 1 min,  1 user,  load average: 2.39, 1.07, 0.39; runlevel 5

------------------------------------------------------------------------
Benchmark Run: Fri Apr 17 2009 11:32:22 - 11:44:22
2 CPUs in system; running 1 parallel copy of tests

Dhrystone 2 using register variables       10802400.5 lps   (10.0 s, 2 samples)
Double-Precision Whetstone                    12287.7 MWIPS (10.0 s, 2 samples)
Execl Throughput                               1044.6 lps   (29.2 s, 1 samples)
File Copy 1024 bufsize 2000 maxblocks        429860.0 KBps  (30.0 s, 1 samples)
File Copy 256 bufsize 500 maxblocks          125357.0 KBps  (30.0 s, 1 samples)
File Copy 4096 bufsize 8000 maxblocks        990103.0 KBps  (30.0 s, 1 samples)
Pipe Throughput                              803044.2 lps   (10.0 s, 2 samples)
Pipe-based Context Switching                 124785.9 lps   (10.0 s, 2 samples)
Process Creation                               1861.8 lps   (30.0 s, 1 samples)
Shell Scripts (1 concurrent)                   2338.6 lpm   (60.0 s, 1 samples)
Shell Scripts (8 concurrent)                    438.3 lpm   (60.1 s, 1 samples)
System Call Overhead                         709335.4 lps   (10.0 s, 2 samples)

System Benchmarks Index Values               BASELINE       RESULT    INDEX
Dhrystone 2 using register variables         116700.0   10802400.5    925.7
Double-Precision Whetstone                       55.0      12287.7   2234.1
Execl Throughput                                 43.0       1044.6    242.9
File Copy 1024 bufsize 2000 maxblocks          3960.0     429860.0   1085.5
File Copy 256 bufsize 500 maxblocks            1655.0     125357.0    757.4
File Copy 4096 bufsize 8000 maxblocks          5800.0     990103.0   1707.1
Pipe Throughput                               12440.0     803044.2    645.5
Pipe-based Context Switching                   4000.0     124785.9    312.0
Process Creation                                126.0       1861.8    147.8
Shell Scripts (1 concurrent)                     42.4       2338.6    551.6
Shell Scripts (8 concurrent)                      6.0        438.3    730.5
System Call Overhead                          15000.0     709335.4    472.9
                                                                  ========
System Benchmarks Index Score                                         631.2

------------------------------------------------------------------------
Benchmark Run: Fri Apr 17 2009 11:44:22 - 11:56:05
2 CPUs in system; running 2 parallel copies of tests

Dhrystone 2 using register variables       22031241.9 lps   (10.0 s, 2 samples)
Double-Precision Whetstone                    23862.7 MWIPS (10.0 s, 2 samples)
Execl Throughput                               1691.9 lps   (29.8 s, 1 samples)
File Copy 1024 bufsize 2000 maxblocks        153766.0 KBps  (30.1 s, 1 samples)
File Copy 256 bufsize 500 maxblocks           45848.0 KBps  (30.0 s, 1 samples)
File Copy 4096 bufsize 8000 maxblocks        432211.0 KBps  (30.0 s, 1 samples)
Pipe Throughput                             1617636.7 lps   (10.0 s, 2 samples)
Pipe-based Context Switching                 233890.1 lps   (10.0 s, 2 samples)
Process Creation                               3207.9 lps   (30.0 s, 1 samples)
Shell Scripts (1 concurrent)                   3151.4 lpm   (60.0 s, 1 samples)
Shell Scripts (8 concurrent)                    437.8 lpm   (60.2 s, 1 samples)
System Call Overhead                        1386223.1 lps   (10.0 s, 2 samples)

System Benchmarks Index Values               BASELINE       RESULT    INDEX
Dhrystone 2 using register variables         116700.0   22031241.9   1887.9
Double-Precision Whetstone                       55.0      23862.7   4338.7
Execl Throughput                                 43.0       1691.9    393.5
File Copy 1024 bufsize 2000 maxblocks          3960.0     153766.0    388.3
File Copy 256 bufsize 500 maxblocks            1655.0      45848.0    277.0
File Copy 4096 bufsize 8000 maxblocks          5800.0     432211.0    745.2
Pipe Throughput                               12440.0    1617636.7   1300.4
Pipe-based Context Switching                   4000.0     233890.1    584.7
Process Creation                                126.0       3207.9    254.6
Shell Scripts (1 concurrent)                     42.4       3151.4    743.3
Shell Scripts (8 concurrent)                      6.0        437.8    729.7
System Call Overhead                          15000.0    1386223.1    924.1
                                                                  ========
System Benchmarks Index Score                                         735.5
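
The INDEX column and the final score in the runs above appear
consistent with index = RESULT / BASELINE * 10 and a geometric mean
over the twelve indices. The Python sketch below checks this against
the single-copy Luvalley numbers. The helper names are hypothetical,
and the formula is inferred from the numbers in the log rather than
quoted from the UnixBench sources.

```python
import math

# (BASELINE, RESULT) pairs copied from the Luvalley "1 parallel copy" table.
rows = {
    "Dhrystone 2 using register variables": (116700.0, 10802400.5),
    "Double-Precision Whetstone": (55.0, 12287.7),
    "Execl Throughput": (43.0, 1044.6),
    "File Copy 1024 bufsize 2000 maxblocks": (3960.0, 429860.0),
    "File Copy 256 bufsize 500 maxblocks": (1655.0, 125357.0),
    "File Copy 4096 bufsize 8000 maxblocks": (5800.0, 990103.0),
    "Pipe Throughput": (12440.0, 803044.2),
    "Pipe-based Context Switching": (4000.0, 124785.9),
    "Process Creation": (126.0, 1861.8),
    "Shell Scripts (1 concurrent)": (42.4, 2338.6),
    "Shell Scripts (8 concurrent)": (6.0, 438.3),
    "System Call Overhead": (15000.0, 709335.4),
}


def index(baseline: float, result: float) -> float:
    """Per-test index: the result scaled so that the baseline machine scores 10."""
    return result / baseline * 10.0


def score(table: dict) -> float:
    """Overall score as the geometric mean of the per-test indices."""
    logs = [math.log(index(b, r)) for b, r in table.values()]
    return math.exp(sum(logs) / len(logs))


print(f"Dhrystone index: {index(*rows['Dhrystone 2 using register variables']):.1f}")
print(f"Overall score:   {score(rows):.1f}")
```

Because the score is a geometric mean, one test with a very large
ratio moves the total less than it would in an arithmetic mean, but it
still contributes.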





KVM's results:

========================================================================
  BYTE UNIX Benchmarks (Version 5.1.2)

  System: localhost.localdomain: GNU/Linux
  OS: GNU/Linux -- 2.6.18-53.el5 -- #1 SMP Mon Nov 12 02:22:48 EST 2007
  Machine: i686 (i386)
  Language: en_US.utf8 (charmap="UTF-8", collate="UTF-8")
  CPU 0: QEMU Virtual CPU version 0.9.1 (5325.7 bogomips)
         x86-64, MMX, Physical Address Ext
  CPU 1: QEMU Virtual CPU version 0.9.1 (5319.6 bogomips)
         x86-64, MMX, Physical Address Ext
  12:02:30 up 1 min,  1 user,  load average: 2.37, 0.87, 0.31; runlevel 5

------------------------------------------------------------------------
Benchmark Run: Fri Apr 17 2009 12:02:30 - 12:11:33
2 CPUs in system; running 1 parallel copy of tests

Dhrystone 2 using register variables       10599139.8 lps   (10.0 s, 2 samples)
Double-Precision Whetstone                     2166.3 MWIPS (10.2 s, 2 samples)
Execl Throughput                                598.3 lps   (29.9 s, 1 samples)
File Copy 1024 bufsize 2000 maxblocks        458264.0 KBps  (30.0 s, 1 samples)
File Copy 256 bufsize 500 maxblocks          125402.0 KBps  (30.0 s, 1 samples)
File Copy 4096 bufsize 8000 maxblocks       1122309.0 KBps  (30.0 s, 1 samples)
Pipe Throughput                              811955.6 lps   (10.0 s, 2 samples)
Pipe-based Context Switching                 116759.0 lps   (10.0 s, 2 samples)
Process Creation                               1503.8 lps   (30.0 s, 1 samples)
Shell Scripts (1 concurrent)                   1942.2 lpm   (60.0 s, 1 samples)
Shell Scripts (8 concurrent)                    374.9 lpm   (60.0 s, 1 samples)
System Call Overhead                         712668.8 lps   (10.0 s, 2 samples)

System Benchmarks Index Values               BASELINE       RESULT    INDEX
Dhrystone 2 using register variables         116700.0   10599139.8    908.2
Double-Precision Whetstone                       55.0       2166.3    393.9
Execl Throughput                                 43.0        598.3    139.1
File Copy 1024 bufsize 2000 maxblocks          3960.0     458264.0   1157.2
File Copy 256 bufsize 500 maxblocks            1655.0     125402.0    757.7
File Copy 4096 bufsize 8000 maxblocks          5800.0    1122309.0   1935.0
Pipe Throughput                               12440.0     811955.6    652.7
Pipe-based Context Switching                   4000.0     116759.0    291.9
Process Creation                                126.0       1503.8    119.4
Shell Scripts (1 concurrent)                     42.4       1942.2    458.1
Shell Scripts (8 concurrent)                      6.0        374.9    624.9
System Call Overhead                          15000.0     712668.8    475.1
                                                                  ========
System Benchmarks Index Score                                         502.8

------------------------------------------------------------------------
Benchmark Run: Fri Apr 17 2009 12:11:33 - 12:20:43
2 CPUs in system; running 2 parallel copies of tests

Dhrystone 2 using register variables       21416721.5 lps   (10.0 s, 2 samples)
Double-Precision Whetstone                     4928.6 MWIPS (10.1 s, 2 samples)
Execl Throughput                               1438.2 lps   (29.6 s, 1 samples)
File Copy 1024 bufsize 2000 maxblocks        122731.0 KBps  (30.1 s, 1 samples)
File Copy 256 bufsize 500 maxblocks           32222.0 KBps  (30.0 s, 1 samples)
File Copy 4096 bufsize 8000 maxblocks        308986.0 KBps  (30.0 s, 1 samples)
Pipe Throughput                             1617083.9 lps   (10.0 s, 2 samples)
Pipe-based Context Switching                 230390.0 lps   (10.0 s, 2 samples)
Process Creation                               2373.6 lps   (30.0 s, 1 samples)
Shell Scripts (1 concurrent)                   2732.8 lpm   (60.0 s, 1 samples)
Shell Scripts (8 concurrent)                    373.4 lpm   (60.1 s, 1 samples)
System Call Overhead                        1393848.5 lps   (10.0 s, 2 samples)

System Benchmarks Index Values               BASELINE       RESULT    INDEX
Dhrystone 2 using register variables         116700.0   21416721.5   1835.2
Double-Precision Whetstone                       55.0       4928.6    896.1
Execl Throughput                                 43.0       1438.2    334.5
File Copy 1024 bufsize 2000 maxblocks          3960.0     122731.0    309.9
File Copy 256 bufsize 500 maxblocks            1655.0      32222.0    194.7
File Copy 4096 bufsize 8000 maxblocks          5800.0     308986.0    532.7
Pipe Throughput                               12440.0    1617083.9   1299.9
Pipe-based Context Switching                   4000.0     230390.0    576.0
Process Creation                                126.0       2373.6    188.4
Shell Scripts (1 concurrent)                     42.4       2732.8    644.5
Shell Scripts (8 concurrent)                      6.0        373.4    622.4
System Call Overhead                          15000.0    1393848.5    929.2
                                                                  ========
System Benchmarks Index Score                                         558.9
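
To see where the overall gap comes from, the per-test INDEX values of
the two single-copy runs can be compared directly. The following
Python sketch (all names are illustrative only) ranks the tests by the
Luvalley/KVM ratio. It says nothing about why a test differs; it only
reorders numbers already present in the logs.

```python
# INDEX values copied from the two "running 1 parallel copy" tables:
# test name -> (Luvalley index, KVM index)
indices = {
    "Dhrystone 2 using register variables": (925.7, 908.2),
    "Double-Precision Whetstone": (2234.1, 393.9),
    "Execl Throughput": (242.9, 139.1),
    "File Copy 1024 bufsize 2000 maxblocks": (1085.5, 1157.2),
    "File Copy 256 bufsize 500 maxblocks": (757.4, 757.7),
    "File Copy 4096 bufsize 8000 maxblocks": (1707.1, 1935.0),
    "Pipe Throughput": (645.5, 652.7),
    "Pipe-based Context Switching": (312.0, 291.9),
    "Process Creation": (147.8, 119.4),
    "Shell Scripts (1 concurrent)": (551.6, 458.1),
    "Shell Scripts (8 concurrent)": (730.5, 624.9),
    "System Call Overhead": (472.9, 475.1),
}

# Ratio > 1 means Luvalley scored higher; sort from largest to smallest ratio.
ranked = sorted(
    ((name, luvalley / kvm) for name, (luvalley, kvm) in indices.items()),
    key=lambda item: item[1],
    reverse=True,
)

for name, ratio in ranked:
    print(f"{ratio:5.2f}x  {name}")
```

With these numbers, Double-Precision Whetstone shows by far the largest
ratio (about 5.7x), followed by Execl Throughput, while the File Copy
1024 and 4096 tests come out slightly in KVM's favor. The overall
score difference is therefore not spread evenly across the tests.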



Thanks for your attention; further feedback is welcome.

Regards,

Xiaodong Yi


2009/3/26 Xiaodong Yi <address@hidden>:
> Luvalley is a Virtual Machine Monitor (VMM) spawned from the KVM
> project. Part of its source code is derived from KVM to virtualize
> CPU instructions and the memory management unit (MMU). However, its
> overall architecture is completely different from KVM's and somewhat
> like Xen's. Luvalley runs outside of Linux, just as Xen does, but it
> still uses Linux as its scheduler, memory manager, physical device
> driver provider and virtual IO device emulator. Moreover, Luvalley
> may run WITHOUT Linux. In theory, any operating system could take the
> place of Linux to provide the above services. Currently, Luvalley
> supports Linux and Windows. That is to say, one may run Luvalley to
> boot Linux or Windows, and then run multiple virtualized operating
> systems on that Linux or Windows.
>
> In KVM, Qemu is adopted as the IO device emulator. From the point of
> view of Qemu, KVM enables Qemu to utilize hardware virtualization
> extensions such as Intel's VT on Linux. Luvalley also adopts Qemu as
> its IO device emulator. However, Luvalley can enable Qemu to utilize
> hardware virtualization extensions on ANY operating system.
>
> If you are interested in the Luvalley project, you may download
> Luvalley's source code from
>     http://sourceforge.net/projects/luvalley/
>
> More details about Luvalley follow.
>
> Luvalley is an external hypervisor, just like Xen
> (http://www.xen.org). It boots and controls the X86 machine before
> starting up any operating system. However, Luvalley is much smaller
> and simpler than Xen. Most of Xen's jobs, such as scheduling, memory
> management, interrupt management, etc., are shifted to Linux (or any
> other OS), which runs on top of Luvalley.
>
> Luvalley gets booted first when the X86 machine is powered on. It
> boots up all CPUs in an SMP system and enables their virtualization
> extensions. Then the MBR (Master Boot Record) is read and executed in
> the CPU's virtualization mode. In this way, a Linux (or any other OS)
> is eventually booted. Luvalley assigns all physical memory, the
> programmable interrupt controller (PIC) and IO devices to this
> privileged OS. Following Xen, we call this OS the "domain 0" (dom0)
> OS.
>
> As with KVM, a modified Qemu runs on the dom0 Linux to provide virtual
> IO devices for other operating systems running on top of Luvalley. We
> also follow Xen in calling these operating systems "domain user"
> (domU). That is to say, there must be exactly one dom0 OS, and there
> may be several domU OSs running on top of Luvalley. Each domU OS
> corresponds to a Qemu process in the dom0 OS. The memory of a domU is
> allocated from dom0 by Qemu. When Qemu is scheduled to run by the dom0
> scheduler, it calls Luvalley to run the corresponding domU.
>
> Moreover, as Luvalley requires nothing from the dom0 Linux kernel,
> other operating systems such as Windows, FreeBSD, etc. can also serve
> as the dom0 OS, provided that Qemu can be ported to them. Since Qemu
> is a userland application and is cross-platform, such porting is
> feasible. Currently, we have added Luvalley support to Qemu-0.10.0,
> which can be compiled and run on Windows. With the help of Luvalley,
> Qemu-0.10.0 runs much faster because it can utilize the VT support
> provided by Intel CPUs.
>
> In summary, Luvalley inherits all the merits of KVM. In particular,
> Luvalley is very small and simple. It is even easier to use than KVM
> because it does not depend on a specific Linux kernel version. Any
> version of Linux can serve as Luvalley's dom0 OS, unless Qemu cannot
> run on it.
>
> In addition, we think Luvalley's architecture meets the demands of
> both the desktop and the server operating system areas:
>
> 1. In the desktop area, there are many kinds of operating systems
> running on various hardware and devices. In theory, it is rather easy
> to add virtualization ability to all kinds of operating systems,
> without sacrificing hardware compatibility or the user experience.
> Moreover, Luvalley is very easy to install. It requires only a boot
> loader which supports the Multiboot Specification, e.g., Grub,
> WinGrub (http://sourceforge.net/projects/grub4dos), etc.
>
> 2. In the server area, especially for large-scale server systems (for
> example, thousands of CPUs), a single Linux is not suitable to manage
> the whole system. Therefore, KVM cannot be used properly. Luvalley's
> architecture is more suitable for servers. For example, it can be used
> to divide physical resources into partitions and run a Linux on each
> partition. In addition, Luvalley is very small and may be put into
> the BIOS to serve as virtualization firmware.
>



