
Re: [Qemu-devel] How to measure guest memory access (qemu_ld/qemu_st) time?


From: Wei-Ren Chen
Subject: Re: [Qemu-devel] How to measure guest memory access (qemu_ld/qemu_st) time?
Date: Thu, 14 Jun 2012 11:18:26 +0800
User-agent: Mutt/1.5.21 (2010-09-15)

On Wed, Jun 13, 2012 at 12:43:28PM +0200, Laurent Desnogues wrote:
> On Wed, Jun 13, 2012 at 5:14 AM, 陳韋任 (Wei-Ren Chen)
> <address@hidden> wrote:
> > Hi all,
> >
> >  I suspect that guest memory accesses (qemu_ld/qemu_st) account for the
> > majority of the time spent in system mode, and I would like to know
> > precisely how much (if possible). We used tools like perf [1] before, but
> > since the guest memory access logic is embedded in the host binary and not
> > only in helper functions, the results cannot be relied upon. The current
> > idea is to add helper functions before/after the guest memory access
> > logic. Taking an ARM guest on an x86_64 host as an example, should I add
> > the helper functions before/after tcg_gen_qemu_{ld,st} in
> > target-arm/translate.c, or tcg_out_qemu_{ld,st} in tcg/i386/tcg-target.c?
> > Or is there a better way to know how much time QEMU spends handling guest
> > memory access?
> 
> I'm afraid there's no easy way to measure that: any change you make
> to generated code will completely change the timing given that the ld/st
> fast path is only a few instructions long.

  Lluis, what's your opinion on that? Do your tracepoints have the same timing
issue, too?

> Another approach might be to run the program in user mode and then
> in system mode (provided the guest OS is very light).

  We ran the SPEC2006 test inputs in both user and system mode (ARM guest OS).
The result is that system mode is roughly 2x slower than user mode. I am not
sure whether that result is reasonable.
 
> As a side note, it might be interesting to gather statistics about the hit
> rate of the QEMU TLB.  Another thing to consider is speeding up the
> fast path; see YeongKyoon Lee's RFC patch:
> 
> http://www.mail-archive.com/address@hidden/msg91294.html

  We have some results on the TLB hit rate at the link below:

  https://docs.google.com/spreadsheet/ccc?key=0Aq_07U3IjpY8dFN6dTczMldtQVRUSk9Qa2ZKZTZEZGc&pli=1#gid=0

Here is how we measure the TLB hit rate: we use the tcg_out_xxx routines to
emit counting code in tcg_out_tlb_load (tcg/i386/tcg-target.c). At the
beginning of tcg_out_tlb_load we count every guest memory access, and on the
TLB hit path we count the number of TLB hits. You can see the code at

  
  https://github.com/ZackClown/QEMU_1.0.1/commit/013a9f8e2611e25344bc095a9f72fdfbb0c64d06#diff-3
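
For what it's worth, once those two counters exist the reporting side is
trivial. Below is a minimal, self-contained C sketch of that reporting part
only; the counter names, the report function and the standalone main() are
made up for illustration and are not the code in the commit above, which emits
the host increments directly from tcg_out_tlb_load.

#include <stdint.h>
#include <stdio.h>

/* Hypothetical counters: in the instrumented QEMU they are bumped by host
 * code emitted from tcg_out_tlb_load() -- one increment at the start of
 * every guest memory access, one more on the TLB hit path. */
uint64_t guest_mem_accesses;
uint64_t guest_tlb_hits;

/* Print the hit rate, e.g. from an atexit() hook at QEMU shutdown. */
static void report_tlb_hit_rate(void)
{
    if (guest_mem_accesses == 0) {
        return;
    }
    printf("guest memory accesses: %llu\n",
           (unsigned long long)guest_mem_accesses);
    printf("TLB hits:              %llu\n",
           (unsigned long long)guest_tlb_hits);
    printf("TLB hit rate:          %.4f%%\n",
           100.0 * (double)guest_tlb_hits / (double)guest_mem_accesses);
}

int main(void)
{
    /* Standalone demo with made-up numbers. */
    guest_mem_accesses = 1000000;
    guest_tlb_hits = 987654;
    report_tlb_hit_rate();
    return 0;
}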

  The reason we want to do this measurement is that we want to use KVM's MMU
virtualization (it may sound like a crazy idea) to speed up guest -> host
memory address translation. I talked to some people at LinuxCon Japan,
including Paolo, about this idea. The feedback I got is that we could only use
shadow page tables rather than EPT/NPT to do the address translation (if it is
possible at all), since different ISAs (ARM and x86, for example) have
different page table formats. Besides, QEMU would have to use an ioctl to ask
KVM for the translation result, which is overkill because the ARM page table
is quite simple and the walk can be done in user mode very fast.
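
To illustrate why I say the ARM page table is simple, here is a rough
user-mode sketch of an ARMv7 short-descriptor walk, handling only 1MB sections
and 4KB small pages and ignoring permissions, domains, supersections and large
pages. It is purely illustrative: "guest RAM" is a small array, and none of
the names come from QEMU or KVM.

#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Toy "guest RAM": just enough to hold a 16KB L1 table and a bit more. */
#define GUEST_RAM_SIZE (1 << 20)
static uint8_t guest_ram[GUEST_RAM_SIZE];

static uint32_t read_guest_phys(uint32_t paddr)
{
    uint32_t word = 0;
    if (paddr <= GUEST_RAM_SIZE - sizeof(word)) {
        memcpy(&word, &guest_ram[paddr], sizeof(word));
    }
    return word;
}

static void write_guest_phys(uint32_t paddr, uint32_t val)
{
    if (paddr <= GUEST_RAM_SIZE - sizeof(val)) {
        memcpy(&guest_ram[paddr], &val, sizeof(val));
    }
}

/* Walk the short-descriptor tables for one virtual address.
 * Returns 0 on success and stores the guest physical address in *pa. */
static int arm_walk_short_desc(uint32_t ttbr, uint32_t va, uint32_t *pa)
{
    /* L1 table base is in TTBR bits [31:14], index is va[31:20]. */
    uint32_t l1 = read_guest_phys((ttbr & 0xFFFFC000u) | ((va >> 20) << 2));

    switch (l1 & 3) {
    case 2: /* 1MB section: base in bits [31:20] */
        *pa = (l1 & 0xFFF00000u) | (va & 0x000FFFFFu);
        return 0;
    case 1: { /* coarse page table: base in bits [31:10], index is va[19:12] */
        uint32_t l2 = read_guest_phys((l1 & 0xFFFFFC00u) |
                                      (((va >> 12) & 0xFFu) << 2));
        if (l2 & 2) { /* 4KB small page: base in bits [31:12] */
            *pa = (l2 & 0xFFFFF000u) | (va & 0x00000FFFu);
            return 0;
        }
        return -1; /* 64KB large page or fault: not handled here */
    }
    default:
        return -1; /* translation fault */
    }
}

int main(void)
{
    uint32_t ttbr = 0;  /* pretend the L1 table sits at guest PA 0 */
    uint32_t pa;

    /* Map VA 0x80000000-0x800FFFFF as a 1MB section at PA 0x00100000. */
    write_guest_phys((ttbr & 0xFFFFC000u) | ((0x80000000u >> 20) << 2),
                     0x00100000u | 2);

    if (arm_walk_short_desc(ttbr, 0x80001234u, &pa) == 0) {
        printf("VA 0x80001234 -> PA 0x%08x\n", pa);  /* expect 0x00101234 */
    }
    return 0;
}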

  Any comments are welcome.

Regards,
chenwj

-- 
Wei-Ren Chen (陳韋任)
Computer Systems Lab, Institute of Information Science,
Academia Sinica, Taiwan (R.O.C.)
Tel:886-2-2788-3799 #1667
Homepage: http://people.cs.nctu.edu.tw/~chenwj


