From: Alexander Graf
Subject: Re: [Qemu-devel] Improve QEMU performance with LLVM codegen and other techniques
Date: Thu, 1 Dec 2011 11:23:37 +0100

On 01.12.2011, at 04:50, 陳韋任 wrote:

> Hi Alex,
> 
>> Very cool! I was thinking about this for a while myself now. It's especially 
>> appealing these days since you can do the hotspot optimization in a separate 
>> thread :).
>> 
>> Especially in system mode, you also need to flush when tb_flush() is called 
>> though. And you have to make sure to match hflags and segment descriptors 
>> for the links - otherwise you might end up connecting TBs from different 
>> processes :).
> 
>  I'll check tb_flush again. IIRC, we make the code cache big enough so that
> there is no need to flush it. But I think we still need to deal with it in
> the end.

It is never big enough :). In fact, even a normal system mode guest boot
usually triggers tb_flush because the cache is full. And target code can also
trigger it manually.
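A minimal sketch of such a hook, assuming a separate LLVM-side trace cache; the
names trace_cache_on_tb_flush(), llvm_trace_cache_reset() and
trace_profile_reset() are hypothetical and not part of QEMU:

/* Sketch only: meant to be called from QEMU's tb_flush() so the LLVM trace
 * cache is dropped together with the TCG code cache.  All names below are
 * hypothetical. */

extern void llvm_trace_cache_reset(void);  /* drop all generated traces        */
extern void trace_profile_reset(void);     /* clear the per-block hot counters */

static void trace_cache_on_tb_flush(void)
{
    /* tb_flush() recycles every TB, so any TB entry that was patched to
     * jump into an LLVM trace would otherwise point at stale host code. */
    llvm_trace_cache_reset();
    trace_profile_reset();
}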

> The block linking is done by QEMU and we leave it alone. But I don't know
> whether QEMU ever checks hflags and segment descriptors before doing block
> linking. Could you point it out? Anyway, here is how we form a trace from a
> set of basic blocks.

Sure. Just check for every piece of code that executes cpu_get_tb_cpu_state() 
:).
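For reference, the sketch below shows roughly what that check amounts to. It is
modeled on tb_find_fast() in cpu-exec.c, which compares the pc, CS base and
flags (including hflags) returned by cpu_get_tb_cpu_state() against the fields
stored in the TB; only trace_link_ok() is a made-up name, and flags is an int
in the QEMU 0.13 era:

#include <stdbool.h>
#include "cpu.h"       /* CPUState, cpu_get_tb_cpu_state() */
#include "exec-all.h"  /* TranslationBlock */

/* Hypothetical helper: is it valid to enter next_tb from the current CPU
 * state?  Mirrors the match done in tb_find_fast(). */
static bool trace_link_ok(CPUState *env, TranslationBlock *next_tb)
{
    target_ulong pc, cs_base;
    int flags;   /* env->hflags plus a few eflags bits on x86 */

    cpu_get_tb_cpu_state(env, &pc, &cs_base, &flags);

    return next_tb->pc == pc &&
           next_tb->cs_base == cs_base &&
           next_tb->flags == flags;
}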

> 1. We insert instrumented code at the beginning of each TCG block to count
>   how many times the block is executed.
> 
> 2. When a block's execution count, say block A's, reaches a pre-defined
>   threshold, we follow the run-time execution path to collect the block B
>   that follows A, and so on, to form a trace. This approach is called NET
>   (Next-Executing Tail) [1].
> 
> 3. Then a trace composed of TCG blocks is sent to an LLVM translator. The
>   translator generates the host binary for the trace into an LLVM code
>   cache, and patches the

I don't fully understand this part. Do you disassemble the x86 blob that TCG 
emitted?

>   beginning of block A (in the QEMU code cache) so that anyone executing
>   block A will jump to the corresponding trace and execute it.
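Putting the three quoted steps together, a rough sketch of the profiling side,
under the assumption of a per-TB counter bumped from an instrumented prologue;
none of these names (BlockProfile, block_profile_tick, trace_record_and_compile,
TRACE_HOT_THRESHOLD) exist in QEMU and the threshold value is arbitrary:

#include <stdbool.h>
#include <stdint.h>
#include "exec-all.h"   /* TranslationBlock */

#define TRACE_HOT_THRESHOLD 50   /* arbitrary; the mail suggests starting high */

typedef struct BlockProfile {
    uint32_t exec_count;   /* bumped by the stub injected at TB entry (step 1) */
    bool     in_trace;     /* already covered by a generated trace?            */
} BlockProfile;

/* Hypothetical: record the NET trace starting at tb, compile it with LLVM
 * and patch tb's entry in the QEMU code cache (steps 2 and 3). */
extern void trace_record_and_compile(TranslationBlock *tb);

/* Called from the instrumented prologue of each TCG block. */
static void block_profile_tick(BlockProfile *prof, TranslationBlock *tb)
{
    if (prof->in_trace) {
        return;
    }
    if (++prof->exec_count >= TRACE_HOT_THRESHOLD) {
        trace_record_and_compile(tb);
        prof->in_trace = true;
    }
}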
> 
> The above is the block-to-trace link. I think there is no need to check
> hflags and segment descriptors here, right? Although I set the trace length
> to one basic block at

If you only take the choices that QEMU has already patched into the TB for you 
then no, you don't need to check it yourself, because QEMU already checked it :)

> the moment (to keep things simpler), I think we still don't have to check
> the blocks' hflags and segment descriptors in the trace to see if they match.

Yeah. You only need to be sync'ed with the invalidation then. And make sure you 
patch the TB atomically, so you don't have a separate thread accidentally run 
half your code and half the old code.
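On the atomic-patch point: if the stub at the head of block A is a direct jmp,
retargeting it only requires rewriting its 4-byte displacement, which an
aligned 32-bit store does atomically on x86 hosts. That is essentially what
tb_set_jmp_target1() does for TB chaining on i386 hosts; the wrapper below is
only a sketch and patch_jmp_target() is a made-up name:

#include <stdint.h>

/* Retarget an existing "jmp rel32" whose displacement field lives at
 * jmp_disp_addr.  Assuming the 4-byte field is suitably aligned, the single
 * 32-bit store below is atomic on x86 hosts, so a concurrent thread sees
 * either the old or the new target, never a torn mix. */
static void patch_jmp_target(uintptr_t jmp_disp_addr, uintptr_t new_target)
{
    /* rel32 is relative to the first byte after the displacement field. */
    uint32_t disp = (uint32_t)(new_target - (jmp_disp_addr + 4));

    *(volatile uint32_t *)jmp_disp_addr = disp;
}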

> 
>>> successfully, then log in and run some benchmarks on it. As a very first
>>> step, we set a very high threshold for trace building. In other words, a
>>> basic block must be executed *many* times to trigger the trace building
>>> process. Then we lower the threshold a bit at a time to see how things
>>> work. When something goes wrong, we might get a kernel panic or the system
>>> hangs at some point during boot. I have no idea how to solve this kind of
>>> problem, so I'd like to ask for help/experience/suggestions on the mailing
>>> list. I just hope I've made the whole situation clear to you.
>> 
>> I don't see any better approach to debugging this than the one you're 
>> already taking. Try to run as many workloads as you can and see if they 
>> break :). Oh and always make the optimization optional, so that you can 
>> narrow it down to it and know you didn't hit a generic QEMU bug.
> 
>  You mean make the trace optimization optional? We have tested our framework
> in LLVM-only mode, which means we replace TCG with LLVM entirely. It's _very_
> slow

I was more thinking of making the trace optimization optional, as in not
optimizing but doing only TCG like it's done today :).
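A tiny sketch of such a kill switch, assuming a single global flag wired to a
hypothetical command-line option (neither the flag nor the option exists in
QEMU); with the flag cleared, no counters are bumped, no traces are built and
no TB entries are patched, so behaviour is exactly today's TCG:

#include <stdbool.h>

/* Hypothetical: cleared when a (made-up) -no-trace-opt option is given. */
bool trace_opt_enabled = true;

/* Checked at the top of the profiling/trace-building path. */
static inline bool trace_opt_active(void)
{
    return trace_opt_enabled;
}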

> but works. What is the generic QEMU bug? We use QEMU 0.13 and just rely on
> its emulation part right now. Do recent versions fix major bugs in the
> emulation engine?

I don't know - there are always bug fixes in areas all over the code base. But 
I guess the parts you've been touching have been pretty stable. Either way, I 
was really more trying to point out that there could always be bugs in any 
layer, so having the ability to turn off a layer is in general a good idea :).


Alex



