qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [RFC 38/48] translator: implement 2-pass translation


From: Alex Bennée
Subject: Re: [Qemu-devel] [RFC 38/48] translator: implement 2-pass translation
Date: Tue, 27 Nov 2018 14:48:11 +0000
User-agent: mu4e 1.1.0; emacs 26.1.90

Emilio G. Cota <address@hidden> writes:

> On Mon, Nov 26, 2018 at 15:16:00 +0000, Alex Bennée wrote:
>> Emilio G. Cota <address@hidden> writes:
> (snip)
>> > +    if (tb_trans_cb && first_pass) {
>> > +        qemu_plugin_tb_trans_cb(cpu, plugin_tb);
>> > +        first_pass = false;
>> > +        goto translate;
>> > +    }
>>
>> So the only reason we are doing this two pass tango is to ensure the
>> plugin can insert TCG ops before the actual translation has occurred?
>
> Not only. The idea is to provide plugins with well-defined TBs,
> i.e. the instruction sizes and contents can be queried by the plugin
> before the plugin decides how/where to instrument the TB.

Hmmm, this seems a little to close to internal knowledge of the TCG. Is
the idea that a plugin might make a different decision based on the
number of a particular type of instruction in the translation block?

This seems like it would get broken if we wanted to implement other
types of TranslationBlock (e.g. hot-blocks with multiple exits for the
non-hot case).

That said looking at the examples using it so far it doesn't seem to be
doing more than looking at the total number of instructions for a given
translation block.

> Since in the targets we generate TCG code and also generate
> host code in a single shot (TranslatorOps.translate_insn),
> the 2-pass approach is a workaround to first get the
> well-defined TB, and in the second pass inject the instrumentation
> in the appropriate places.

Richard's suggestion of providing a central translator_ldl_code could
keep the book keeping of each instruction location and contents in the
core translator. With a little tweaking to the TCG we could then insert
our instrumentation at the end of the pass with all the knowledge we
want to export to the plugin.

Inserting instrumentation after instructions have executed will be
trickier though due to reasons Peter mentioned on IRC.

> This is a bit of a waste but given that it only happens at
> translate time, it can have negligible performance impact --
> I measured a 0.13% gmean slowdown for SPEC06int.

I'm less concerned about efficiency as complicating the code, especially
if we are baking in concepts that restrict our freedom to change code
generation around going forward.

>
>> I think we can do better, especially as the internal structures of
>> TCGops are implemented as a list so ops and be inserted before and after
>> other ops. This is currently only done by the optimiser at the moment,
>> see:
>>
>>   TCGOp *tcg_op_insert_before(TCGContext *s, TCGOp *op, TCGOpcode opc, int 
>> narg);
>>   TCGOp *tcg_op_insert_after(TCGContext *s, TCGOp *op, TCGOpcode opc, int 
>> narg);
>>
>> and all the base tcg ops end up going to tcg_emit_op which just appends
>> to the tail. But if we can come up with a neater way to track the op
>> used before the current translated expression we could do away with two
>> phases translation completely.
>
> This list of ops is generated via TranslatorOps.translate_insn.
> Unfortunately, this function also defines the guest insns that form the TB.
> Decoupling the two actions ("define the TB" and "translate to TCG ops")
> would be ideal, but I didn't want to rewrite all the target translators
> in QEMU, and opted instead for the 2-pass approach as a compromise.

I don't quite follow. When we've done all our translation into TCG ops
haven't we by definition defined the TB?

Maybe the interface shouldn't be per-insn and per-TB but just an
arbitrary chunk of instructions. We could call the plugin with a list of
instructions with some opaque data that can be passed back to the plugin
APIs to allow insertion of instrumentation at the appropriate points.
The initial implementation would be a single-pass and called after the
TCG op generation. An instruction counter plugin would then be free to
insert counter instrumentation as frequently or infrequently as it
wants. These chunks wouldn't have to be tied to the internals of TCG and
in the worst case we could just inform the plugin in 1 insn chunks
without having to change the API?

What do you think?


--
Alex Bennée



reply via email to

[Prev in Thread] Current Thread [Next in Thread]