qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH 3/3] tcg/aarch64: implement new TCG target for a


From: Peter Maydell
Subject: Re: [Qemu-devel] [PATCH 3/3] tcg/aarch64: implement new TCG target for aarch64
Date: Tue, 14 May 2013 13:25:23 +0100

On 14 May 2013 13:01, Claudio Fontana <address@hidden> wrote:
> On 13.05.2013 20:28, Peter Maydell wrote:
>> On 13 May 2013 14:33, Claudio Fontana <address@hidden> wrote:
>>>
>>> add preliminary support for TCG target aarch64.
>>
>> Thanks for this patch. Some comments below.
>>
>>> Signed-off-by: Claudio Fontana <address@hidden>
>>> ---
>>>  include/exec/exec-all.h  |    5 +-
>>>  tcg/aarch64/tcg-target.c | 1084 
>>> ++++++++++++++++++++++++++++++++++++++++++++++

Incidentally a 1000 line patch is pretty tedious to review
(even if most of it is ok then you have to go back over
the whole thing again when some small part changes), so it
might be worth splitting it up a little if there's a
reasonable split possible.

>> This list seems to not have all the registers in it.
>> You can put the registers used for AREG0 and the temp
>> reg in here -- TCG will correctly not use them because
>> (a) AREG0 is allocated as a fixed register and (b)
>> the temp is put in the reserved-regs list in tcg_target_init.
>>
>> It should be OK to use X16 and X17 as well, right?
>
> I see, I can add AREG0 (X19) and temp (X8) to the list then.
>
> I got cold feet about using X16 and X17 when I
> experienced their use by possibly libgthread and other
> system libraries, and due to their definitions as IP0
> and IP1 ("can be used by call veneers and PLT code").

If they can be used as call veneers then it must
be safe to use them inside a function (as callee
saves registers) because our caller can't be
expecting them to be preserved.

> But if you are sure they are safe to use I can add
> them to the set as temporary registers.

> I skipped X18 because of its definition as the
> "platform register".
> If you think that's a groundless fear, I can add that
> to the list as well.

I guess for cross-OS portability we should avoid it
(and so you should add it to the reserved_regs set
in tcg_target_init()). I don't think it matters
whether it appears in this array or not if it's
reserved. tcg/arm puts SP in the reg-alloc-order
array, for example, even though if we ever used it
we'd die horribly.

>>> +#ifdef CONFIG_SOFTMMU
>>> +    mem_index = args[2];
>>> +    s_bits = opc & 3;
>>> +
>>> +    /* Should generate something like the following:
>>> +     *  shr x8, addr_reg, #TARGET_PAGE_BITS
>>> +     *  and x0, x8, #(CPU_TLB_SIZE - 1)   @ Assumption: CPU_TLB_BITS <= 8
>>> +     *  add x0, env, x0 lsl #CPU_TLB_ENTRY_BITS
>>> +     */
>>
>> The comment says this, but you don't actually seem to have
>> the code to do it?
>>
>> And there definitely needs to be a test somewhere in
>> your generated code for "did the TLB hit or miss?"
>
> Yes, it's basically a TODO comment, adapted from the ARM TCG target.
> Right now we always go through the C helpers, which is of course slower.

In the ARM TCG target case, the comment is describing
what the following code actually generates (and the
bit which isn't implemented is marked as "not implemented
yet"). In your case the comment is neither describing
what you actually emit nor what you ought in an ideal
world to emit.

>>> +    case INDEX_op_rotl_i64: ext = 1;
>>> +    case INDEX_op_rotl_i32:     /* same as rotate right by (32 - m) */
>>> +        if (const_args[2])      /* ROR / EXTR Wd, Wm, Wm, 32 - m */
>>> +            tcg_out_rotl(s, ext, args[0], args[1], args[2]);
>>> +        else { /* no RSB in aarch64 unfortunately. */
>>> +            /* XXX UNTESTED */
>>> +            tcg_out_movi32(s, ext, TCG_REG_X8, ext ? 64 : 32);
>>> +            tcg_out_arith(s, ARITH_SUB, ext, TCG_REG_X8, TCG_REG_X8, 
>>> args[2]);
>>> +            tcg_out_shiftrot_reg(s, SRR_ROR, ext, args[0], args[1], 
>>> TCG_REG_X8);
>>
>> I think you should either test this, or remove it [rot
>> support is optional so you could put it back in a later
>> patch].
>
> Yes, I agree. I could not find an image which triggered that
> code path for register rotation amounts.

Try PPC : rlwmn will generate a rotl (as will other insns).

>> tcg_set_frame() should be called in the prologue generation
>> function, not here. Also, please don't use temp_buf, it is
>> going to go away shortly, as per this patch:
>>  http://lists.gnu.org/archive/html/qemu-devel/2013-04/msg03859.html
>>
>
> I understand that you want to get rid of temp_buf;
> that would mean if I understand correctly using the stack
> for that end; I feel a bit uneasy about the mechanics though,
> and the stability of the result:
> could we add this to the TODO list for a successive change,
> so that we can have a working version now and get a successive
> version to a whole new round of testing?

Sorry, no. At the moment no in-tree TCG target uses temp_buf,
and it's very likely that a patch to remove it will land
before your TCG target is added, at which point your code
will no longer compile. (This is also part of a general
principle where we don't let progressive code cleanups
go 'backwards' by admitting new code which still uses the
old deprecated mechanisms.)

>>> +{
>>> +#if QEMU_GNUC_PREREQ(4, 1)
>>> +    __builtin___clear_cache((char *)start, (char *)stop);
>>> +#else
>>> +    /* XXX should provide alternative with IC <ic_op>, Xt */
>>> +#error "need GNUC >= 4.1, alternative not implemented yet."
>>> +#endif
>>
>> I think we can just assume a GCC new enough to support
>> __builtin___clear_cache(). Nobody's going to be compiling
>> aarch64 code with a gcc that old, because they didn't
>> support the architecture at all. You can drop the #if/#else
>> completely.
>
> Ok. Should we not support non-GCC compilers though?

We do (clang, for instance) -- they just have to support
the same set of builtins as gcc.

> It should not be terrible to emit the IC instruction,

I really don't like handwritten assembly if it can
possibly be avoided -- it's a maintenance pain. This
is exactly what the gcc builtins are for, and we
use them fairly extensively. The 32 bit ARM code
only has the hand-written version because it predates
gcc versions which support this particular builtin.

thanks
-- PMM



reply via email to

[Prev in Thread] Current Thread [Next in Thread]