[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Qemu-devel] [PATCH 3/3] tcg/aarch64: implement new TCG target for a
From: |
Claudio Fontana |
Subject: |
Re: [Qemu-devel] [PATCH 3/3] tcg/aarch64: implement new TCG target for aarch64 |
Date: |
Tue, 14 May 2013 16:05:43 +0200 |
User-agent: |
Mozilla/5.0 (Windows NT 6.1; rv:17.0) Gecko/20130328 Thunderbird/17.0.5 |
On 13.05.2013 21:49, Richard Henderson wrote:
> On 05/13/2013 06:33 AM, Claudio Fontana wrote:
>> +enum aarch64_cond_code {
>> + COND_EQ = 0x0,
>> + COND_NE = 0x1,
>> + COND_CS = 0x2, /* Unsigned greater or equal */
>> + COND_HS = 0x2, /* ALIAS greater or equal */
>
> Clearer to define aliases as COND_HS = COND_CS.
I agree, will change.
>> +static inline void tcg_out_movi64(TCGContext *s, int rd, uint64_t value)
>> +{
>> + uint32_t half, base, movk = 0, shift = 0;
>> + if (!value) {
>> + tcg_out_movr(s, 1, rd, TCG_REG_XZR);
>> + return;
>> + }
>> + /* construct halfwords of the immediate with MOVZ with LSL */
>> + /* using MOVZ 0x52800000 | extended reg.. */
>> + base = 0xd2800000;
>> +
>> + while (value) {
>> + half = value & 0xffff;
>> + if (half) {
>> + /* Op can be MOVZ or MOVK */
>> + tcg_out32(s, base | movk | shift | half << 5 | rd);
>> + if (!movk)
>> + movk = 0x20000000; /* morph next MOVZs into MOVKs */
>> + }
>> + value >>= 16;
>> + shift += 0x00200000;
>
> You'll almost certainly want to try ADP+ADD before decomposing into 3-4
> mov[zk]
> instructions.
I don't know the ADP instruction; but this movi can be improved.
Right now it needs 1 to 4 mov[zk] instructions, depending on the value:
it depends on how many 0000h 16 bit holes there are in the 64bit value.
I was thinking a successive patch to experiment with this.
>
>> +void aarch64_tb_set_jmp_target(uintptr_t jmp_addr, uintptr_t addr)
>> +{
>> + tcg_target_long target, offset;
>> + target = (tcg_target_long)addr;
>> + offset = (target - (tcg_target_long)jmp_addr) / 4;
>> +
>> + if (offset <= -0x02000000 || offset >= 0x02000000) {
>> + /* out of 26bit range */
>> + tcg_abort();
>> + }
>
> See MAX_CODE_GEN_BUFFER_SIZE in translate-all.c. Set this value to 128MB and
> then all cross-TB branches will be in range, and the abort won't trigger.
That's great, I missed that. I will add a change to that end in translate-all.c
.
>
>> +static inline void tcg_out_goto_label_cond(TCGContext *s, TCGCond c, int
>> label_index)
>> +{
>> + tcg_target_long offset;
>> + /* backward conditional jump never seems to happen in practice,
>> + so just always use the branch trampoline */
>> + c = tcg_invert_cond(c);
>> + offset = 2; /* skip current instr and the next */
>> + tcg_out_goto_cond(s, c, offset);
>> + tcg_out_goto_label(s, label_index); /* emit 26bit jump */
>> +}
>
> Conditional branch range is +-1MB. You'll never see a TB that large. You
> don't need to emit a branch-across-branch.
Is there maybe a way to do it right even in the corner case where we have
a huge list of hundreds of thousands of instructions without jumps and then a
conditional jump?
Are we _guaranteed_ to never see that large a TB with some kind of define,
similarly to MAX_CODE_GEN_BUFFER_SIZE?
I know, it's a corner case that would only trigger in a very strange program,
but
I would propose to add this topic to the TODO list for successive patches.
>
>> + /* Should generate something like the following:
>> + * shr x8, addr_reg, #TARGET_PAGE_BITS
>> + * and x0, x8, #(CPU_TLB_SIZE - 1) @ Assumption: CPU_TLB_BITS <= 8
>> + * add x0, env, x0 lsl #CPU_TLB_ENTRY_BITS
>> + */
>> +# if CPU_TLB_BITS > 8
>> +# error "CPU_TLB_BITS too large"
>> +# endif
>
> I wonder if using UBFM to extract the TLB bits and BFM with XZR to clear the
> middle bits wouldn't be better, as you wouldn't be restricted on the size of
> CPU_TLB_BITS. AFAICS it would be the same number of instructions.
Hmm..
>
>> + case INDEX_op_mov_i64: ext = 1;
>> + case INDEX_op_mov_i32:
>> + tcg_out_movr(s, ext, args[0], args[1]);
>> + break;
>
> See how the i386 backend uses macros to reduce the typing with these sorts of
> paired opcodes.
I saw the glue thing, and I try to stay away from that kind of preprocessor use,
as it makes it more difficult for newcomers to dig in, since it breaks gid /
grepping
for symbols among other things.
I used instead the editor and regexps to generate the list.
Maybe this could be patched up later if that seems to be the consensus?
>
>> + case INDEX_op_rotl_i64: ext = 1;
>> + case INDEX_op_rotl_i32: /* same as rotate right by (32 - m) */
>> + if (const_args[2]) /* ROR / EXTR Wd, Wm, Wm, 32 - m */
>> + tcg_out_rotl(s, ext, args[0], args[1], args[2]);
>> + else { /* no RSB in aarch64 unfortunately. */
>> + /* XXX UNTESTED */
>> + tcg_out_movi32(s, ext, TCG_REG_X8, ext ? 64 : 32);
>
> But A64 does have shift counts that truncate to the width of the operation.
> Which means that the high bits may contain garbage, which means that you can
> compute this merely as ROR = -ROL, ignoring the 32/64.
I see.
>
>> + case INDEX_op_setcond_i64: ext = 1;
>> + case INDEX_op_setcond_i32:
>> + tcg_out_movi32(s, ext, TCG_REG_X8, 0x01);
>> + tcg_out_cmp(s, ext, args[1], args[2]);
>> + tcg_out_csel(s, ext, args[0], TCG_REG_X8, TCG_REG_XZR,
>> + tcg_cond_to_aarch64_cond[args[3]]);
>
> See CSINC Wd,Wzr,Wzr,cond. No need for the initial movi.
Yes, that was mentioned also by Peter, will change.
>
>> + tcg_regset_set32(tcg_target_available_regs[TCG_TYPE_I32], 0, 0xffff);
>> + tcg_regset_set32(tcg_target_available_regs[TCG_TYPE_I64], 0, 0xffff);
>
> Only half of your registers are marked available.
Oops, will fix. Thanks,
--
Claudio Fontana
- [Qemu-devel] [PATCH 2/3] include/elf.h: add aarch64 ELF machine and relocs, (continued)
- [Qemu-devel] [PATCH 3/3] tcg/aarch64: implement new TCG target for aarch64, Claudio Fontana, 2013/05/13
- Re: [Qemu-devel] [PATCH 3/3] tcg/aarch64: implement new TCG target for aarch64, Peter Maydell, 2013/05/13
- Re: [Qemu-devel] [PATCH 3/3] tcg/aarch64: implement new TCG target for aarch64, Claudio Fontana, 2013/05/14
- Re: [Qemu-devel] [PATCH 3/3] tcg/aarch64: implement new TCG target for aarch64, Peter Maydell, 2013/05/14
- Re: [Qemu-devel] [PATCH 3/3] tcg/aarch64: implement new TCG target for aarch64, Richard Henderson, 2013/05/14
- Re: [Qemu-devel] [PATCH 3/3] tcg/aarch64: implement new TCG target for aarch64, Laurent Desnogues, 2013/05/14
- Re: [Qemu-devel] [PATCH 3/3] tcg/aarch64: implement new TCG target for aarch64, Richard Henderson, 2013/05/13
- Re: [Qemu-devel] [PATCH 3/3] tcg/aarch64: implement new TCG target for aarch64,
Claudio Fontana <=
- Re: [Qemu-devel] [PATCH 3/3] tcg/aarch64: implement new TCG target for aarch64, Richard Henderson, 2013/05/14
- Re: [Qemu-devel] [PATCH 3/3] tcg/aarch64: implement new TCG target for aarch64, Richard Henderson, 2013/05/14
- [Qemu-devel] [PATCH 1/3] configure: permit compilation on arm aarch64, Claudio Fontana, 2013/05/13
- Re: [Qemu-devel] [PATCH 1/3] configure: permit compilation on arm aarch64, Peter Maydell, 2013/05/13
- Re: [Qemu-devel] [PATCH 1/3] configure: permit compilation on arm aarch64, Claudio Fontana, 2013/05/14
Re: [Qemu-devel] QEMU aarch64 TCG target - testing question about x86-64, Peter Maydell, 2013/05/06