qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH] Porting TCG to alpha platform


From: Richard Henderson
Subject: Re: [Qemu-devel] [PATCH] Porting TCG to alpha platform
Date: Wed, 20 Jan 2010 13:26:45 -0800
User-agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.1.5) Gecko/20091209 Fedora/3.0-4.fc12 Thunderbird/3.0

On 01/20/2010 09:19 AM, identifier scorpio wrote:
Below i'll append my newly generated patch against stable-0.10,
in case it is mangled, i also put it in the attachment.

For the record, the inline portion was again mangled; the attachment is fine.

in qemu_ld/st, we must use $16,$17,$18 as temporaries,
because pass them as argument to helper functions such as qemu_ld/st_helpers[].

Ah, yes, I forgot about that.

With I/J constraints you don't need this special casing.

I'm not very familiar with I/J constraints and i'll study them later.

I = uint8_t, to be used with the second arithmetic input.
J = 0, to be used with the first arithmetic input (i.e. $31).

+        tcg_out_inst2(s, opc^4,
TMP_REG1, 1);
+    /* record relocation infor */
+        tcg_out_reloc(s,
s->code_ptr, R_ALPHA_REFQUAD, label_index, 0);
+        s->code_ptr += 4;

Bug: You've applied the relocation to the wrong
instruction.
Bug: What's with the "opc^4"?


what did you mean that i "applied the relocation to the wrong
instruction", couldn't i apply relocation to INDEX_op_brcond_i32 operation?

With tcg_out_inst2 you emit the branch instruction, which calls tcg_out32, which increments s->code_ptr. Next you call tcg_out_reloc with the updated s->code_ptr, which means that the relocation gets applied to the instruction *after* the branch. Finally, you increment s->code_ptr *again*, which means that the instruction after the branch is in fact completely uninitialized.

and opc^4 here is used to toggle between OP_BLBC(opcode 0x38) and 
OP_BLBS(opcode 0x3c), ugly code :)

It does beg the question of why you're reversing the sense of the jump at all. Just because the branch is forward doesn't mean its condition should be changed. I think that's definitely a bug. The sense of the jump should be exactly the same, never mind the direction of the jump.

Oh... I see what you're doing here -- you're generating the entire branch instruction in patch_reloc, and you're generating branch over branch here. That's both confusing and unnecessary.

We should do

static void patch_reloc(uint8_t *x_ptr, int type,
                        tcg_target_long value,
                        tcg_target_long addend)
{
    uint32_t *code_ptr = (uint32_t *)x;
    uint32_t insn = *code_ptr;

    value += addend;
    switch (type) {
    case R_ALPHA_BRADDR:
        value -= (long)x_ptr + 4;
        if ((value & 3) || value < -0x400000 || value >= 0x400000) {
            tcg_abort();
        }
        *code_ptr = insn | INSN_DISP21(val >> 2);
        break;

    default:
        tcg_abort();
    }
}

so as to apply the branch address relocation to the existing insn.
Which lets you write

static void tcg_out_br(TCGContext *s, int opc, int ra, int label_index)
{
    TCGLabel *l = &s->labels[label_index];
    tcg_target_long value;

    if (l->has_value) {
        value = l->u.value;
        value -= (long)s->code_ptr + 4;
        if ((value & 3) || value < -0x400000 || value >= 0x400000) {
            tcg_abort();
        }
        value >>= 2;
    } else {
        tcg_out_reloc(s, s->code_ptr, R_ALPHA_BRADDR, label_index, 0);
        value = 0;
    }
    tcg_out_fmt_br(s, opc, ra, value);
}

static void tcg_out_brcond(TCGContext *s, ...)
{
    // Emit comparison into TMP_REG1.

    opc = (cond & 1) ? INSN_BLBC : INSN_BLBS;
    tcg_out_br(s, opc, TMP_REG1, label_index);
}

Isn't that much clearer?

+    tcg_out_mov(s, r1, addr_reg);
+    tcg_out_mov(s, r0, addr_reg);
+
+#if TARGET_LONG_BITS == 32
+    /* if VM is of 32-bit arch, clear higher 32-bit of addr */
+    tcg_out_fmt_opi(s, INSN_ZAPNOT, r0, 0x0f, r0);
+    tcg_out_fmt_opi(s, INSN_ZAPNOT, r1, 0x0f, r1);
+#endif

I suggest creating a

static inline tcg_out_mov_addr(TCGContext *s, int rd, int rs)
{
    if (TARGET_LONG_BITS == 32) {
        tcg_out_fmt_opi(s, INSN_ZAPNOT, rs, 0x0f, rd);
    } else {
        tcg_out_mov(s, rd, rs);
    }
}

and using that throughout qemu_ld/st. That will save some redundant moves you are creating there, as well as removing some conditional compilation. Indeed, I would suggest replacing all of the conditional compilation vs TARGET_LONG_BITS with normal if's.

... Of course, in this particular case, that zapnot is redundant with the INSN_AND that follows, for both R0 and R1.

+    tcg_out_movi(s, TCG_TYPE_I64, TMP_REG1, 
(tcg_target_long)qemu_ld_helpers[s_bits]);
+    tcg_out_push(s, addr_reg);
+    //tcg_out_push(s, TCG_REG_26);
+    //tcg_out_push(s, TCG_REG_15);
+    tcg_out_mov(s, TCG_REG_27, TMP_REG1);
+    tcg_out_fmt_jmp(s, INSN_CALL, TCG_REG_26, TMP_REG1, 0);
+    //tcg_out_pop(s, TCG_REG_15);
+    //tcg_out_pop(s, TCG_REG_26);
+    tcg_out_pop(s, addr_reg);

You need not save and restore ADDR_REG; it is not used after the call.
Also, you can load the address into $27 directly and not use the temp.

+    *(uint32_t *)label1_ptr = (uint32_t) \
+        ( INSN_BNE | ( TMP_REG1 << 21 ) | ( val & 0x1fffff));

Frankly, I don't really think it's worth being so convoluted with the branches in here. I know that's the way that the i386 port does it in qemu_ld/st, but I think we should rather pattern it after the i386 brcond2.

I.e. use gen_new_label to create a new label for use within the routine, use a normal call into tcg_out_br to generate the branch, and use tcg_out_label to place the label at the proper place and resolve the forward branch.

It may be be a teeny bit less efficient, but it's far clearer.

+    tcg_out_movi(s, TCG_TYPE_I64, TMP_REG1, \
+       offsetof(CPUTLBEntry, addend) - offsetof(CPUTLBEntry, addr_read));
+    tcg_out_fmt_opr(s, INSN_ADDQ, r1, TMP_REG1, r1);
+    tcg_out_fmt_mem(s, INSN_LDQ, TMP_REG1, r1, 0);

You may want to use

  tcg_out_ld(s, TCG_TYPE_I64, TMP_REG1, r1, offsetof ...);

for clarity, and to reuse the tcg_out_op_long improvements.

+#else
+    r0 = addr_reg;
+#endif // endif defined(CONFIG_SOFTMMU)

Missing GUEST_BASE handling, though that won't help your winnt...

+    case INDEX_op_sar_i32:
+        tcg_out_inst4i(s,
OP_SHIFT, args[1], 32, FUNC_SLL, args[1]);
+        tcg_out_inst4i(s,
OP_SHIFT, args[1], 32, FUNC_SRA, args[1]);

That last shift can be combined with the requested shift
via addition. For constant input, this saves an insn; for
register input, the addition can be done in parallel with
the first shift.

i changed to use "addl r, 0, r" here.

Even better.

I think when qemu met x86 divide instructions, it will call helper
functions to simulate them, must i define div_i32/divu_i32/...?

If you want to emulate anything other than x86, yes.


r~




reply via email to

[Prev in Thread] Current Thread [Next in Thread]