From: Richard Henderson
Subject: Re: [Qemu-devel] [RFC 12/28] target-xtensa: implement shifts (ST1 and RST1 groups)
Date: Wed, 04 May 2011 12:07:37 -0700
User-agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2.17) Gecko/20110428 Fedora/3.1.10-1.fc14 Thunderbird/3.1.10

On 05/04/2011 09:39 AM, Max Filippov wrote:
> To track immediate values written to SAR? You mean that there may be
> some performance difference of fixed size shift vs indirect shift and
> TCG is able to tell them apart?

Well, not really fixed vs indirect, but if you know that the value
in the SAR register is in the right range, you can avoid using a
64-bit shift.

For instance,

        SSL     ar2
        SLL     ar0, ar1

could be implemented with

        tcg_gen_shl_i32(ar0, ar1, ar2);

assuming we have enough context.
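
To make the range argument concrete, here is a small plain C model (not TCG code) of the two shifts. It assumes the usual reading of the ISA: SLL extracts 32 bits from the 64-bit value AS:0 starting at bit SAR, and SRL shifts 0:AT right by SAR. All function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Reference SLL: view AS:0 as a 64-bit value and take 32 bits starting
   at bit SAR (SAR may be anything in 0..63).  */
static uint32_t sll_ref(uint32_t as, uint32_t sar)
{
    return (uint32_t)(((uint64_t)as << 32) >> (sar & 63));
}

/* Reference SRL: view 0:AT as a 64-bit value and shift right by SAR.  */
static uint32_t srl_ref(uint32_t at, uint32_t sar)
{
    return (uint32_t)((uint64_t)at >> (sar & 63));
}

/* Fast SLL: usable when SAR was set by SSL, so n = 32 - SAR is in 0..31.  */
static uint32_t sll_fast(uint32_t as, uint32_t n)
{
    assert(n < 32);   /* a 32-bit shift by 32 or more is undefined in C */
    return as << n;
}

/* Fast SRL: usable when SAR is known to fit in 5 bits.  */
static uint32_t srl_fast(uint32_t at, uint32_t sar)
{
    assert(sar < 32);
    return at >> sar;
}

/* Count disagreements between the reference and fast paths for one input.  */
static int count_mismatches(uint32_t x)
{
    int bad = 0;
    for (uint32_t n = 0; n < 32; n++) {
        bad += sll_ref(x, 32 - n) != sll_fast(x, n);
        bad += srl_ref(x, n) != srl_fast(x, n);
    }
    return bad;
}
```

The 64-bit form is only needed for the values the fast paths exclude: SAR in 32..63 for SRL, and SAR outside 1..32 for SLL.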

Let us decompose the SAR register into two parts, storing both the
true value and 32 minus that value.

    struct DisasContext {
        // Current Stuff
        // ...

        // When valid, holds 32-SAR.
        TCGv sar_m32;
        bool sar_m32_alloc;
        bool sar_m32_valid;
        bool sar_5bit;
    };

At the beginning of the TB:

        TCGV_UNUSED_I32(dc->sar_m32);
        dc->sar_m32_alloc = false;
        dc->sar_m32_valid = false;
        dc->sar_5bit = false;



static void gen_set_sra_m32(DisasContext *dc, TCGv val)
{
    if (!dc->sar_m32_alloc) {
        dc->sar_m32_alloc = true;
        dc->sar_m32 = tcg_temp_local_new_i32();
    }
    dc->sar_m32_valid = true;

    /* Clear 5 bit because the SAR value could be 32.  */
    dc->sar_5bit = false;

    tcg_gen_movi_i32(cpu_SR[SAR], 32);
    tcg_gen_sub_i32(cpu_SR[SAR], cpu_SR[SAR], val);
    tcg_gen_mov_i32(dc->sar_m32, val);
}

static void gen_set_sra(DisasContext *dc, TCGv val, bool is_5bit)
{
    if (dc->sar_m32_alloc && dc->sar_m32_valid) {
        tcg_gen_discard_i32(dc->sar_m32);
    }
    dc->sar_m32_valid = false;
    dc->sar_5bit = is_5bit;

    tcg_gen_mov_i32(cpu_SR[SAR], val);
}

        /* SSL */
        tcg_gen_andi_i32(tmp, cpu_R[AS], 31);
        gen_set_sra_m32(dc, tmp);
        break;

        /* SSR */
        tcg_gen_andi_i32(tmp, cpu_R[AS], 31);
        gen_set_sra(dc, tmp, true);
        break;

        /* WSR.SAR */
        tcg_gen_andi_i32(tmp, cpu_R[AS], 63);
        gen_set_sra(dc, tmp, false);
        break;

        /* SSAI */
        tcg_gen_movi_i32(tmp, constant);
        gen_set_sra(dc, tmp, true);
        break;

        /* SLL */
        if (dc->sar_m32_valid) {
            tcg_gen_shl_i32(cpu_R[AR], cpu_R[AS], dc->sar_m32);
        } else {
            /* your existing 64-bit shift emulation.  */
        }
        break;

        /* SRL */
        if (dc->sar_5bit) {
            tcg_gen_shr_i32(cpu_R[AR], cpu_R[AS], cpu_SR[SAR]);
        } else {
            /* your existing 64-bit shift emulation.  */
        }
        break;
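
The bookkeeping above boils down to two translate-time flags. As a rough sketch in plain C (no TCG involved; `SarState` and the function names are hypothetical stand-ins for the DisasContext fields), the transitions look like this:

```c
#include <stdbool.h>

/* Hypothetical stand-in for the SAR-related DisasContext fields.  */
typedef struct {
    bool sar_m32_valid;   /* a copy of 32 - SAR is available */
    bool sar_5bit;        /* SAR is known to be in 0..31 */
} SarState;

/* Nothing is known about SAR at the start of a TB.  */
static void sar_tb_start(SarState *s)
{
    s->sar_m32_valid = false;
    s->sar_5bit = false;
}

/* SSL: SAR = 32 - (AS & 31), which may be 32, so it is not 5-bit.  */
static void sar_after_ssl(SarState *s)
{
    s->sar_m32_valid = true;
    s->sar_5bit = false;
}

/* SSR, SSAI (is_5bit = true) or WSR.SAR (is_5bit = false).  */
static void sar_after_set(SarState *s, bool is_5bit)
{
    s->sar_m32_valid = false;
    s->sar_5bit = is_5bit;
}

/* SLL can use a 32-bit shift by 32 - SAR only after SSL.  */
static bool sll_can_use_32bit(const SarState *s)
{
    return s->sar_m32_valid;
}

/* SRL can use a 32-bit shift by SAR only when SAR fits in 5 bits.  */
static bool srl_can_use_32bit(const SarState *s)
{
    return s->sar_5bit;
}
```

Anything not covered by one of the two flags falls back to the 64-bit emulation.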


A couple of points: The use of the local temp avoids problems with
intervening insns that might generate branch opcodes.  For the
simplest cases, as with the case at the start of the message, we
ought to be able to propagate the values into the TCG shift insn
directly.

Does that make sense?


r~


