[Top][All Lists]
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Qemu-devel] [PATCH 2/4] tcg-hppa: Finish the port.
From: |
Richard Henderson |
Subject: |
Re: [Qemu-devel] [PATCH 2/4] tcg-hppa: Finish the port. |
Date: |
Thu, 08 Apr 2010 09:32:41 -0700 |
User-agent: |
Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.1.9) Gecko/20100330 Fedora/3.0.4-1.fc12 Thunderbird/3.0.4 |
On 04/08/2010 02:56 AM, Aurelien Jarno wrote:
> I have applied the patch. I have some comments though, it would be nice
> if you can address them with additional patches.
Sure.
>> +static void tcg_out_ori(TCGContext *s, int ret, int arg, tcg_target_ulong m)
>> +{
>> + if (m == 0) {
>> + tcg_out_mov(s, ret, arg);
>> + } else if (m == -1) {
>> + tcg_out_movi(s, TCG_TYPE_I32, ret, -1);
>
> Those cases are already eliminated in tcg/tcg-op.h. This code looks
> redundant.
The cases eliminated in tcg-op.h are with immediate constants.
There is no generic code in tcg.c to eliminate these cases
after constant propagation. However, I can remove them with...
>> + } else {
>> + tcg_out_movi(s, TCG_TYPE_I32, TCG_REG_R1, m);
>> + tcg_out_arith(s, ret, arg, TCG_REG_R1, INSN_OR);
>
> Do we really want a movi here? It would be better to leave the tcg code
> load the constant itself, so that if the same constant is used twice, it
> is only loaded once.
I've never caught TCG properly re-using constants, but I take your
point -- there's no reason why tcg.c can't be improved, and this
port would miss out on that improvement. I'll invent a constraint
that matches or_mask_p.
>> +static void tcg_out_andi(TCGContext *s, int ret, int arg, tcg_target_ulong
>> m)
...
>> + tcg_out_movi(s, TCG_TYPE_I32, TCG_REG_R1, m);
>> + tcg_out_arith(s, ret, arg, TCG_REG_R1, INSN_AND);
>
> Same.
ANDI slightly different case. This function is used by tcg_out_tlb_read
with constants that may or may not satisfy and_mask_p. I think it's cleaner
to handle the arbitrary case here, rather than open code the same test in
the tlb read function.
I will of course add a constraint to match and_mask_p, for ANDs that
originate within the opcode stream.
>> + tcg_out_reloc(s, s->code_ptr, R_PARISC_PCREL17F, label_index, 0);
>> + tcg_out32(s, op);
>
> This breaks partial retranslation. The bits corresponding to the offset
> should be preserved.
I don't recall ever hearing about re-translation. Can you point me
at the bits that do it, so I can figure out what's going on? This
sounds like something that ought to be documented properly...
I rather assumed that the "addend" parameter to patch_reloc would
hold whatever is really needed to be preserved. What else is that
field for, anyway?
>> - if (opc == 3)
>> - data_reg2 = *args++;
>> - else
>> - data_reg2 = 0; /* suppress warning */
>> + data_reg2 = (opc == 3 ? *args++ : TCG_REG_R0);
>
> I am not sure TCG_REG_R0 is really correct here, and I find it confusing.
> While it's value is zero, the assignment there is just to make GCC
> happy, it won't be used after
Correct. I don't see what else I can really do though. I think it's
worse to mix types: integer-as-register-number (i.e. *args) and
integer-as-filler (i.e. 0). Better to at least have them be the same
type as it clarifies that *args must be a register number.
Perhaps just a comment here?
>> #if defined(CONFIG_SOFTMMU)
>> - tcg_out_mov(s, r1, addr_reg);
>> + lab1 = gen_new_label();
>> + lab2 = gen_new_label();
>
> Do you really want to use label here? load/store are the most common
> instructions, I am not really sure of the resulting performance.
I think the code is *so* much more readable re-using the usual branch
and relocate code. I'd almost rather spend the time speeding up the
use of temporary labels than uglifying the code here.
>> + /* These three correspond exactly to the fallback implementation.
>> + But by including them we reduce the number of TCG ops that
>> + need to be generated, and these opcodes are fairly common. */
>
> Are you sure it really makes a difference?
Not quantifiably, but the reasoning is sound. I can remove them if you insist.
>> + { INDEX_op_add_i32, { "r", "rZ", "ri" } },
>> + { INDEX_op_sub_i32, { "r", "rI", "ri" } },
>> + { INDEX_op_and_i32, { "r", "rZ", "ri" } },
>> + { INDEX_op_or_i32, { "r", "rZ", "ri" } },
>
> Already commented for "and" and "or", but the same apply for add and
> sub. Do we really need a "i" contraints here if the constant is going
> to be loaded with a movi.
ADD and SUB are not going to use movi. They will use one or both of
ADDIL (21-bit constant << 11) and LDO (14-bit constant). As a pair
these insns can perform a full 32-bit constant addition.
I suppose technically there's a subset of 32-bit constants that could
benefit from generic code loading constants into registers. The only
valid output register for ADDIL is R1. So at the moment for
R3 = R4 + 0x10000;
we generate
addil 0x10000, r4, r1
copy r1, r3
where we could equivalently generate
ldil 0x10000, r5
add r4, r5, r3
However I don't think this is worth worrying about in the short term.
>> + if (GUEST_BASE != 0) {
>> + tcg_out_movi(s, TCG_TYPE_PTR, TCG_GUEST_BASE_REG, GUEST_BASE);
>> + }
>
> The final GUEST_BASE value is computed after the prologue has been
> generated. The value is modified in two cases:
> - The user specify a non-aligned base address.
> - /proc/sys/vm/mmap_min_addr is different than 0, which is now the
> in default configuration for more than one year.
>
> When it happens, the guest crashes almost immediately.
To be fair, mmap_min_addr only affects GUEST_BASE if the executable
image we've loaded overlaps. Which is uncommon, but certainly possible.
Hmm. I wonder which is better: one extra instruction needed per qemu_ld
vs having one more call-saved register available. At the moment we don't
even come close to using all of the call-saved registers, and it would be
easy enough to have the prologue read the actual guest_base variable rather
than embed the constant.
r~
- [Qemu-devel] [PATCH 1/4] tcg-hppa: Compute is_write in cpu_signal_handler., (continued)
- [Qemu-devel] [PATCH 1/4] tcg-hppa: Compute is_write in cpu_signal_handler., Richard Henderson, 2010/04/07
- [Qemu-devel] [PATCH 4/4] tcg-hppa: Don't try to calls to non-constant addresses., Richard Henderson, 2010/04/07
- [Qemu-devel] [PATCH 0/4] tcg-hppa finish, v4, Richard Henderson, 2010/04/07
- [Qemu-devel] [PATCH 3/4] tcg-hppa: Fix in/out register overlap in add2/sub2., Richard Henderson, 2010/04/07
- [Qemu-devel] [PATCH 2/4] tcg-hppa: Finish the port., Richard Henderson, 2010/04/07