[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [RFC] Reducing NEED_CPU_H usage
From: |
Richard Henderson |
Subject: |
Re: [RFC] Reducing NEED_CPU_H usage |
Date: |
Thu, 12 Jan 2023 12:40:40 -1000 |
User-agent: |
Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.4.2 |
On 1/12/23 07:28, Alessandro Di Federico wrote:
On Tue, 10 Jan 2023 11:56:50 -0800
Richard Henderson <richard.henderson@linaro.org> wrote:
However, at some point we do want to keep some target addresses in
the proper size. For instance within the softmmu tlb, where
CPUTLBEntry is either 16 or 32 bytes, depending.
So that would be an optimization if `HOST_LONG_BITS == 32 &&
TARGET_LONG_BITS == 32`, correct?
At present, not an 'optimization' but 'natural fallout of type sizes'.
But, yeah.
## `abi_ulong`
Similar to `target_ulong`, but with alignment info.
Pardon? There's no alignment info in abi_ulong.
From include/exec/user/abitypes.h:
typedef uint32_t abi_ulong __attribute__((aligned(ABI_LONG_ALIGNMENT)));
typedef target_ulong abi_ulong
__attribute__((aligned(ABI_LONG_ALIGNMENT)));
I thought that was the difference. Thanks for the clarification.
Ah, I see what you mean. I *believe* that use of target_ulong could actually be uint64_t,
since all TARGET_LONG_BITS == 32 ought to have TARGET_ABI32 set too (which brings us to
that first definition with uint32_t.)
There is a target alignment difference, for the benefit of e.g. m68k, which has
sizeof(long) == 4 and __alignof(long) == 2, which may differ by the host.
In any case, all of "exec/abitypes.h" is (or should be) user-only related, so out of scope
for this project.
This one requires some work within tcg/ to handle two target address
sizes simultaneously. It should not be technically difficult to
solve, but it does involve adding a few TCG opcodes and adjusting all
tcg backends.
I'm a bit confused by this, do backends for some reason have
expectations about the address size?
Wouldn't this be enough?
Yes, this affects the tcg backend as well:
static void tcg_out_qemu_ld(TCGContext *s, const TCGArg *args, bool is64)
{
...
datalo = *args++;
datahi = (TCG_TARGET_REG_BITS == 32 && is64 ? *args++ : 0);
addrlo = *args++;
addrhi = (TARGET_LONG_BITS > TCG_TARGET_REG_BITS ? *args++ : 0);
oi = *args++;
and further code generation changes especially for 64-bit guest pointer comparisons on
32-bit hosts. (There's that nasty combination again.)
There's also code generation changes for 32-bit guest pointer comparisons on 64-bit hosts,
e.g.
static inline void tcg_out_tlb_load(TCGContext *s, TCGReg addrlo, TCGReg addrhi,
int mem_index, MemOp opc,
tcg_insn_unit **label_ptr, int which)
{
...
TCGType tlbtype = TCG_TYPE_I32;
int trexw = 0, hrexw = 0, tlbrexw = 0;
...
if (TCG_TARGET_REG_BITS == 64) {
if (TARGET_LONG_BITS == 64) {
ttype = TCG_TYPE_I64;
trexw = P_REXW;
}
if (TCG_TYPE_PTR == TCG_TYPE_I64) {
hrexw = P_REXW;
if (TARGET_PAGE_BITS + CPU_TLB_DYN_MAX_BITS > 32) {
tlbtype = TCG_TYPE_I64;
tlbrexw = P_REXW;
}
}
}
...
/* cmp 0(r0), r1 */
tcg_out_modrm_offset(s, OPC_CMP_GvEv + trexw, r1, r0, which);
which generates either 'cmpl' or 'cmpq' depending on the guest address size.
Anyway, there's no need for this.
So, if I got it right, we could just make TCGv become a new opaque
type, propagate it down until the spot where we actually need to know
its size and then just have some `TCGTemp *tcgv_temp(TCGv v)` function
to inspect the actual size?
No, leave TCGv as a macro, but we need changes to "tcg/tcg-op*.h", so that all of the
above tcg backend stuff *also* gets disconnected from TARGET_LONG_BITS.
Makes sense!
Before CPUNegativeOffsetState, we had all of those variables within
CPUState, and even comments that they should remain at the end of the
struct. But those comments were ignored and one day icount_decr was
too far away from env for easy addressing on arm host. Luckily there
was an assert that fired, but it didn't get caught right away.
How comes it wasn't caught immediately?
We could have something like:
QEMU_BUILD_BUG_MSG(&ArchCPU.env - &ArchCPU.neg.tlb
< DESIRED_THRESHOLD)
We probably should.
But it didn't affect x86, and cross-build testing wasn't sufficient, so we didn't find out
later when someone did runtime testing on the affected hosts.
Our current goal is to get the following compilation unit to build
without NEED_CPU_H:
trace/control-target.c
gdbstub/gdbstub.c
cpu.c
disas.c
page-vary.c
tcg/optimize.c
tcg/region.c
tcg/tcg.c
tcg/tcg-common.c
tcg/tcg-op.c
tcg/tcg-op-gvec.c
tcg/tcg-op-vec.c
fpu/softfloat.c
accel/accel-common.c
accel/tcg/tcg-all.c
accel/tcg/cpu-exec-common.c
accel/tcg/cpu-exec.c
accel/tcg/tb-maint.c
accel/tcg/tcg-runtime-gvec.c
accel/tcg/tcg-runtime.c
accel/tcg/translate-all.c
accel/tcg/translator.c
accel/tcg/user-exec.c
accel/tcg/user-exec-stub.c
accel/tcg/plugin-gen.c
plugins/loader.c
plugins/core.c
plugins/api.c
They are subset of `arch_srcs` from `meson.build`.
Reasonable. I think accel/tcg/user-exec.c is the only one that isn't also required for
system mode, and completing accel/tcg/ would be less confusing than not.
r~