qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[Qemu-devel] [PATCH 3/4] accel/tcg: Add cluster number to TCG TB hash


From: Aleksandar Markovic
Subject: [Qemu-devel] [PATCH 3/4] accel/tcg: Add cluster number to TCG TB hash
Date: Fri, 11 Jan 2019 13:49:15 +0100

Hello, Peter.

First of all, I want to tell you that I support this series and I salute
efforts in this and related areas. It is known that there have been strong
trends towards asymmetric multi-core systems now for some longish time -
and that QEMU support in that area will greately enhance QEMU itself.

However, being unfamilliar with relevant parts of QEMU internals, allow me
to ask a couple of basic questions:

1. What would be, in more detail, if possible in layman terms, the "bad
case" that this series fixes?

2. Let's suppose, hypothetically, and based on your example from one of
commit messages from this series, that we want to support two multicore
systems:
    A. Cluster 1: 1 core with FPU; cluster 2: 3 cores without FPU
    B. Cluster 1: 2 cores with FPU; cluster 2: 1 core without FPU
Is there an apparatus that would allow the end user specify these and
similar cpnfigurations through command line or acsimilar mean (so, without
QEMU explicitely supporting such core organization, but supporting the
single core in question, of course)? If not, would that be a desirable QEMU
feature?

3. Is there a possibility to have two layer clustering sheme, instead of
one layer? Cluster/subcluster/core instead of cluster/core? For MIPS, there
is a need for such organization. It looks to me 8 bits for cluster id, and
3 bits for subcluster id would be sufficient. In such scheme, the only
change for TCG would be to check both cluster and subcluster id rather than
only cluster id.

Regards,
Aleksandar


On Thursday, January 10, 2019, Peter Maydell <address@hidden>
wrote:

> On Tue, 8 Jan 2019 at 16:30, Peter Maydell <address@hidden>
> wrote:
> >
> > Include the cluster number in the hash we use to look
> > up TBs. This is important because a TB that is valid
> > for one cluster at a given physical address and set
> > of CPU flags is not necessarily valid for another:
> > the two clusters may have different views of physical
> > memory, or may have different CPU features (eg FPU
> > present or absent).
> >
> > We put the cluster number in the high 8 bits of the
> > TB cflags. This gives us up to 256 clusters, which should
> > be enough for anybody. If we ever need more, or need
> > more bits in cflags for other purposes, we could make
> > tb_hash_func() take more data (and expand qemu_xxhash7()
> > to qemu_xxhash8()).
> >
> > Signed-off-by: Peter Maydell <address@hidden>
> > ---
> >  include/exec/exec-all.h   | 4 +++-
> >  accel/tcg/cpu-exec.c      | 4 ++++
> >  accel/tcg/translate-all.c | 3 +++
> >  3 files changed, 10 insertions(+), 1 deletion(-)
> >
> > diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
> > index 815e5b1e838..aa7b81aaf01 100644
> > --- a/include/exec/exec-all.h
> > +++ b/include/exec/exec-all.h
> > @@ -351,9 +351,11 @@ struct TranslationBlock {
> >  #define CF_USE_ICOUNT  0x00020000
> >  #define CF_INVALID     0x00040000 /* TB is stale. Set with @jmp_lock
> held */
> >  #define CF_PARALLEL    0x00080000 /* Generate code for a parallel
> context */
> > +#define CF_CLUSTER_MASK 0xff000000 /* Top 8 bits are cluster ID */
> > +#define CF_CLUSTER_SHIFT 24
> >  /* cflags' mask for hashing/comparison */
> >  #define CF_HASH_MASK   \
> > -    (CF_COUNT_MASK | CF_LAST_IO | CF_USE_ICOUNT | CF_PARALLEL)
> > +    (CF_COUNT_MASK | CF_LAST_IO | CF_USE_ICOUNT | CF_PARALLEL |
> CF_CLUSTER_MASK)
> >
> >      /* Per-vCPU dynamic tracing state used to generate this TB */
> >      uint32_t trace_vcpu_dstate;
> > diff --git a/accel/tcg/cpu-exec.c b/accel/tcg/cpu-exec.c
> > index 870027d4359..e578a1a3aee 100644
> > --- a/accel/tcg/cpu-exec.c
> > +++ b/accel/tcg/cpu-exec.c
> > @@ -336,6 +336,10 @@ TranslationBlock *tb_htable_lookup(CPUState *cpu,
> target_ulong pc,
> >          return NULL;
> >      }
> >      desc.phys_page1 = phys_pc & TARGET_PAGE_MASK;
> > +
> > +    cf_mask &= ~CF_CLUSTER_MASK;
> > +    cf_mask |= cpu->cluster_index << CF_CLUSTER_SHIFT;
> > +
>
> This hunk turns out not to be quite right -- it needs to move
> to the top of the function, before the assignment
> "desc.flags = flags;". Otherwise tb_lookup_cmp() will
> spuriously fail, and execution becomes somewhat slower because
> we have to keep retranslating TBs rather than reusing them.
> (Surprisingly this is only noticeable in an ARM TFM image
> I happen to have, not in Linux kernel boot...)
>
> thanks
> -- PMM
>
>


reply via email to

[Prev in Thread] Current Thread [Next in Thread]