qemu-devel

Re: [Qemu-devel] RFC: Code fetch optimisation


From: J. Mayer
Subject: Re: [Qemu-devel] RFC: Code fetch optimisation
Date: Mon, 15 Oct 2007 23:30:33 +0200

On Mon, 2007-10-15 at 17:01 +0100, Paul Brook wrote:
> > > > +    unsigned long phys_pc;
> > > > +    unsigned long phys_pc_start;
> > >
> > > These are ram offsets, not physical addresses. I recommend naming them as
> > > such to avoid confusion.
> >
> > Well, those are host addresses. Fabrice even suggested that I replace
> > them with void * to prevent confusion, but I kept using unsigned long
> > because the _p function API does not use pointers. As those values are
> > defined as phys_ram_base + offset, they should be host addresses, not
> > RAM offsets, and they are used directly to dereference host pointers in
> > the ldxxx_p functions. Did I miss something?
> 
> You are correct, they are host addresses. I still think calling them phys_pc 
> is confusing. It took me a while to convince myself that "unsigned long" was 
> an appropriate type (ignoring 64-bit windows hosts for now).
> 
> How about host_pc?

It's OK for me.
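A minimal sketch of the distinction discussed above, assuming a toy setup in which guest RAM is a plain host buffer and a stand-in phys_ram_base points at its first byte: host_pc is phys_ram_base plus a RAM offset, i.e. a host address that can be dereferenced directly. The helper names host_pc_from_ram_offset and load_le32_host are hypothetical, and the little-endian load only imitates the role of the ldxxx_p helpers; it is not their actual implementation. uintptr_t is used instead of unsigned long because unsigned long is only 32 bits wide on 64-bit Windows (LLP64) hosts.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-ins for this sketch only: guest RAM is a plain host buffer and
 * phys_ram_base points at its first byte. */
static uint8_t guest_ram[4096];
static uint8_t *phys_ram_base = guest_ram;

/* host_pc = phys_ram_base + ram_offset: a host address, not a RAM offset.
 * uintptr_t is wide enough for a pointer even on LLP64 hosts, where
 * unsigned long is only 32 bits. */
static uintptr_t host_pc_from_ram_offset(uintptr_t ram_offset)
{
    assert(ram_offset < sizeof(guest_ram));
    return (uintptr_t)(phys_ram_base + ram_offset);
}

/* Hypothetical ldl_p-style load: dereferences the host address directly.
 * Byte order is fixed to little-endian for the sketch. */
static uint32_t load_le32_host(uintptr_t host_addr)
{
    const uint8_t *p = (const uint8_t *)host_addr;
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}
```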

> > > > +    /* Avoid softmmu access on next load */
> > > > +    /* XXX: don't: phys PC is not correct anymore
> > > > +     *      We could call get_phys_addr_code(env, pc); and remove the else
> > > > +     *      condition, here.
> > > > +     */
> > > > +    //*start_pc = phys_pc;
> > >
> > > The commented out code is completely bogus, please remove it. The comment
> > > is also somewhat misleading/incorrect. The else would still be required
> > > for accesses that span a page boundary.
> >
> > I guess trying to optimize this case by retrieving the physical address
> > would not bring any real gain, as in fact only the last translated
> > instruction of a TB (hence only a few code loads) may hit this case.
> 
> VLE targets (x86, m68k) can translate almost a full page of instructions, and 
> a page boundary can be anywhere within that block. Once we've spanned 
> multiple pages there's no point stopping translation immediately. We may as
> well translate as many instructions as we can on the second page.
> 
> I'd guess most TB are much smaller than a page, so on average only a few 
> instructions are going to come after the page boundary.
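The fast path and the "else" branch discussed above can be sketched as follows, assuming 4 KiB pages, a 32-bit guest, and a toy mapping in which virtual pages 0 and 1 live in two unrelated host buffers. While a fetch stays on the page where the TB started, it reads straight through the host pointer; a fetch that starts on, or spills into, another page goes through a per-byte slow path standing in for the softmmu lookup. All names here (fetch_code, slow_ldub_code, SKETCH_PAGE_*) are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_BITS 12                          /* assumed 4 KiB pages */
#define SKETCH_PAGE_SIZE ((uint32_t)1 << SKETCH_PAGE_BITS)
#define SKETCH_PAGE_MASK (~(SKETCH_PAGE_SIZE - 1))

/* Toy mapping: virtual page 0 -> host_page_a, virtual page 1 -> host_page_b. */
static uint8_t host_page_a[SKETCH_PAGE_SIZE];
static uint8_t host_page_b[SKETCH_PAGE_SIZE];

/* Slow path stand-in for a softmmu lookup: translates every byte on its own. */
static uint8_t slow_ldub_code(uint32_t vaddr)
{
    assert(vaddr < 2 * SKETCH_PAGE_SIZE);
    uint8_t *page = (vaddr >> SKETCH_PAGE_BITS) == 0 ? host_page_a : host_page_b;
    return page[vaddr & ~SKETCH_PAGE_MASK];
}

/* Little-endian fetch of len bytes at pc.  start_page_host is the host
 * address of the page holding pc_start. */
static uint32_t fetch_code(uint32_t pc_start, const uint8_t *start_page_host,
                           uint32_t pc, unsigned len)
{
    uint32_t start_page = pc_start & SKETCH_PAGE_MASK;
    uint32_t val = 0;

    assert(len >= 1 && len <= 4);
    if ((pc & SKETCH_PAGE_MASK) == start_page &&
        ((pc + len - 1) & SKETCH_PAGE_MASK) == start_page) {
        /* Fast path: the whole access lies on the TB's first page. */
        const uint8_t *p = start_page_host + (pc & ~SKETCH_PAGE_MASK);
        for (unsigned i = 0; i < len; i++)
            val |= (uint32_t)p[i] << (8 * i);
    } else {
        /* The "else" case: the access is on, or spans into, another page. */
        for (unsigned i = 0; i < len; i++)
            val |= (uint32_t)slow_ldub_code(pc + i) << (8 * i);
    }
    return val;
}
```

A 4-byte fetch starting two bytes before the boundary is the VLE situation Paul describes: it has to be assembled from both host pages, so it cannot use the host pointer of the first page alone.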

This leads me to another thought. For fixed-length encoding targets,
we always stop translation when reaching a page boundary. If we keep
using the current model and optimize the slow case, it would be
possible to stop only when crossing a second page boundary during code
translation, which seems unlikely to happen. If we keep the current
behavior, we could remove the second page_addr element from the tb
structure and maybe optimize parts of the tb management and
invalidation code.
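To make the two stopping policies concrete, here is a hypothetical helper that counts how many fixed-length instructions a translation loop would emit from pc_start before stopping. With max_boundaries = 1 it stops once the next PC lands on a new page (the current behavior for fixed-length targets); with max_boundaries = 2 it keeps going and stops only at the second boundary. It assumes 4 KiB pages and a 32-bit guest, and ignores every other reason a real TB ends (branches, exceptions, buffer limits) apart from a max_insns cap.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_MASK 0xfffff000u /* assumed 4 KiB pages */

/* Hypothetical: number of fixed-size instructions translated from pc_start
 * before the loop stops.  Only page crossings and max_insns end the TB. */
static unsigned insns_before_stop(uint32_t pc_start, unsigned insn_size,
                                  unsigned max_boundaries, unsigned max_insns)
{
    uint32_t pc = pc_start;
    unsigned n = 0, crossed = 0;

    assert(insn_size > 0 && max_boundaries >= 1);
    while (n < max_insns) {
        uint32_t next_pc = pc + insn_size;
        n++;                    /* the instruction at pc is translated */
        if ((next_pc & SKETCH_PAGE_MASK) != (pc & SKETCH_PAGE_MASK) &&
            ++crossed >= max_boundaries)
            break;              /* next_pc is on a page we will not enter */
        pc = next_pc;
    }
    return n;
}
```

For instance, starting 8 bytes before a page boundary with 4-byte instructions, the first policy yields a 2-instruction TB, while the second yields 2 + 1024 = 1026 instructions, covering the whole following page.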

> > I'd like to keep a comment here to show that it may not be a good idea
> > (or may not be as simple as it seems at first sight) to try to do more
> > optimisation here, but you're right this comment is not correct.
> 
> Agreed.
> 
> > > The code itself looks ok, though I'd be surprised if it made a
> > > significant difference. We're always going to hit the fast-path TLB
> > > lookup case anyway.
> >
> > It seems that the code generated for the code fetch is much more
> > efficient than the code generated when using the softmmu
> > routines. But it's true we do not get any significant performance boost.
> > As previously mentioned, the idea of the patch is more 'don't
> > do unneeded things during code translation' than a great performance
> > improvement.
> 
> OTOH it does make the code more complicated. I'm agnostic about whether 
> this patch should be applied.

I agree that this proposal was more a response to a challenging idea
I had received than an answer to a real need.
The worst thing in this patch, imho, is that you need to increment two
values each time you want to change the PC. This is likely to introduce
bugs when someone forgets to increment one of the two. I was thinking of
hiding pc, host_pc and host_pc_start (and maybe also pc_start) in a
structure and adding inline helpers:
* get_pc would return the current virtual PC, as needed by the jump and
PC-relative memory access functions.
* get_tb_len would return the difference between the virtual PC and the
virtual pc_start, as is done at the end of the gen_intermediate_code
functions.
* move_pc would add an offset to both the virtual and the physical PC.
This has to be target dependent, due to the special case of Sparc.
* update_phys_pc would be a no-op for most targets, except for Sparc,
where the phys_pc needs to be adjusted after the translation of each
target instruction.
and maybe more, if needed.
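A minimal sketch of such a structure with the proposed helpers, assuming a 32-bit guest and the generic (non-Sparc) behavior: move_pc advances the virtual and host PC in one call, so the two can no longer drift apart, and update_phys_pc does nothing. The type name pc_state and the helper pc_init are hypothetical; a Sparc port would need its own move_pc/update_phys_pc, which this sketch does not attempt.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container hiding the PC values the translator keeps in sync. */
typedef struct {
    uint32_t pc;             /* current virtual PC */
    uint32_t pc_start;       /* virtual PC at the start of the TB */
    uintptr_t host_pc;       /* host address of the code at pc */
    uintptr_t host_pc_start; /* host address of the code at pc_start */
} pc_state;

static inline void pc_init(pc_state *s, uint32_t pc, uintptr_t host_pc)
{
    s->pc = s->pc_start = pc;
    s->host_pc = s->host_pc_start = host_pc;
}

/* Current virtual PC, as needed by jumps and PC-relative accesses. */
static inline uint32_t get_pc(const pc_state *s)
{
    return s->pc;
}

/* TB length, as computed at the end of gen_intermediate_code. */
static inline uint32_t get_tb_len(const pc_state *s)
{
    assert(s->pc >= s->pc_start);
    return s->pc - s->pc_start;
}

/* Generic move_pc: one call updates both PCs, so neither can be forgotten. */
static inline void move_pc(pc_state *s, uint32_t offset)
{
    s->pc += offset;
    s->host_pc += offset;
}

/* Generic update_phys_pc: nothing to adjust on most targets. */
static inline void update_phys_pc(pc_state *s)
{
    (void)s;
}
```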
This structure could also contain target-specific information. To
address the segment limit check problem reported by Fabrice Bellard,
we could for example add the address of the next segment limit for the
x86 target and add a target-specific check at the start of the
ldx_code_p function. But I don't know much about the subtleties of x86
segmentation, so this idea may not be appropriate to solve this problem.
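Purely as an illustration of the idea, and with the same reservations, a target-specific check at the start of a code load might look like this. It assumes a hypothetical seg_end field holding the first virtual address past the code segment (an exclusive bound), a 32-bit guest and a little-endian 4-byte load, and it ignores the real details of x86 segmentation. x86_fetch_state, code_fetch_within_limit and ldl_code_checked are made-up names; on failure the load returns false and leaves raising the fault to the caller.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical fetch state for an x86-like target. */
typedef struct {
    uint32_t pc;        /* virtual PC of the next code load */
    uintptr_t host_pc;  /* host address matching pc */
    uint64_t seg_end;   /* hypothetical: first address past the code segment */
} x86_fetch_state;

/* True if the len-byte load at pc ends at or before seg_end. */
static bool code_fetch_within_limit(uint32_t pc, unsigned len, uint64_t seg_end)
{
    assert(len > 0);
    return (uint64_t)pc + len <= seg_end;
}

/* Target-specific check first, then the plain host-pointer load.
 * Returns false without touching *out if the limit would be exceeded;
 * raising the fault is left to the caller. */
static bool ldl_code_checked(const x86_fetch_state *s, uint32_t *out)
{
    const uint8_t *p;

    if (!code_fetch_within_limit(s->pc, 4, s->seg_end))
        return false;
    p = (const uint8_t *)s->host_pc;
    *out = (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
    return true;
}
```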

-- 
J. Mayer <address@hidden>
Never organized




