
Re: [Qemu-devel] RFC: Code fetch optimisation


From: Jocelyn Mayer
Subject: Re: [Qemu-devel] RFC: Code fetch optimisation
Date: Fri, 12 Oct 2007 20:24:44 +0200

On Fri, 2007-10-12 at 18:21 +0300, Blue Swirl wrote:
> On 10/12/07, J. Mayer <address@hidden> wrote:
> > Here's a small patch that allows an optimisation for code fetch, at least
> > for RISC CPU targets, as suggested by Fabrice Bellard.
> > The main idea is that a translated block never spans a page boundary. As
> > the tb_find_slow routine already gets the physical address of the page of
> > code to be translated, the code translator can then fetch the code using
> > raw host memory accesses instead of doing it through the softmmu routines.
> > This patch could also be adapted to CISC CPU targets, with care for the
> > last instruction of a page. For now, I implemented it for alpha, arm,
> > mips, PowerPC and SH4.
> > I don't actually know if the optimisation would bring a noticeable speed
> > gain or if it will be absolutely marginal.
> >
> > Please comment.
> 
> This will not work correctly for execution of MMIO registers, but
> maybe that won't work on real hardware either. Who cares.

I wonder if this is important or not... But maybe, when retrieving the
physical address, we could check whether it is inside ROM/RAM or an I/O
area and, in the latter case, not give the phys_addr information to the
translator. In that case, it would keep using the ldxx_code accessors. I
guess if we want to do that, a set of helpers would be welcome to avoid
adding code like:
if (phys_pc == 0)
    opc = ldl_code(virt_pc);
else
    opc = ldl_raw(phys_pc);
everywhere... I could also add another check so that this set of macros
would automatically fall back to ldxx_code when we reach a page boundary,
which would then make it easy to use this optimisation for CISC/VLE
architectures too.
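
A minimal sketch of that ROM/RAM check, assuming the
cpu_get_physical_page_desc() helper and the IO_MEM_* flags from exec.c of
that period (the exact names are an assumption, not part of the patch);
tb_find_slow would only hand phys_pc to the translator when this returns
true:

static inline int phys_page_is_ram(target_phys_addr_t paddr)
{
    /* Assumption: cpu_get_physical_page_desc() returns the IO index and
       flags of the physical page, as in exec.c of this period */
    unsigned long pd = cpu_get_physical_page_desc(paddr);

    /* RAM, ROM and ROM devices with direct reads are backed by
       phys_ram_base and can be fetched with raw host accesses;
       everything else is MMIO */
    return (pd & ~TARGET_PAGE_MASK) <= IO_MEM_ROM || (pd & IO_MEM_ROMD);
}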

I'm not sure of the proper solution to allow executing code from mmio
devices. But adding specific accessors to handle the CISC/VLE case needs
to be done anyway. Something like this might be OK:

static inline target_ulong ldl_code_p(unsigned long *start_pc,
                                      unsigned long *phys_pc,
                                      target_ulong *virt_pc)
{
    target_ulong opc;

    if ((*start_pc ^ *phys_pc) & TARGET_PAGE_MASK) {
        /* Slow path that may raise an exception: we crossed a page
           boundary, so phys_pc cannot simply be incremented and we keep
           going through the softmmu accessor */
        opc = ldl_code(*virt_pc);
    } else {
        /* Optimized path: raw host access within the start page */
        opc = ldl_raw(*phys_pc);
    }
    /* A 32-bit opcode is fetched, whatever the size of target_ulong */
    *phys_pc += 4;
    *virt_pc += 4;
    return opc;
}

Of course, 8-, 16- and 64-bit (why not?) accessors would also be provided
the same way.
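
For instance, the 16-bit variant could look like this; it is only a sketch
mirroring ldl_code_p above and assuming the usual lduw_code/lduw_raw
accessor names:

static inline uint32_t lduw_code_p(unsigned long *start_pc,
                                   unsigned long *phys_pc,
                                   target_ulong *virt_pc)
{
    uint32_t opc;

    if ((*start_pc ^ *phys_pc) & TARGET_PAGE_MASK) {
        /* Crossed a page boundary: fall back to the softmmu accessor */
        opc = lduw_code(*virt_pc);
    } else {
        /* Same physical page as the TB start: raw host access */
        opc = lduw_raw(*phys_pc);
    }
    *phys_pc += 2;
    *virt_pc += 2;
    return opc;
}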

> Wouldn't it be even more efficient if you moved most of this calculation:
> +    phys_pc = (unsigned long)phys_ram_base + tb->page_addr[0] +
> +        (pc_start & ~TARGET_PAGE_MASK);
> here:
> +    tb->page_addr[0] = phys_page1;
> ?

Maybe. I chose to do it this way because it's exactly the same assignment
that is done in tb_link_phys after the gen_intermediate_code function
returns. I thought the safer thing to do was to store the same value so
that, whatever might happen, the value in the tb structure is never
inconsistent. I also guess it's not so important, as the tb is not linked
at this point...
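
For reference, a sketch of the two placements being discussed; the
surrounding tb_find_slow / code generation context is assumed from the
patch and not shown here:

/* As in the patch: derive the host address from tb->page_addr[0], the
   same value tb_link_phys() will store later, so the tb never holds an
   inconsistent value */
tb->page_addr[0] = phys_page1;
phys_pc = (unsigned long)phys_ram_base + tb->page_addr[0] +
    (pc_start & ~TARGET_PAGE_MASK);

/* Blue Swirl's suggestion: compute it directly from phys_page1 at the
   same place, avoiding the extra indirection through the tb */
phys_pc = (unsigned long)phys_ram_base + phys_page1 +
    (pc_start & ~TARGET_PAGE_MASK);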





