qemu-devel

Re: [Qemu-devel] [RFC] Memory API


From: Anthony Liguori
Subject: Re: [Qemu-devel] [RFC] Memory API
Date: Wed, 18 May 2011 12:04:13 -0500

On 05/18/2011 11:41 AM, Avi Kivity wrote:
On 05/18/2011 07:33 PM, Anthony Liguori wrote:
On 05/18/2011 10:23 AM, Avi Kivity wrote:
The tricky part is wiring this up efficiently for TCG, i.e. in QEMU's
softmmu. I played with passing the issuing CPUState (or NULL for
devices) down the MMIO handler chain. Not totally beautiful as
decentralized dispatching was still required, but at least only
moderately invasive. Maybe your API allows for cleaning up the
management and dispatching part, need to rethink...

My suggestion is opposite - have a different MemoryRegion for each (e.g.
CPUState::memory). Then the TLBs will resolve to a different ram_addr_t
for the same physical address, for the local APIC range.

I don't understand the different ram_addr_t part.


The TLBs map a virtual address to a ram_addr_t.

It actually maps guest virtual addresses to host virtual addresses. Virtual addresses that map to I/O memory never get stored in the TLB.

You don't need separate I/O registration addresses to do per-CPU dispatch, provided that you route the dispatch routines through the CPUs first.
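
To make the "route through the CPUs first" idea concrete, here is a minimal sketch in C. It is not QEMU code: SketchCPU, sketch_cpu_mmio_read and sketch_bus_read are hypothetical names, and the only facts assumed are the default local APIC base (0xfee00000) and a 4 KiB register window. The issuing CPU is handed to the first dispatch step, so the local APIC range resolves per CPU while every other address falls through to one shared handler.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_APIC_BASE 0xfee00000u   /* default local APIC base */
#define SKETCH_APIC_SIZE 0x1000u       /* 4 KiB register window */

/* Hypothetical stand-in for CPUState: only per-CPU APIC registers. */
typedef struct SketchCPU {
    int index;
    uint32_t apic_regs[SKETCH_APIC_SIZE / 4];
} SketchCPU;

/* Shared handler for everything below the CPU; nothing is mapped here. */
static uint32_t sketch_bus_read(uint64_t addr)
{
    (void)addr;
    return 0xffffffffu;
}

/*
 * First dispatch step. cpu is the issuing CPU, or NULL for a device access.
 * Only a CPU can hit its own local APIC; all other accesses go to the bus.
 */
static uint32_t sketch_cpu_mmio_read(SketchCPU *cpu, uint64_t addr)
{
    assert((addr & 3) == 0);   /* sketch handles aligned 32-bit reads only */
    if (cpu != NULL && addr >= SKETCH_APIC_BASE &&
        addr - SKETCH_APIC_BASE < SKETCH_APIC_SIZE) {
        return cpu->apic_regs[(addr - SKETCH_APIC_BASE) / 4];
    }
    return sketch_bus_read(addr);
}
```

Both CPUs use the same address for the APIC range and only one registration exists for it; the difference comes entirely from which SketchCPU is passed in.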

Overlapping regions can be handled differently at each level. For
instance, if a PCI device registers an IO region to the same location
as the APIC, the APIC always wins because the PCI bus will never see
the access.


That's inefficient, since you always have to traverse the hierarchy.

Is efficiency really a problem here? Besides, I don't think that's really correct. You're adding at most 2-3 extra function pointer invocations. I don't think you can really call that inefficient.
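
As a rough illustration of the cost being argued about, the sketch below chains three hypothetical levels (CPU, system bus, PCI) through function pointers and counts the indirect calls per access. SketchLevel, the address ranges and the returned values are all made up for the example; nothing here is taken from QEMU. The CPU level claims the APIC window, so an overlapping PCI region at the same address is never reached.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct SketchLevel SketchLevel;
struct SketchLevel {
    uint64_t base, size;         /* range claimed at this level (size 0: none) */
    uint32_t value;              /* value returned for a claimed read */
    const SketchLevel *next;     /* where unclaimed accesses are forwarded */
    uint32_t (*read)(const SketchLevel *self, uint64_t addr, int *hops);
};

/* Claim the access if it falls in this level's range, else forward it. */
static uint32_t sketch_level_read(const SketchLevel *self, uint64_t addr,
                                  int *hops)
{
    assert(hops != NULL);
    (*hops)++;                                   /* one indirect call per level */
    if (addr >= self->base && addr - self->base < self->size) {
        return self->value;
    }
    if (self->next != NULL) {
        return self->next->read(self->next, addr, hops);
    }
    return 0xffffffffu;                          /* nobody claimed it */
}

/* PCI device whose region overlaps the APIC window. */
static const SketchLevel sketch_pci = {
    .base = 0xfe000000u, .size = 0x02000000u, .value = 0x0000beefu,
    .next = NULL, .read = sketch_level_read,
};
/* System bus claims nothing in this sketch and only forwards. */
static const SketchLevel sketch_sysbus = {
    .base = 0, .size = 0, .value = 0,
    .next = &sketch_pci, .read = sketch_level_read,
};
/* CPU level claims the local APIC window first. */
static const SketchLevel sketch_cpu = {
    .base = 0xfee00000u, .size = 0x1000u, .value = 0x0000a91cu,
    .next = &sketch_sysbus, .read = sketch_level_read,
};
```

An access inside the APIC window is answered after one call; an access that only the PCI device claims takes three, which is the "2-3 extra function pointer invocations" order of magnitude.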

You cannot do this properly with a single dispatch table because the
behavior depends on where in the hierarchy the I/O is being handled.

You can. When you have a TLB miss, you walk the memory hierarchy (which
is per-cpu) and end up with a ram_addr_t which is stowed in the TLB
entry.

I think we're overloading the term TLB. Are you referring to l1_phys_map as the TLB? I thought Jan was referring to the actual emulated TLB that TCG uses.

Further accesses dispatch via this ram_addr_t, without taking the
cpu into consideration (the TLB is, after all, already per-cpu).

Since each APIC will have its own ram_addr_t, we don't need per-cpu
dispatch.

You would need per-CPU l1_phys_maps, which would result in quite a lot of additional memory overhead.
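
The trade-off can be sketched as follows, under heavy simplification. The real l1_phys_map is a multi-level table; the flat per-CPU array here, the sketch_handle_t type standing in for ram_addr_t, and all sizes are assumptions chosen only to keep the example short. Each CPU gets its own table, so the APIC page resolves to a different handle per CPU, and the total table memory is one full table per CPU.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_BITS 12
#define SKETCH_MAP_PAGES 16      /* tiny physical address space for the sketch */
#define SKETCH_NCPUS     4

typedef uint32_t sketch_handle_t;   /* hypothetical stand-in for ram_addr_t */

/* One flat physical-page table per CPU. */
typedef struct SketchCpuMap {
    sketch_handle_t page_to_handle[SKETCH_MAP_PAGES];
} SketchCpuMap;

static SketchCpuMap sketch_maps[SKETCH_NCPUS];

/* Shared pages get the same handle on every CPU; the APIC page does not. */
static void sketch_maps_init(unsigned apic_page)
{
    for (unsigned cpu = 0; cpu < SKETCH_NCPUS; cpu++) {
        for (unsigned page = 0; page < SKETCH_MAP_PAGES; page++) {
            sketch_maps[cpu].page_to_handle[page] =
                (page == apic_page) ? 0x1000u + cpu : page;
        }
    }
}

/* What a TLB-miss walk would return and cache for this CPU. */
static sketch_handle_t sketch_resolve(unsigned cpu, uint64_t paddr)
{
    uint64_t page = paddr >> SKETCH_PAGE_BITS;
    assert(cpu < SKETCH_NCPUS && page < SKETCH_MAP_PAGES);
    return sketch_maps[cpu].page_to_handle[page];
}

/* Table memory grows linearly with the number of CPUs. */
static size_t sketch_total_map_bytes(void)
{
    return sizeof(sketch_maps);
}
```

With a single shared table the cost would be sizeof(SketchCpuMap) once; here it is paid SKETCH_NCPUS times even though only one page differs between CPUs.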

Regards,

Anthony Liguori




