[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: should ioapic_service really be modelling cpu writes?
From: |
Alex Bennée |
Subject: |
Re: should ioapic_service really be modelling cpu writes? |
Date: |
Fri, 11 Nov 2022 12:26:21 +0000 |
User-agent: |
mu4e 1.9.1; emacs 28.2.50 |
Peter Xu <peterx@redhat.com> writes:
> Hi, Alex,
>
> On Thu, Nov 10, 2022 at 05:55:51PM +0000, Alex Bennée wrote:
>>
>> Alex Bennée <alex.bennee@linaro.org> writes:
>>
>> > Hi,
>> >
>> > I've been trying to remove current_cpu hacks from our hw/ emulation and
>> > replace them with an explicit cpu_index derived from MemTxAttrs. So far
>> > this has been going mostly ok but I've run into problems on x86 due to
>> > the way the apic/ioapic are modelled. It comes down to the function
>> > ioapic_service() which eventually does:
>> >
>> > /* No matter whether IR is enabled, we translate
>> > * the IOAPIC message into a MSI one, and its
>> > * address space will decide whether we need a
>> > * translation. */
>> > stl_le_phys(ioapic_as, info.addr, info.data);
>> >
>> > Which essentially calls the memory API to simulate a memory write.
>> > However to generate a properly formed MemTxAttrs we need to know what
>> > CPU we are running on. In the case of ioapic_service() we may well be in
>> > the main thread either for an expiring timer:
>> >
>> > hpet_timer->qemu_set_irq->ioapic_set_irq
>> >
>> > or for reset handling:
>> >
>> > pc_machine_reset->hpet_reset->qemu_set_irq->ioapic_set_irq
>> >
>> > neither of which can get a associated CPU. I assume if the actual writes
>> > are triggered we never actually actioned stuff because we had:
>> >
>> > DeviceState *cpu_get_current_apic(void)
>> > {
>> > if (current_cpu) {
>> > X86CPU *cpu = X86_CPU(current_cpu);
>> > return cpu->apic_state;
>> > } else {
>> > return NULL;
>> > }
>> > }
>> >
>> > which basically causes the updates to be dropped on the floor.
>
> I think it shouldn't? Normally the irq will be in MSI format (IOAPIC will
> translate to an MSI in QEMU, per ioapic_entry_parse()).
>
> I had a feeling that it'll just go the shortcut here (MSI always starts
> with 0xfeeXXXXX so definitely bigger than 0xfff):
>
> if (addr > 0xfff || !index) {
> /* MSI and MMIO APIC are at the same memory location,
> * but actually not on the global bus: MSI is on PCI bus
> * APIC is connected directly to the CPU.
> * Mapping them on the global bus happens to work because
> * MSI registers are reserved in APIC MMIO and vice versa. */
> MSIMessage msi = { .address = addr, .data = val };
> apic_send_msi(&msi);
> return;
> }
>
> apic_send_msi() doesn't need a cpu context.
Ahh so yes maybe my changes where too quick to reject the MMIO. So I've
made the following tweak to ioapic_service():
/*
* No matter whether IR is enabled, we translate
* the IOAPIC message into a MSI one, and its
* address space will decide whether we need a
* translation.
*
* As it is an access from something other than the
* CPU or a PCI device we set its source as machine
* specific.
*/
{
MemTxAttrs attrs = MEMTXATTRS_MACHINE(MEMTX_IOAPIC);
MemTxResult res;
address_space_stl_le(ioapic_as, info.addr, info.data,
attrs, &res);
if (res != MEMTX_ERROR) {
qemu_log_mask(LOG_GUEST_ERROR,
"%s: couldn't write to %"PRIx32"\n",
__func__, info.addr);
}
}
and I had already tweaked the pci_msi_trigger so:
static void pci_msi_trigger(PCIDevice *dev, MSIMessage msg)
{
MemTxAttrs attrs = {
.requester_type = MTRT_PCI,
.requester_id = pci_requester_id(dev)
};
address_space_stl_le(&dev->bus_master_as, msg.address, msg.data,
attrs, NULL);
}
> No expert on that, but per my understanding ioapic isn't really bound to
> any apic, so it doesn't need any cpu context. As a quick reference of
> that, one can look at Intel SDM Vol 3 Chap 10, figure 10.3 will be a
> generic modern x86_64 system APIC structure.
>
> In hardware there should have a 3-wire apic bus that take care of all the
> messaging, including at least not only ioapic irqs to any cores, or IPIs
> between the cores. The messages coming from the ioapic should not require
> any apic too as it can come from devices, just like what we do with qemu
> when the device does things like pci_set_irq(), iiuc.
<snip>
>
> AFAICT apic_mem_write() doesn't mean that this cpu will take this IRQ. The
> target core to respond to the IRQ will be defined in the dest ID field of
> either an MSI message or embeded in the IOAPIC entry being setup by the
> guest driver. E.g. MSI message format can also be found in SDM Vol 3 chap
> 10.11.1, in QEMU we can refer to "dest" field of apic_send_msi().
So now the start of apic_mem_write checks and validates MSIs first:
static MemTxResult apic_mem_write(void *opaque, hwaddr addr, uint64_t val,
unsigned int size, MemTxAttrs attrs)
{
DeviceState *dev;
APICCommonState *s;
int index = (addr >> 4) & 0xff;
if (size < 4) {
return MEMTX_ERROR;
}
/*
* MSI and MMIO APIC are at the same memory location, but actually
* not on the global bus: MSI is on PCI bus APIC is connected
* directly to the CPU.
*
* We can check the MemTxAttrs to check they are coming from where
* we expect. Even though the MSI registers are reserved in APIC
* MMIO and vice versa they shouldn't respond to CPU writes.
*/
if (addr > 0xfff || !index) {
switch (attrs.requester_type) {
case MTRT_MACHINE:
/* should be from the directly wired IOPIC */
if (attrs.requester_id != MEMTX_IOAPIC) {
qemu_log_mask(LOG_GUEST_ERROR,
"%s: rejecting machine write from something
other that IOPIC (%x)",
__func__, attrs.requester_id);
return MEMTX_ACCESS_ERROR;
}
break;
case MTRT_PCI:
/* PCI signalled MSI */
break;
case MTRT_UNSPECIFIED:
qemu_log_mask(LOG_GUEST_ERROR,
"%s: rejecting unspecified write", __func__);
return MEMTX_ACCESS_ERROR;
case MTRT_CPU:
/* can CPU directly trigger MSIs? */
break;
}
MSIMessage msi = { .address = addr, .data = val };
apic_send_msi(&msi);
return MEMTX_OK;
}
if (attrs.requester_type != MTRT_CPU) {
return MEMTX_ACCESS_ERROR;
}
dev = cpu_get_current_apic(attrs.requester_id);
s = APIC(dev);
trace_apic_mem_writel(addr, val);
which at least gets things booting properly. Does this seem like a
better modelling of the APIC behaviour?
--
Alex Bennée