qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: should ioapic_service really be modelling cpu writes?


From: Alex Bennée
Subject: Re: should ioapic_service really be modelling cpu writes?
Date: Fri, 11 Nov 2022 12:26:21 +0000
User-agent: mu4e 1.9.1; emacs 28.2.50

Peter Xu <peterx@redhat.com> writes:

> Hi, Alex,
>
> On Thu, Nov 10, 2022 at 05:55:51PM +0000, Alex Bennée wrote:
>> 
>> Alex Bennée <alex.bennee@linaro.org> writes:
>> 
>> > Hi,
>> >
>> > I've been trying to remove current_cpu hacks from our hw/ emulation and
>> > replace them with an explicit cpu_index derived from MemTxAttrs. So far
>> > this has been going mostly ok but I've run into problems on x86 due to
>> > the way the apic/ioapic are modelled. It comes down to the function
>> > ioapic_service() which eventually does:
>> >
>> >    /* No matter whether IR is enabled, we translate
>> >     * the IOAPIC message into a MSI one, and its
>> >     * address space will decide whether we need a
>> >     * translation. */
>> >    stl_le_phys(ioapic_as, info.addr, info.data);
>> >
>> > Which essentially calls the memory API to simulate a memory write.
>> > However to generate a properly formed MemTxAttrs we need to know what
>> > CPU we are running on. In the case of ioapic_service() we may well be in
>> > the main thread either for an expiring timer:
>> >
>> >  hpet_timer->qemu_set_irq->ioapic_set_irq
>> >
>> > or for reset handling:
>> >
>> >  pc_machine_reset->hpet_reset->qemu_set_irq->ioapic_set_irq
>> >
>> > neither of which can get a associated CPU. I assume if the actual writes
>> > are triggered we never actually actioned stuff because we had:
>> >
>> >   DeviceState *cpu_get_current_apic(void)
>> >   {
>> >       if (current_cpu) {
>> >           X86CPU *cpu = X86_CPU(current_cpu);
>> >           return cpu->apic_state;
>> >       } else {
>> >           return NULL;
>> >       }
>> >   }
>> >
>> > which basically causes the updates to be dropped on the floor.
>
> I think it shouldn't?  Normally the irq will be in MSI format (IOAPIC will
> translate to an MSI in QEMU, per ioapic_entry_parse()).
>
> I had a feeling that it'll just go the shortcut here (MSI always starts
> with 0xfeeXXXXX so definitely bigger than 0xfff):
>
>     if (addr > 0xfff || !index) {
>         /* MSI and MMIO APIC are at the same memory location,
>          * but actually not on the global bus: MSI is on PCI bus
>          * APIC is connected directly to the CPU.
>          * Mapping them on the global bus happens to work because
>          * MSI registers are reserved in APIC MMIO and vice versa. */
>         MSIMessage msi = { .address = addr, .data = val };
>         apic_send_msi(&msi);
>         return;
>     }
>
> apic_send_msi() doesn't need a cpu context.

Ahh so yes maybe my changes where too quick to reject the MMIO. So I've
made the following tweak to ioapic_service():

                /*
                 * No matter whether IR is enabled, we translate
                 * the IOAPIC message into a MSI one, and its
                 * address space will decide whether we need a
                 * translation.
                 *
                 * As it is an access from something other than the
                 * CPU or a PCI device we set its source as machine
                 * specific.
                 */
                {
                    MemTxAttrs attrs = MEMTXATTRS_MACHINE(MEMTX_IOAPIC);
                    MemTxResult res;

                    address_space_stl_le(ioapic_as, info.addr, info.data,
                                         attrs, &res);
                    if (res != MEMTX_ERROR) {
                        qemu_log_mask(LOG_GUEST_ERROR,
                                      "%s: couldn't write to %"PRIx32"\n", 
__func__, info.addr);
                    }
                }

and I had already tweaked the pci_msi_trigger so:

  static void pci_msi_trigger(PCIDevice *dev, MSIMessage msg)
  {
      MemTxAttrs attrs = {
          .requester_type = MTRT_PCI,
          .requester_id = pci_requester_id(dev)
      };
      address_space_stl_le(&dev->bus_master_as, msg.address, msg.data,
                           attrs, NULL);
  }

> No expert on that, but per my understanding ioapic isn't really bound to
> any apic, so it doesn't need any cpu context.  As a quick reference of
> that, one can look at Intel SDM Vol 3 Chap 10, figure 10.3 will be a
> generic modern x86_64 system APIC structure.
>
> In hardware there should have a 3-wire apic bus that take care of all the
> messaging, including at least not only ioapic irqs to any cores, or IPIs
> between the cores.  The messages coming from the ioapic should not require
> any apic too as it can come from devices, just like what we do with qemu
> when the device does things like pci_set_irq(), iiuc.
<snip>
>
> AFAICT apic_mem_write() doesn't mean that this cpu will take this IRQ.  The
> target core to respond to the IRQ will be defined in the dest ID field of
> either an MSI message or embeded in the IOAPIC entry being setup by the
> guest driver.  E.g. MSI message format can also be found in SDM Vol 3 chap
> 10.11.1, in QEMU we can refer to "dest" field of apic_send_msi().


So now the start of apic_mem_write checks and validates MSIs first:

  static MemTxResult apic_mem_write(void *opaque, hwaddr addr, uint64_t val,
                                    unsigned int size, MemTxAttrs attrs)
  {
      DeviceState *dev;
      APICCommonState *s;
      int index = (addr >> 4) & 0xff;

      if (size < 4) {
          return MEMTX_ERROR;
      }

      /*
       * MSI and MMIO APIC are at the same memory location, but actually
       * not on the global bus: MSI is on PCI bus APIC is connected
       * directly to the CPU.
       *
       * We can check the MemTxAttrs to check they are coming from where
       * we expect. Even though the MSI registers are reserved in APIC
       * MMIO and vice versa they shouldn't respond to CPU writes.
       */
      if (addr > 0xfff || !index) {
          switch (attrs.requester_type) {
          case MTRT_MACHINE:
              /* should be from the directly wired IOPIC */
              if (attrs.requester_id != MEMTX_IOAPIC) {
                  qemu_log_mask(LOG_GUEST_ERROR,
                                "%s: rejecting machine write from something 
other that IOPIC (%x)",
                                __func__, attrs.requester_id);
                  return MEMTX_ACCESS_ERROR;
              }
              break;
          case MTRT_PCI:
              /* PCI signalled MSI */
              break;
          case MTRT_UNSPECIFIED:
              qemu_log_mask(LOG_GUEST_ERROR,
                            "%s: rejecting unspecified write", __func__);
              return MEMTX_ACCESS_ERROR;
          case MTRT_CPU:
              /* can CPU directly trigger MSIs? */
              break;
          }
          MSIMessage msi = { .address = addr, .data = val };
          apic_send_msi(&msi);
          return MEMTX_OK;
      }

      if (attrs.requester_type != MTRT_CPU) {
          return MEMTX_ACCESS_ERROR;
      }
      dev = cpu_get_current_apic(attrs.requester_id);
      s = APIC(dev);

      trace_apic_mem_writel(addr, val);

which at least gets things booting properly. Does this seem like a
better modelling of the APIC behaviour?

-- 
Alex Bennée



reply via email to

[Prev in Thread] Current Thread [Next in Thread]