qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH v4 11/13] pc: add 'etc/boot-cpus' fw_cfg file fo


From: Eduardo Habkost
Subject: Re: [Qemu-devel] [PATCH v4 11/13] pc: add 'etc/boot-cpus' fw_cfg file for machine with more than 255 CPUs
Date: Thu, 20 Oct 2016 10:27:33 -0200
User-agent: Mutt/1.7.0 (2016-08-17)

On Thu, Oct 20, 2016 at 01:27:34PM +0200, Igor Mammedov wrote:
> On Wed, 19 Oct 2016 16:29:29 -0200
> Eduardo Habkost <address@hidden> wrote:
> 
> > On Wed, Oct 19, 2016 at 05:18:38PM +0200, Igor Mammedov wrote:
> > > On Wed, 19 Oct 2016 11:15:46 -0200
> > > Eduardo Habkost <address@hidden> wrote:
> > >   
> > > > On Wed, Oct 19, 2016 at 02:05:41PM +0200, Igor Mammedov wrote:  
> > > > > Currently firmware uses 1 byte at 0x5F offset in RTC CMOS
> > > > > to get number of CPUs present at boot. However 1 byte is
> > > > > not enough to handle more than 255 CPUs.  So add a new
> > > > > fw_cfg file that would allow QEMU to tell it.
> > > > > For compat reasons add file only for machine types that
> > > > > support more than 255 CPUs.
> > > > > 
> > > > > Signed-off-by: Igor Mammedov <address@hidden>    
> > > > [...]  
> > > > >  static X86CPU *pc_new_cpu(const char *typename, int64_t apic_id,
> > > > >                            Error **errp)
> > > > >  {
> > > > > @@ -1232,6 +1221,11 @@ static void 
> > > > > pc_build_feature_control_file(PCMachineState *pcms)
> > > > >      fw_cfg_add_file(pcms->fw_cfg, "etc/msr_feature_control", val, 
> > > > > sizeof(*val));
> > > > >  }
> > > > >  
> > > > > +static void rtc_set_cpus_count(ISADevice *rtc, uint16_t cpus_count)
> > > > > +{
> > > > > +    rtc_set_memory(rtc, 0x5f, cpus_count - 1);    
> > > > 
> > > > If we have more than 255 CPUs, shouldn't we at least tell the old
> > > > BIOS that we have 255, instead of silently truncating bits?  
> > > It won't do any good to BIOS as it would hang in AP wakeup due to
> > >  (expected != woken up) condition.  
> > 
> > Even in this case, truncating bits makes it a bit unpredictable:
> > having 257 CPUs would set RTC memory to 0, BIOS will believe it
> > is a UP system.
> and it will do AP wakeup regardless, where old BIOS will hang
> due to unexpectedly woken-up CPUs regardless of value in cmos.

0 (1 CPU) seems to be the only value that will not hang (at least
it won't hang at the same point). If cmos_cpu_count is 1, SeaBIOS
won't even wait for the other CPUs to wake up.

> 
> Anyways I don't care so if you'd really prefer to have cmos
> set to some static value if cpus count more than 256, just
> tell me what value and what justification I shall put in
> comment so we won't wonder later why it's there.

I would set it to 254. Any value except 0 (1 CPU) would likely
hang. The most likely candidates to me are:

* 0 (1 CPU): will hint to the BIOS that it shouldn't try
  to initialize SMP
* 255 (256 CPUs): the largest possible value
* 254 (255 CPUs): the largest possible value that won't overflow
  in case the BIOS uses 8-bit for the CPU count

I was going to suggest 254 (255 CPUs), but maybe 0 (1 CPU) would
even allow the system to boot in UP mode with today's SeaBIOS?
(Why does SeaBIOS need to start the other CPUs, anyway, except
for building the APIC ID list for the ACPI tables?)

254 and 255 could also confuse the BIOS because it will look at
8-bit APIC IDs in CPUID[1].EBX.

That said, setting it to 0 (1 CPU) sounds like the best option.
But I would be OK with any other value as long as it is
predictable.

> 
> > > >   
> > > > > +}
> > > > > +
> > > > >  static
> > > > >  void pc_machine_done(Notifier *notifier, void *data)
> > > > >  {
> > > > > @@ -1240,7 +1234,7 @@ void pc_machine_done(Notifier *notifier, void 
> > > > > *data)
> > > > >      PCIBus *bus = pcms->bus;
> > > > >  
> > > > >      /* set the number of CPUs */
> > > > > -    rtc_set_memory(pcms->rtc, 0x5f, pc_present_cpus_count(pcms) - 1);
> > > > > +    rtc_set_cpus_count(pcms->rtc, le16_to_cpu(pcms->boot_cpus_le));
> > > > >  
> > > > >      if (bus) {
> > > > >          int extra_hosts = 0;
> > > > > @@ -1261,8 +1255,15 @@ void pc_machine_done(Notifier *notifier, void 
> > > > > *data)
> > > > >  
> > > > >      acpi_setup();
> > > > >      if (pcms->fw_cfg) {
> > > > > +        MachineClass *mc = MACHINE_GET_CLASS(pcms);
> > > > > +
> > > > >          pc_build_smbios(pcms->fw_cfg);
> > > > >          pc_build_feature_control_file(pcms);
> > > > > +
> > > > > +        if (mc->max_cpus > 255) {
> > > > > +            fw_cfg_add_file(pcms->fw_cfg, "etc/boot-cpus", 
> > > > > &pcms->boot_cpus_le,
> > > > > +                            sizeof(pcms->boot_cpus_le));
> > > > > +        }
> > > > >      }
> > > > >  }
> > > > >  
> > > > > @@ -1786,9 +1787,11 @@ static void pc_cpu_plug(HotplugHandler 
> > > > > *hotplug_dev,
> > > > >          }
> > > > >      }
> > > > >  
> > > > > +    /* increment the number of CPUs */
> > > > > +    pcms->boot_cpus_le = cpu_to_le16(le16_to_cpu(pcms->boot_cpus_le) 
> > > > > + 1);    
> > > > 
> > > > Is this really safe? What if the guest is in the middle of a
> > > > etc/boot-cpus read?  
> > > It's safe for boot CPUs but
> > > it's not safe to hotplug cpus CPU during BIOS boot at all
> > > as number of CPUs read from boot_cpus might not match number
> > > of CPUs that received INIT/SIPI wakeup.
> > > This problem is ignored for now, I've dropped related SeaBIOS patch
> > > by Kevin's request and would explore it some more to avoid race there.
> > > 
> > > Anyways,
> > > Do you have an idea how to improve reading from pcms->boot_cpus_le and 
> > > make it atomic?  
> > 
> > No idea. SeaBIOS seems to use insb to read fw_cfg files, so we
> > could be updating the data between two reads. I don't think we
> > want to design a fw_cfg guest<->host synchronization mechanism.
> > 
> > I believe the solution is to not change it at all after booting
> > the guest. I suggest initializing/reinitializing fw_cfg data only
> > on reset.
> I've checked it once more and value read by BIOS atomically
> as both QEMU and SeaBIOS use dma interface to transfer data.
> BIOS transfers control to QEMU once via
> 
>  outl(PORT_QEMU_CFG_DMA_ADDR_LOW)
> 
> and after that access to pcms->boot_cpus_le is protected by BQL.
> So there is no need to invent some sort of synchronization
> or limit update to fixed points (machine_done/reset).

You're right. I was assuming the worst case and looking at the
non-DMA path.

In this case, I believe we're safe except when running old BIOS
that doesn't support fw_cfg DMA.

> 
> > After that, we could block or delay CPU hotplug until we know the
> > guest will be able to handle it. Do we have anything that
> > prevents or delays hotplug until we know the guest is able to
> > handle the hotplug events?
> Not that I know of,
> and I'd like to fix BIOS side to handle AP wakeup gracefully
> even if cpus are hotplugged while BIOS is running instead of
> putting band-aids to workaround the issue.
> 

No problem to me, as the DMA path seems safe.

-- 
Eduardo



reply via email to

[Prev in Thread] Current Thread [Next in Thread]