Re: [Qemu-trivial] [Qemu-devel] [PATCH] Remove PCI class code from virti

From: David Gibson
Subject: Re: [Qemu-trivial] [Qemu-devel] [PATCH] Remove PCI class code from virtio balloon device
Date: Tue, 20 Mar 2012 11:42:06 +1100
On Mon, Mar 19, 2012 at 11:33:10AM +0000, Stefan Hajnoczi wrote:
> On Mon, Mar 19, 2012 at 03:59:23PM +1100, David Gibson wrote:
> > Currently the virtio balloon device, when using the virtio-pci interface
> > advertises itself with PCI class code MEMORY_RAM.  This is wrong; the
> > balloon is vaguely related to memory, but is nothing like a PCI memory
> > device in the meaning of the class code, and this code is not required or
> > suggested by the virtio PCI specification.
> > 
> > Worse, this causes problems on the pseries machine, because the
> > firmware, seeing this class code, advertises the device as memory in the
> > device tree, and then a guest kernel bug causes it to see this "memory"
> > before the real system memory, leading to a crash in early boot.
> > 
> > This patch fixes the problem by removing the bogus PCI class code on the
> > balloon device.
> > 
> > Cc: Michael S. Tsirkin <address@hidden>
> > Cc: Rusty Russell <address@hidden>
> > 
> > Signed-off-by: David Gibson <address@hidden>
> > ---
> >  hw/virtio-pci.c |    2 +-
> >  1 files changed, 1 insertions(+), 1 deletions(-)
> Since this is a guest-visible change we might need to be careful about
> how it's introduced.
> Do we need to keep the old class code for existing machine types?  The
> new class code could be introduced only for 1.1 and later machine types
> if we want to be extra careful about introducing guest-visible
> changes.

So as a general rule, I like to be very careful about user-visible
changes.  But in this case, I don't think we want to be too hesitant.
In particular, it's not just a question of the machine type, but also
of how the guest OS will deal with the PCI class code.

The class code we were using was Just Plain Wrong.  It was not
suggested by the virtio spec, and it makes no sense.  It happens that
so far this caused problems only for a guest on a particular machine
type, but there's no reason it couldn't cause (different) problems for
guests on any machine type.

More to the point, it seems reasonably unlikely that existing guests
rely on the broken behaviour: again, there's no reason they'd think
they need to, based on the spec, and the usual way of matching drivers
to PCI devices is with the vendor/device IDs, which are correct and not
changed by this patch.
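
To make the matching argument concrete, here is a minimal sketch in C.
It is not taken from QEMU or from any real guest driver.  The struct
and both helper functions are hypothetical names for illustration.
The ID values (vendor 0x1af4, legacy virtio device IDs 0x1000-0x103f,
balloon at 0x1002, class 0x0500 for MEMORY_RAM) are what I believe
the usual headers define, so treat them as assumptions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified view of the config-space fields in question. */
struct pci_ident {
    uint16_t vendor_id;
    uint16_t device_id;
    uint16_t class_code;    /* (base class << 8) | subclass */
};

/* Assumed values; check them against your own pci_ids.h. */
#define EX_VIRTIO_VENDOR_ID    0x1af4
#define EX_VIRTIO_DEV_FIRST    0x1000
#define EX_VIRTIO_DEV_LAST     0x103f
#define EX_VIRTIO_DEV_BALLOON  0x1002
#define EX_CLASS_MEMORY_RAM    0x0500
#define EX_BASE_CLASS_MEMORY   0x05

/* How a guest driver would typically bind: vendor/device IDs only.
 * The class code is never consulted. */
static bool ex_driver_matches(const struct pci_ident *dev)
{
    return dev->vendor_id == EX_VIRTIO_VENDOR_ID &&
           dev->device_id >= EX_VIRTIO_DEV_FIRST &&
           dev->device_id <= EX_VIRTIO_DEV_LAST;
}

/* How firmware might classify a device while building the device
 * tree: by base class alone.  This is the kind of check that would
 * mistake the balloon for memory. */
static bool ex_firmware_sees_memory(const struct pci_ident *dev)
{
    return (dev->class_code >> 8) == EX_BASE_CLASS_MEMORY;
}
```

With class_code set to 0x0500, ex_firmware_sees_memory() is true for
the balloon.  With any non-memory class code it is false, and
ex_driver_matches() returns the same answer in both cases.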

So, unless we have a known example of an existing guest that would be
broken by this change, I think we should implement it ASAP for all
machine types.

