qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PULL 61/73] hw/pci/aer: Implement PCI_ERR_UNCOR_MASK register


From: Michael S. Tsirkin
Subject: Re: [PULL 61/73] hw/pci/aer: Implement PCI_ERR_UNCOR_MASK register
Date: Wed, 26 Apr 2023 02:12:47 -0400

On Tue, Apr 25, 2023 at 08:42:17PM -0400, Peter Xu wrote:
> Hi, Michael, Jonathan,
> 
> On Tue, Mar 07, 2023 at 08:13:53PM -0500, Michael S. Tsirkin wrote:
> > From: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> > 
> > This register in AER should be both writeable and should
> > have a default value with a couple of the errors masked
> > including the Uncorrectable Internal Error used by CXL for
> > it's error reporting.
> > 
> > Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> > Reviewed-by: Dave Jiang <dave.jiang@intel.com>
> > Message-Id: <20230302133709.30373-2-Jonathan.Cameron@huawei.com>
> > Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
> > Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> > Reviewed-by: Fan Ni <fan.ni@samsung.com>
> > ---
> >  include/hw/pci/pcie_regs.h | 3 +++
> >  hw/pci/pcie_aer.c          | 4 ++++
> >  2 files changed, 7 insertions(+)
> > 
> > diff --git a/include/hw/pci/pcie_regs.h b/include/hw/pci/pcie_regs.h
> > index 1fe0bdd25b..4972106c42 100644
> > --- a/include/hw/pci/pcie_regs.h
> > +++ b/include/hw/pci/pcie_regs.h
> > @@ -141,6 +141,9 @@ typedef enum PCIExpLinkWidth {
> >                                           PCI_ERR_UNC_ATOP_EBLOCKED |    \
> >                                           PCI_ERR_UNC_TLP_PRF_BLOCKED)
> >  
> > +#define PCI_ERR_UNC_MASK_DEFAULT        (PCI_ERR_UNC_INTN | \
> > +                                         PCI_ERR_UNC_TLP_PRF_BLOCKED)
> > +
> >  #define PCI_ERR_UNC_SEVERITY_DEFAULT    (PCI_ERR_UNC_DLP |              \
> >                                           PCI_ERR_UNC_SDN |              \
> >                                           PCI_ERR_UNC_FCP |              \
> > diff --git a/hw/pci/pcie_aer.c b/hw/pci/pcie_aer.c
> > index 9a19be44ae..909e027d99 100644
> > --- a/hw/pci/pcie_aer.c
> > +++ b/hw/pci/pcie_aer.c
> > @@ -112,6 +112,10 @@ int pcie_aer_init(PCIDevice *dev, uint8_t cap_ver, 
> > uint16_t offset,
> >  
> >      pci_set_long(dev->w1cmask + offset + PCI_ERR_UNCOR_STATUS,
> >                   PCI_ERR_UNC_SUPPORTED);
> > +    pci_set_long(dev->config + offset + PCI_ERR_UNCOR_MASK,
> > +                 PCI_ERR_UNC_MASK_DEFAULT);
> > +    pci_set_long(dev->wmask + offset + PCI_ERR_UNCOR_MASK,
> > +                 PCI_ERR_UNC_SUPPORTED);
> 
> This breaks the simplest migration from QEMU 8.0->7.2 binaries on all
> machine types I think as long as the cap is present, e.g. the default
> e1000e provided by the default q35 machine can already hit it with all
> default cmdline:
> 
>   ./qemu-system-x86_64 -M pc-q35-7.2 [-incoming XXX]
> 
> 7.2 binary will have empty wmask for PCI_ERR_UNCOR_MASK, meanwhile I think
> it can also see a non-zero value, then the migration will fail at:
> 
> vmstate_load 0000:00:02.0/e1000e, e1000e                                      
>              
> qemu-7.2: get_pci_config_device: Bad config data: i=0x10a read: 40 device: 0 
> cmask: ff wmask: 0 w1cmask:0
> qemu-7.2: Failed to load PCIDevice:config   
> qemu-7.2: Failed to load e1000e:parent_obj                                    
>             
> qemu-7.2: error while loading state for instance 0x0 of device 
> '0000:00:02.0/e1000e'      
> qemu-7.2: load of migration failed: Invalid argument
> 
> We probably at least want to have the default value to be still zero, and
> we'd need to make sure it'll not be modified by the guest, iiuc.
> 
> Below oneliner works for me and makes the migration work again:
> 
> ===8<===
> diff --git a/hw/pci/pcie_aer.c b/hw/pci/pcie_aer.c
> index 103667c368..563a37b79c 100644
> --- a/hw/pci/pcie_aer.c
> +++ b/hw/pci/pcie_aer.c
> @@ -113,7 +113,7 @@ int pcie_aer_init(PCIDevice *dev, uint8_t cap_ver, 
> uint16_t offset,
>      pci_set_long(dev->w1cmask + offset + PCI_ERR_UNCOR_STATUS,
>                   PCI_ERR_UNC_SUPPORTED);
>      pci_set_long(dev->config + offset + PCI_ERR_UNCOR_MASK,
> -                 PCI_ERR_UNC_MASK_DEFAULT);
> +                 0/*PCI_ERR_UNC_MASK_DEFAULT*/);
>      pci_set_long(dev->wmask + offset + PCI_ERR_UNCOR_MASK,
>                   PCI_ERR_UNC_SUPPORTED);
> ===8<===
> 
> Anyone could have a look on a solid solution from PCI side?
> 
> Copy Juan and Leonardo.
> 
> Thanks,

My bad, I forgot about this 🤦.
So we need a property and tweak it with compat machinery depending on
machine type. Jonathan, can you work on this pls?
Or I can revert for now to relieve the time pressure,
redo the patch at your leasure.


> -- 
> Peter Xu




reply via email to

[Prev in Thread] Current Thread [Next in Thread]