qemu-devel

Re: [BUG] cxl can not create region


From: Jonathan Cameron
Subject: Re: [BUG] cxl can not create region
Date: Thu, 18 Aug 2022 17:37:40 +0100

On Wed, 17 Aug 2022 17:16:19 +0100
Jonathan Cameron <Jonathan.Cameron@huawei.com> wrote:

> On Thu, 11 Aug 2022 17:46:55 -0700
> Dan Williams <dan.j.williams@intel.com> wrote:
> 
> > Dan Williams wrote:  
> > > Bobo WL wrote:    
> > > > Hi Dan,
> > > > 
> > > > Thanks for your reply!
> > > > 
> > > > On Mon, Aug 8, 2022 at 11:58 PM Dan Williams <dan.j.williams@intel.com> wrote:
> > > > >
> > > > > What is the output of:
> > > > >
> > > > >     cxl list -MDTu -d decoder0.0
> > > > >
> > > > > ...? It might be the case that mem1 cannot be mapped by decoder0.0, or
> > > > > at least not in the specified order, or that validation check is broken.
> > > > 
> > > > Command "cxl list -MDTu -d decoder0.0" output:    
> > > 
> > > Thanks for this, I think I know the problem, but will try some
> > > experiments with cxl_test first.    
> > 
> > Hmm, so my cxl_test experiment unfortunately passed so I'm not
> > reproducing the failure mode. This is the result of creating x4 region
> > with devices directly attached to a single host-bridge:
> > 
> > # cxl create-region -d decoder3.5 -w 4 -m -g 256 mem{12,10,9,11} -s $((1<<30))
> > {
> >   "region":"region8",
> >   "resource":"0xf1f0000000",
> >   "size":"1024.00 MiB (1073.74 MB)",
> >   "interleave_ways":4,
> >   "interleave_granularity":256,
> >   "decode_state":"commit",
> >   "mappings":[
> >     {
> >       "position":3,
> >       "memdev":"mem11",
> >       "decoder":"decoder21.0"
> >     },
> >     {
> >       "position":2,
> >       "memdev":"mem9",
> >       "decoder":"decoder19.0"
> >     },
> >     {
> >       "position":1,
> >       "memdev":"mem10",
> >       "decoder":"decoder20.0"
> >     },
> >     {
> >       "position":0,
> >       "memdev":"mem12",
> >       "decoder":"decoder22.0"
> >     }
> >   ]
> > }
> > cxl region: cmd_create_region: created 1 region
> >   
> > > Did the commit_store() crash stop reproducing with latest cxl/preview
> > > branch?    
> > 
> > I missed the answer to this question.
> > 
> > All of these changes are now in Linus' tree; perhaps give that a try and
> > post the debug log again?
> 
> Hi Dan,
> 
> I've moved onto looking at this one.
> 1 HB, 2RP (to make it configure the HDM decoder in the QEMU HB, I'll tidy
> that up at some stage), 1 switch, 4 downstream switch ports each with a
> type 3 device.
> 
> I'm not getting a crash, but can't successfully set up a region.
> Upon adding the final target, it's failing in check_last_peer() as pos < distance.
> Seems distance is 4, which makes me think it's using the wrong level of the
> hierarchy for some reason or that the distance check is wrong.
> Wasn't a good idea to just skip that step though, as it goes boom, though the
> stack trace is not useful.

Turns out really weird corruption happens if you accidentally back two type3
devices with the same memory device. Who would have thought it :)

That aside, ignoring the check_last_peer() failure seems to make everything
work for this topology.  I'm not seeing the crash, so my guess is we fixed it
somewhere along the way.

Now for the fun one.  I've replicated the crash if we have

1HB 1*RP 1SW, 4SW-DSP, 4Type3

Now, I'd expect to see it not 'work' because the QEMU HDM decoder won't be
programmed, but the null pointer dereference isn't related to that.

The bug is straightforward.  Not all decoders have commit callbacks... Will
send out a possible fix shortly.
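
For anyone following along, the failure pattern is roughly the sketch below.
It is a standalone illustration only: struct demo_decoder, demo_hw_commit(),
demo_commit_decoder() and demo_commit_all() are made-up names for this mail,
not the kernel's CXL structures, and the eventual fix may well look different.
The point is just that an optional callback has to be checked before it is
called.

```c
#include <stddef.h>

/* Hypothetical stand-in for a decoder object; NOT the kernel's structure. */
struct demo_decoder {
	const char *name;
	/* Optional callback: NULL when there is nothing to program. */
	int (*commit)(struct demo_decoder *d);
	int committed;
};

/* Example callback for a decoder that does need programming. */
static int demo_hw_commit(struct demo_decoder *d)
{
	d->committed = 1;
	return 0;
}

/*
 * Guarded call: calling d->commit(d) unconditionally would dereference a
 * NULL function pointer for decoders that never set the callback.
 */
static int demo_commit_decoder(struct demo_decoder *d)
{
	if (!d->commit)
		return 0;	/* no callback, treat as nothing to do */
	return d->commit(d);
}

/* Walk a set of decoders, stopping at the first error. */
static int demo_commit_all(struct demo_decoder *decoders, size_t n)
{
	size_t i;
	int rc;

	for (i = 0; i < n; i++) {
		rc = demo_commit_decoder(&decoders[i]);
		if (rc)
			return rc;
	}
	return 0;
}
```

With one decoder that sets .commit and one that leaves it NULL,
demo_commit_all() returns 0 and only the first ends up marked committed.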

Jonathan



> 
> Jonathan
> 
> 
> 
> 
> 
> 



