qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH] Fix Guest VM crash due to iSCSI Sense Key error


From: Stefan Hajnoczi
Subject: Re: [Qemu-devel] [PATCH] Fix Guest VM crash due to iSCSI Sense Key error
Date: Mon, 29 Jul 2019 11:09:46 +0100
User-agent: Mutt/1.12.0 (2019-05-25)

On Fri, Jul 26, 2019 at 04:18:46PM -0400, John Snow wrote:
> Paolo, Stefan and Kevin: can I loop you in here? I'm quite uncertain
> about this and I'd like to clear this up quickly if it's possible:
> 
> On 7/25/19 8:58 PM, John Snow wrote:
> > 
> > 
> > On 7/7/19 10:55 PM, address@hidden wrote:
> >> From: Shaju Abraham <address@hidden>
> >>
> >> During the  IDE DMA transfer for a ISCSI target,when libiscsi encounters
> >> a SENSE KEY error, it sets the task->sense to  the value "COMMAND ABORTED".
> >> The function iscsi_translate_sense() later translaters this error to 
> >> -ECANCELED
> >> and this value is passed to the callback function. In the case of  IDE DMA 
> >> read
> >> or write, the callback function returns immediately if the value of the ret
> >> argument is -ECANCELED.
> >> Later when ide_cancel_dma_sync() function is invoked  the assertion
> >> "s->bus->dma->aiocb == ((void *)0)" fails and the qemu process gets 
> >> terminated.
> >> Fix the issue by making the value of s->bus->dma->aiocb = NULL when
> >> -ECANCELED is passed to the callback.
> >>
> >> Signed-off-by: Shaju Abraham <address@hidden>
> >> ---
> >>  hw/ide/core.c | 1 +
> >>  1 file changed, 1 insertion(+)
> >>
> >> diff --git a/hw/ide/core.c b/hw/ide/core.c
> >> index 6afadf8..78ea357 100644
> >> --- a/hw/ide/core.c
> >> +++ b/hw/ide/core.c
> >> @@ -841,6 +841,7 @@ static void ide_dma_cb(void *opaque, int ret)
> >>      bool stay_active = false;
> >>  
> >>      if (ret == -ECANCELED) {
> >> +        s->bus->dma->aiocb = NULL;
> >>          return;
> >>      }
> >>  
> >>
> > 
> > The part that makes me nervous here is that I can't remember why we do
> > NO cleanup whatsoever for the ECANCELED case.
> > 
> > commit 0d910cfeaf2076b116b4517166d5deb0fea76394
> > Author: Fam Zheng <address@hidden>
> > Date:   Thu Sep 11 13:41:07 2014 +0800
> > 
> >     ide/ahci: Check for -ECANCELED in aio callbacks
> > 
> > 
> > ... This looks like we never expected the aio callbacks to ever get
> > called with ECANCELED, so we treat this as a QEMU-internal signal.
> > 
> > It looks like we expect these callbacks to do NOTHING in this case; but
> > I'm not sure where the IDE state machine does its cleanup otherwise.
> > (The DMA might have been canceled, but the DMA and IDE state machines
> > still need to exit their loop.)
> > 
> > If you take a look at this patch from 2014 though, there are many other
> > spots where we have littered ECANCELED checks that might also cause
> > problems if we're receiving error codes we thought we couldn't get normally.
> > 
> > I am worried this patch papers over something worse.
> > 
> I'm not clear why Fam's patch adds a do-nothing return to the ide_dma_cb
> if it's invoked with ECANCELED: shouldn't it be the case that the IDE
> state machine needs to know that a transfer it was relying on to service
> an ATA command was canceled and treat it like an error?
> 
> Why was it ever correct to ignore these? Is it because we only ever
> canceled DMA during reset/shutdown/etc?
> 
> It appears as if iscsi requests can actually genuinely return an
> ECANCELED errno, so there are likely several places in the IDE code that
> need to accommodate this from happening.
> 
> The easiest fix LOOKS like just deleting the special-casing of ECANCELED
> altogether and letting the error pathways handle things as normal.
> 
> Am I mistaken?

I think your instincts are right that there are deeper issues.  The
first step would be test cases, then you can be sure various scenarios
have been handled correctly.

I noticed that ide_sector_read_cb(), ide_sector_write_cb(), and
ide_flush_cb() all differ in whether they reset s->pio_aiocb and
s->status before returning early due to -ECANCELED.  That must be a bug.

I didn't look at the ide_dma_cb() code path.

Stefan

Attachment: signature.asc
Description: PGP signature


reply via email to

[Prev in Thread] Current Thread [Next in Thread]