[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Qemu-devel] [PATCH 2/2] atapi migration: Throw recoverable error to
From: |
Dr. David Alan Gilbert |
Subject: |
Re: [Qemu-devel] [PATCH 2/2] atapi migration: Throw recoverable error to avoid recovery |
Date: |
Wed, 10 Dec 2014 12:05:42 +0000 |
User-agent: |
Mutt/1.5.23 (2014-03-12) |
* John Snow (address@hidden) wrote:
>
>
> On 12/09/2014 01:15 PM, Dr. David Alan Gilbert (git) wrote:
> >From: "Dr. David Alan Gilbert" <address@hidden>
> >
> >(With the previous atapi_dma flag recovery)
> >If migration happens between the ATAPI command being written and the
> >bmdma being started, the DMA is dropped. Eventually the guest times
> >out and recovers, but that can take many seconds.
> >(This is rare, on a pingpong reading the CD continuously I hit
> >this about ~1/30-1/50 migrates)
> >
> >I don't think we've got enough state to be able to recover safely
> >at this point, so I throw a 'medium error, no seek complete'
> >that I'm assuming guests will try and recover from an apparently
> >dirty CD.
> >
> >OK, it's a hack, the real solution is probably to push a lot of
> >ATAPI state into the migration stream, but this is a fix that
> >works with no stream changes. Tested only on Linux (both RHEL5
> >(pre-libata) and RHEL7).
> >
> >Signed-off-by: Dr. David Alan Gilbert <address@hidden>
> >---
> > hw/ide/atapi.c | 17 +++++++++++++++++
> > hw/ide/internal.h | 2 ++
> > hw/ide/pci.c | 11 +++++++++++
> > 3 files changed, 30 insertions(+)
> >
> >diff --git a/hw/ide/atapi.c b/hw/ide/atapi.c
> >index c63b7e5..e17799c 100644
> >--- a/hw/ide/atapi.c
> >+++ b/hw/ide/atapi.c
> >@@ -394,6 +394,23 @@ static void ide_atapi_cmd_read(IDEState *s, int lba,
> >int nb_sectors,
> > }
> > }
> >
> >+
> >+/* Called by *_restart_bh when the transfer function points
> >+ * to ide_atapi_cmd
> >+ */
> >+void ide_atapi_dma_restart(IDEState *s)
> >+{
> >+ /*
> >+ * I'm not sure we have enough stored to restart the command
> >+ * safely, so give the guest an error it should recover from.
> >+ * I'm assuming most guests will try to recover from something
> >+ * listed as a medium error on a CD; it seems to work on Linux.
> >+ * This would be more of a problem if we did any other type of
> >+ * DMA operation.
> >+ */
> >+ ide_atapi_cmd_error(s, MEDIUM_ERROR, ASC_NO_SEEK_COMPLETE);
> >+}
> >+
>
> Is this safe for non-data commands? Can we even get there in such a case?
See below.
> > static inline uint8_t ide_atapi_set_profile(uint8_t *buf, uint8_t *index,
> > uint16_t profile)
> > {
> >diff --git a/hw/ide/internal.h b/hw/ide/internal.h
> >index 8a3eca4..8b65285 100644
> >--- a/hw/ide/internal.h
> >+++ b/hw/ide/internal.h
> >@@ -289,6 +289,7 @@ typedef struct IDEDMAOps IDEDMAOps;
> > #define ATAPI_INT_REASON_TAG 0xf8
> >
> > /* same constants as bochs */
> >+#define ASC_NO_SEEK_COMPLETE 0x02
> > #define ASC_ILLEGAL_OPCODE 0x20
> > #define ASC_LOGICAL_BLOCK_OOR 0x21
> > #define ASC_INV_FIELD_IN_CMD_PACKET 0x24
> >@@ -529,6 +530,7 @@ void ide_dma_error(IDEState *s);
> >
> > void ide_atapi_cmd_ok(IDEState *s);
> > void ide_atapi_cmd_error(IDEState *s, int sense_key, int asc);
> >+void ide_atapi_dma_restart(IDEState *s);
> > void ide_atapi_io_error(IDEState *s, int ret);
> >
> > void ide_ioport_write(void *opaque, uint32_t addr, uint32_t val);
> >diff --git a/hw/ide/pci.c b/hw/ide/pci.c
> >index bee5ad3..e3f2054 100644
> >--- a/hw/ide/pci.c
> >+++ b/hw/ide/pci.c
> >@@ -235,6 +235,17 @@ static void bmdma_restart_bh(void *opaque)
> > }
> > } else if (error_status & IDE_RETRY_FLUSH) {
> > ide_flush_cache(bmdma_active_if(bm));
> >+ } else {
> >+ IDEState *s = bmdma_active_if(bm);
> >+
> >+ /*
> >+ * We've not got any bits to tell us about ATAPI - but
> >+ * we do have the end_transfer_func that tells us what
> >+ * we're trying to do.
> >+ */
> >+ if (s->end_transfer_func == ide_atapi_cmd) {
> >+ ide_atapi_dma_restart(s);
> >+ }
>
> OK, so when the restart routines get invoked we add a hook to see if we were
> in the middle of an ATAPI command and acknowledge that we don't know how to
> properly handle this.
As to your qeustion above about non-data commands; hmm probably - but how
do I guard it? I guess I could check for the atapi_dma flag the previous
patch fixed.
(This is all probably still broken for non-DMA atapi transfers)
> Isn't this going to run on every vmstate change, though?
There aren't many - only starting/stopping the CPU does it; and bmdma_restart_cb
guards it by 'if (!running)' exit, so it'll only do it when the CPU starts
running again.
> I think we don't
> clear out end_transfer_func on success, so this might fire off more than we
> want it to, although I guess end_transfer_func is usually going to get set
> to ide_atapi_cmd_reply_end if it finishes normally ...
Right, or if ide_transfer_stop is called.
> > }
> > }
> >
> >
>
> Indeed a hack, but it's probably appropriate: if our code cannot in fact
> handle ATAPI migration, throwing an error or disabling migration is the
> correct thing to do, but I don't think users would be very happy with the
> second option. I feel that this is an OK workaround because it should not
> introduce spurious errors or retries for cases where we manage to avoid
> migrating in the middle of the loop. This will at least let the currently
> broken case limp along until we fix it more properly.
>
> What makes me the most curious is how this plays out in Windows if this case
> is triggered. Throw a trace around the fake error and see if you can't
> observe it getting called during a pingpong test while Windows reads a CD.
Yeh, I'm going to figure out how to try that.
Dave
--
Dr. David Alan Gilbert / address@hidden / Manchester, UK