[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Qemu-devel] [PATCH v4 01/32] migration: better error handling with
From: |
Peter Xu |
Subject: |
Re: [Qemu-devel] [PATCH v4 01/32] migration: better error handling with QEMUFile |
Date: |
Fri, 1 Dec 2017 16:39:25 +0800 |
User-agent: |
Mutt/1.9.1 (2017-09-22) |
On Thu, Nov 30, 2017 at 10:24:38AM +0000, Dr. David Alan Gilbert wrote:
> * Peter Xu (address@hidden) wrote:
> > If the postcopy down due to some reason, we can always see this on dst:
> >
> > qemu-system-x86_64: RP: Received invalid message 0x0000 length 0x0000
> >
> > However in most cases that's not the real issue. The problem is that
> > qemu_get_be16() has no way to show whether the returned data is valid or
> > not, and we are _always_ assuming it is valid. That's possibly not wise.
> >
> > The best approach to solve this would be: refactoring QEMUFile interface
> > to allow the APIs to return error if there is. However it needs quite a
> > bit of work and testing. For now, let's explicitly check the validity
> > first before using the data in all places for qemu_get_*().
> >
> > This patch tries to fix most of the cases I can see. Only if we are with
> > this, can we make sure we are processing the valid data, and also can we
> > make sure we can capture the channel down events correctly.
> >
> > Signed-off-by: Peter Xu <address@hidden>
> > ---
> > migration/migration.c | 5 +++++
> > migration/ram.c | 26 ++++++++++++++++++++++----
> > migration/savevm.c | 40 ++++++++++++++++++++++++++++++++++++++--
> > 3 files changed, 65 insertions(+), 6 deletions(-)
> >
> > diff --git a/migration/migration.c b/migration/migration.c
> > index c0206023d7..eae34d0524 100644
> > --- a/migration/migration.c
> > +++ b/migration/migration.c
> > @@ -1708,6 +1708,11 @@ static void *source_return_path_thread(void *opaque)
> > header_type = qemu_get_be16(rp);
> > header_len = qemu_get_be16(rp);
> >
> > + if (qemu_file_get_error(rp)) {
> > + mark_source_rp_bad(ms);
> > + goto out;
> > + }
> > +
> > if (header_type >= MIG_RP_MSG_MAX ||
> > header_type == MIG_RP_MSG_INVALID) {
> > error_report("RP: Received invalid message 0x%04x length
> > 0x%04x",
> > diff --git a/migration/ram.c b/migration/ram.c
> > index 8620aa400a..960c726ff2 100644
> > --- a/migration/ram.c
> > +++ b/migration/ram.c
> > @@ -2687,7 +2687,7 @@ static int ram_load_postcopy(QEMUFile *f)
> > void *last_host = NULL;
> > bool all_zero = false;
> >
> > - while (!ret && !(flags & RAM_SAVE_FLAG_EOS)) {
> > + while (!(flags & RAM_SAVE_FLAG_EOS)) {
>
> I still think you need to keep the !ret && - see below;
> anyway, there's no harm in keeping it!
Fair enough; I'll keep it no matter what. :-)
>
> > ram_addr_t addr;
> > void *host = NULL;
> > void *page_buffer = NULL;
> > @@ -2696,6 +2696,16 @@ static int ram_load_postcopy(QEMUFile *f)
> > uint8_t ch;
> >
> > addr = qemu_get_be64(f);
> > +
> > + /*
> > + * If qemu file error, we should stop here, and then "addr"
> > + * may be invalid
> > + */
> > + ret = qemu_file_get_error(f);
> > + if (ret) {
> > + break;
> > + }
> > +
> > flags = addr & ~TARGET_PAGE_MASK;
> > addr &= TARGET_PAGE_MASK;
> >
> > @@ -2776,6 +2786,13 @@ static int ram_load_postcopy(QEMUFile *f)
> > error_report("Unknown combination of migration flags: %#x"
> > " (postcopy mode)", flags);
> > ret = -EINVAL;
[1]
> > + break;
>
> This 'break' breaks from the switch, but doesn't break the loop and
> because you remove dthe !ret && from the top, the loop keeps going when
> it shouldn't.
Ah yes I missed this one, thanks.
What I should have written here is a "goto out", and also I should add
that "out" label at the end. I think after this single change current
patch should be fine.
However I understand that you would prefer me to check the ret every
time. IMHO it's a matter of taste. I would prefer current way to do
things since I see it awkward to keep checking against (!ret) possibly
multiple times even we already know it's non-zero (especially when the
failure happens at the beginning of the loop block). But for this
patch, I can follow yours (since you asked for twice already :).
>
> > + }
> > +
> > + /* Detect for any possible file errors */
> > + if (qemu_file_get_error(f)) {
> > + ret = qemu_file_get_error(f);
> > + break;
> > }
>
> This is all simpler if you just leave the !ret && at the top, and then
> make this:
> if (!ret) {
> ret = qemu_file_get_error(f);
> }
Sure.
(So to show what I meant: if we failed at [1] above we still need to
check this, which is unecessary imho)
>
> >
> > if (place_needed) {
>
> Make that
>
> if (!ret && place_needed) {
Will do.
(same here if we failed at [1], actually we don't need to check the
ret value so many times)
>
> > @@ -2789,9 +2806,10 @@ static int ram_load_postcopy(QEMUFile *f)
> > ret = postcopy_place_page(mis, place_dest,
> > place_source, block);
> > }
> > - }
> > - if (!ret) {
> > - ret = qemu_file_get_error(f);
> > +
> > + if (ret) {
> > + break;
> > + }
>
> And with the !ret check at the top this goes again.
Yes, will remove it. Thanks!
--
Peter Xu
[Prev in Thread] |
Current Thread |
[Next in Thread] |
- Re: [Qemu-devel] [PATCH v4 01/32] migration: better error handling with QEMUFile,
Peter Xu <=