[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [PATCH 1/5] migration: Fix possible deadloop of ram save process
From: |
Dr. David Alan Gilbert |
Subject: |
Re: [PATCH 1/5] migration: Fix possible deadloop of ram save process |
Date: |
Thu, 22 Sep 2022 17:41:30 +0100 |
User-agent: |
Mutt/2.2.7 (2022-08-07) |
* Peter Xu (peterx@redhat.com) wrote:
> On Thu, Sep 22, 2022 at 03:49:38PM +0100, Dr. David Alan Gilbert wrote:
> > * Peter Xu (peterx@redhat.com) wrote:
> > > When starting ram saving procedure (especially at the completion phase),
> > > always set last_seen_block to non-NULL to make sure we can always
> > > correctly
> > > detect the case where "we've migrated all the dirty pages".
> > >
> > > Then we'll guarantee both last_seen_block and pss.block will be valid
> > > always before the loop starts.
> > >
> > > See the comment in the code for some details.
> > >
> > > Signed-off-by: Peter Xu <peterx@redhat.com>
> >
> > Yeh I guess it can currently only happen during restart?
>
> There're only two places to clear last_seen_block:
>
> ram_state_reset[2683] rs->last_seen_block = NULL;
> ram_postcopy_send_discard_bitmap[2876] rs->last_seen_block = NULL;
>
> Where for the reset case:
>
> ram_state_init[2994] ram_state_reset(*rsp);
> ram_state_resume_prepare[3110] ram_state_reset(rs);
> ram_save_iterate[3271] ram_state_reset(rs);
>
> So I think it can at least happen in two places, either (1) postcopy just
> started (assume when postcopy starts accidentally when all dirty pages were
> migrated?), or (2) postcopy recover from failure.
Oh, (1) is a more general problem then; yeh.
> In my case I triggered this deadloop when I was debugging the other bug
> fixed by the next patch where it was postcopy recovery (on tls), but only
> once.. So currently I'm still not 100% sure whether this is the same
> problem, but logically it could trigger.
>
> I also remember I used to hit very rare deadloops before too, maybe they're
> the same thing because I did test recovery a lot.
Note; 'deadlock' not 'deadloop'.
Dave
> >
> > Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
>
> Thanks!
>
> --
> Peter Xu
>
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
- [PATCH 4/5] migration: Disallow postcopy preempt to be used with compress, (continued)
- [PATCH 4/5] migration: Disallow postcopy preempt to be used with compress, Peter Xu, 2022/09/20
- [PATCH 5/5] migration: Use non-atomic ops for clear log bitmap, Peter Xu, 2022/09/20
- [PATCH 3/5] migration: Disallow xbzrle with postcopy, Peter Xu, 2022/09/20
- [PATCH 2/5] migration: Fix race on qemu_file_shutdown(), Peter Xu, 2022/09/20
- [PATCH 1/5] migration: Fix possible deadloop of ram save process, Peter Xu, 2022/09/20