[Qemu-devel] Re: [PATCH 04/22] savevm: do_loadvm(): Always resume the VM
From: Juan Quintela
Date: Wed, 21 Apr 2010 17:39:00 +0200
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/23.1 (gnu/linux)

Luiz Capitulino <address@hidden> wrote:
> On Wed, 21 Apr 2010 10:36:29 +0200
> Juan Quintela <address@hidden> wrote:
>
>>     QTAILQ_FOREACH(dinfo, &drives, next) {
>>         bs1 = dinfo->bdrv;
>>         if (bdrv_has_snapshot(bs1)) {
>>
>>             /// We found a device that has snapshots
>>             ret = bdrv_snapshot_goto(bs1, name);
>>             if (ret < 0) {
>>                 /// And it doesn't have a snapshot with the name that we wanted
>>                 switch(ret) {
>>                 case -ENOTSUP:
>>                     error_report("%sSnapshots not supported on device '%s'",
>>                                  bs != bs1 ? "Warning: " : "",
>>                                  bdrv_get_device_name(bs1));
>>                     break;
>>                 case -ENOENT:
>>                     error_report("%sCould not find snapshot '%s' on device '%s'",
>>                                  bs != bs1 ? "Warning: " : "",
>>                                  name, bdrv_get_device_name(bs1));
>>                     break;
>>                 default:
>>                     error_report("%sError %d while activating snapshot on '%s'",
>>                                  bs != bs1 ? "Warning: " : "",
>>                                  ret, bdrv_get_device_name(bs1));
>>                     break;
>>                 }
>>                 /* fatal on snapshot block device */
>>                 // I think that one unconditional exit with prejudice could be in order here
>>
>>                 // Notice that bdrv_snapshot_goto() modifies the disk; the name is as bad as
>>                 // you can get. It just opens the disk, opens the snapshot, increases
>>                 // its counter of users, and makes it available for use after here
>>                 // (i.e. loading state, possibly conflicting with the previously running
>>                 // VM, a.k.a. disk corruption).
>>
>>                 if (bs == bs1)
>>                     return 0;
>>
>>                 // This error is as bad as it can get :( We have to load a vmstate,
>>                 // and the disk that should have the memory image doesn't have it.
>>                 // This is an error, I just put the wrong number the previous time.
>>                 // Notice that this error should be very rare.
>
> So, the current code is buggy and if you fix it (by returning -1)
> you'll get another bug: loadvm will stop the VM for trivial errors
> like a not found image.
It is not a trivial error!!!! And worse, it is not recoverable :(
> How do you plan to fix this?
By returning an error and stopping the machine.
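
A minimal sketch of that fix, using mock types rather than QEMU's real block layer. Everything named mock_* here is hypothetical and only mirrors the shape of the quoted loop: a failure on the device that should hold the VM state is returned to the caller instead of being reported and then turned into 0, while failures on other devices stay warnings.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for a block device; not QEMU's BlockDriverState. */
struct mock_drive {
    const char *name;
    int has_snapshot;   /* nonzero if the device carries snapshots */
    int goto_result;    /* value the mock snapshot switch returns */
};

/* Mock of the snapshot switch: returns 0 or a negative errno value. */
static int mock_snapshot_goto(const struct mock_drive *d, const char *snap)
{
    (void)snap;
    return d->goto_result;
}

/*
 * Switch every snapshot-capable drive to 'snap'.
 * 'primary' is the drive expected to hold the VM state.
 * Returns 0 on success, or the negative errno seen on the primary drive.
 */
static int mock_load_snapshot(const struct mock_drive *drives, size_t n,
                              const struct mock_drive *primary,
                              const char *snap)
{
    assert(drives != NULL && primary != NULL);

    for (size_t i = 0; i < n; i++) {
        const struct mock_drive *d = &drives[i];
        int ret;

        if (!d->has_snapshot) {
            continue;
        }
        ret = mock_snapshot_goto(d, snap);
        if (ret < 0) {
            fprintf(stderr, "%sError %d while activating snapshot on '%s'\n",
                    d != primary ? "Warning: " : "", ret, d->name);
            if (d == primary) {
                return ret;   /* fatal: propagate the failure, do not return 0 */
            }
        }
    }
    return 0;
}
```

The caller is then expected to keep the machine stopped whenever the return value is negative.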
>> As stated, I don't think that trying to run the machine at any point
>> would make any sense. Only case where it is safe to run it is if the
>> failure is at get_bs_snapshots(), but at that point running the machine
>> means:
>
> Actually, it must not pause the VM when recovery is (clearly) possible,
> otherwise it's a usability bug for the user Monitor and a possibly serious
> bug when you don't have human intervention (eg. QMP).
It is not possible: we have changed the device status from what it was
before. All bets are off. We don't have a way to go back to the "safe state".
>> <something happens>
>> $ loadvm other_image
>> Error "other_image" snapshot don't exist.
>> $
>>
>> running the previous VM looks like something that should be done
>> explicitly. If the error happened after that get_bs_snapshots(),
>> we would need a new flag to just refuse to continue. The only valid
>> operations at that point are other loadvm operations, i.e. our state is
>> wrong one way or another.
>
> It's not clear to me how this flag can help, but anyway, what we need
> here is:
>
> 1. Fail when failure is reported (vs. report a failure and return OK)
This is a bug, plain and simple.
> 2. Don't keep the VM paused when recovery is possible
>
> If you can fix that, it's ok to me: I'll drop this and the next patch.
>
> Otherwise I'll have to insist on the split.
Re-read my email. At this point, nothing is fixable :( After doing
the 1st:
>> ret = bdrv_snapshot_goto(bs1, name);
and not returning an error -> state has changed, period. You can't
restart the machine.
If you prefer, you can change loadvm so that after a failure you
can't "cont" it until you get a "working" loadvm.
Later, Juan.