Re: [Qemu-devel] [PATCH 13/14] raw-posix: Implement image locking

qemu-devel
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Qemu-devel] [PATCH 13/14] raw-posix: Implement image locking

From:	Max Reitz
Subject:	Re: [Qemu-devel] [PATCH 13/14] raw-posix: Implement image locking
Date:	Fri, 2 Dec 2016 17:13:33 +0100
User-agent:	Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.5.0
On 31.10.2016 16:38, Fam Zheng wrote:
> This implements open flag sensible image locking for local file
> and host device protocol.
> 
> virtlockd in libvirt locks the first byte, so we start looking at the
> file bytes from 1.
> 
> Quoting what was proposed by Kevin Wolf <address@hidden>, there are
> four locking modes by combining two bits (BDRV_O_RDWR and
> BDRV_O_SHARE_RW), and implemented by taking two locks:
> 
> Lock bytes:
> 
> * byte 1: I can't allow other processes to write to the image
> * byte 2: I am writing to the image
> 
> Lock modes:
> 
> * shared writer (BDRV_O_RDWR | BDRV_O_SHARE_RW): Take shared lock on
>   byte 2. Test whether byte 1 is locked using an exclusive lock, and
>   fail if so.
> 
> * exclusive writer (BDRV_O_RDWR only): Take shared lock on byte 2. Test
>   whether byte 1 is locked using an exclusive lock, and fail if so. Then
>   take shared lock on byte 1. I suppose this is racy, but we can
>   probably tolerate that.
> 
> * reader that can tolerate writers (BDRV_O_SHARE_RW only): Don't do anything
> 
> * reader that can't tolerate writers (neither bit is set): Take shared
>   lock on byte 1. Test whether byte 2 is locked, and fail if so.
> 
> The complication is in the transactional reopen.  To make the reopen
> logic managable, and allow better reuse, the code is internally
> organized with a table from old mode to the new one.
> 
> Signed-off-by: Fam Zheng <address@hidden>
> ---
>  block/raw-posix.c | 710 
> ++++++++++++++++++++++++++++++++++++++++++++++++++----
>  1 file changed, 660 insertions(+), 50 deletions(-)
> 
> diff --git a/block/raw-posix.c b/block/raw-posix.c
> index 7c62fc3..07ab117 100644
> --- a/block/raw-posix.c
> +++ b/block/raw-posix.c

Resuming review!

> @@ -538,6 +763,398 @@ static int raw_open(BlockDriverState *bs, QDict 
> *options, int flags,
>      return raw_open_common(bs, options, flags, 0, errp);
>  }
>  
> +typedef enum {
> +    RAW_LT_PREPARE,
> +    RAW_LT_COMMIT,
> +    RAW_LT_ABORT
> +} RawLockTransOp;
> +
> +typedef int (*RawReopenFunc)(RawLockTransOp op,
> +                             int old_lock_fd, int new_lock_fd,
> +                             BDRVRawLockMode old_lock,
> +                             BDRVRawLockMode new_lock,
> +                             Error **errp);
> +
> +static int raw_lt_nop(RawLockTransOp op,
> +                            int old_lock_fd, int new_lock_fd,
> +                            BDRVRawLockMode old_lock,
> +                            BDRVRawLockMode new_lock,
> +                            Error **errp)

Indentation is off.

> +{
> +    assert(old_lock == new_lock || new_lock == RAW_L_READ_SHARE_RW);
> +    return 0;
> +}
> +

Note and question before everything (after I had read basically
everything...): A small test shows me that locks are apparently shared
between dup'ed file descriptors, i.e.:

fd = open("foo");

lock(fd);
dup(fd);
unlock(fd);
close(fd);

new_fd = open("foo");
lock(new_fd); // fails

It works the other way around, too:

fd = open("foo");

lock(fd);
duped_fd = dup(fd);
unlock(duped_fd);
close(duped_fd);

new_fd = open("foo");
lock(new_fd); // fails


(The first lock() call has to use a shared lock and the second an
exclusive lock so that it works within a single process.)

This is pretty horrible because we don't know how old_lock_fd and
new_lock_fd are related. old_lock_fd was created somewhere through
raw_dup_flags() from s->fd; we have no idea whether that was actually
dup'ed or reopened. Then, rs->fd is created from s->fd, again through
raw_dup_flags(). Finally, new_lock_fd is created from rs->fd, once more
through said raw_dup_flags().

In the end, we have no idea whether new_lock_fd() is the end of a long
dup() chain from old_lock_fd or whether it was reopened somewhere in
between (well, we know that it has to be reopened when changing from R/W
to RO or the other way around, but we have no idea when keeping R/W or
RO (because dup() or fcntl() may have simply failed)).

In my opinion, this makes reviewing the code pretty difficult because
it's hard to keep in mind what cases may happen everywhere.

Would it be better to always fully re-open() the lock_fd instead of
duplicating it? Then we would at least know that old_lock_fd and
new_lock_fd never share the same lock configuration.

If you don't want to do that, there should be at least comments in the
RawReopenFuncs explaining whether new_lock_fd may share locks with
old_lock_fd or whether that's impossible.

> +static int raw_lt_from_unlock(RawLockTransOp op,
> +                              int old_lock_fd, int new_lock_fd,
> +                              BDRVRawLockMode old_lock,
> +                              BDRVRawLockMode new_lock,
> +                              Error **errp)
> +{
> +    assert(old_lock != new_lock);
> +    assert(old_lock == RAW_L_READ_SHARE_RW);
> +    switch (op) {
> +    case RAW_LT_PREPARE:
> +        return raw_lock_fd(new_lock_fd, new_lock, errp);
> +        break;

That break is a tad superfluous.

> +    case RAW_LT_COMMIT:

Maybe a "fall through" comment or something here? I think Coverity might
complain without.

> +    case RAW_LT_ABORT:
> +        break;
> +    }
> +
> +    return 0;
> +}
> +
> +static int raw_lt_read_to_write_share_rw(RawLockTransOp op,
> +                                         int old_lock_fd, int new_lock_fd,
> +                                         BDRVRawLockMode old_lock,
> +                                         BDRVRawLockMode new_lock,
> +                                         Error **errp)
> +{
> +    int ret = 0;
> +
> +    assert(old_lock == RAW_L_READ);
> +    assert(new_lock == RAW_L_WRITE_SHARE_RW);
> +    switch (op) {
> +    case RAW_LT_PREPARE:
> +        ret = qemu_lock_fd(new_lock_fd, RAW_LOCK_BYTE_WRITE, 1, false);
> +        if (ret) {
> +            error_setg_errno(errp, errno, "Failed to lock new fd (write 
> byte)");
> +            break;
> +        }
> +        ret = qemu_lock_fd(new_lock_fd, RAW_LOCK_BYTE_NO_OTHER_WRITER, 1, 
> false);
> +        if (ret) {
> +            error_setg_errno(errp, errno, "Failed to lock new fd (share 
> byte)");
> +            break;
> +        }
> +        ret = qemu_unlock_fd(old_lock_fd, RAW_LOCK_BYTE_NO_OTHER_WRITER, 1);
> +        if (ret) {
> +            error_setg_errno(errp, errno, "Failed to unlock old fd (share 
> byte)");
> +            break;
> +        }
> +        ret = qemu_lock_fd(new_lock_fd, RAW_LOCK_BYTE_NO_OTHER_WRITER, 1, 
> true);
> +        if (ret) {
> +            error_setg_errno(errp, errno, "Failed to upgrade new fd (share 
> byte)");
> +            break;
> +        }
> +        ret = qemu_lock_fd(new_lock_fd, RAW_LOCK_BYTE_NO_OTHER_WRITER, 1, 
> false);
> +        if (ret) {
> +            /* This is very unlikely, but catch it anyway. */
> +            error_setg_errno(errp, errno, "Failed to downgrade new fd (share 
> byte)");

I'd put a "break;" even into the last error block.

However, maybe a "break;" is not correct: When a reopening operation
fails, its abort function does not get called (as far as I can see). So
if we fail somewhere here, the old lock won't be restored. I think you
should either manually invoke the abort branch or in
raw_reopen_handle_lock() just always invoke the function again with
RAW_LT_ABORT if RAW_LT_PREPARE failed.

> +        }
> +        break;
> +    case RAW_LT_COMMIT:
> +        break;
> +    case RAW_LT_ABORT:
> +        ret = qemu_lock_fd(old_lock_fd, RAW_LOCK_BYTE_NO_OTHER_WRITER, 1, 
> false);
> +        if (ret) {
> +            error_report("Failed to restore lock on old fd (share byte)");
> +        }
> +        break;
> +    }
> +    return ret ? -errno : 0;

What's wrong with "return ret;"?

> +}
> +
> +static int raw_lt_read_to_write(RawLockTransOp op,
> +                                int old_lock_fd, int new_lock_fd,
> +                                BDRVRawLockMode old_lock,
> +                                BDRVRawLockMode new_lock,
> +                                Error **errp)
> +{
> +    int ret = 0;
> +
> +    assert(old_lock == RAW_L_READ);
> +    assert(new_lock == RAW_L_WRITE);
> +    switch (op) {
> +    case RAW_LT_PREPARE:
> +        ret = qemu_lock_fd(new_lock_fd, RAW_LOCK_BYTE_WRITE, 1, false);
> +        if (ret) {
> +            error_setg_errno(errp, errno, "Failed to lock new fd (write 
> byte)");
> +            break;
> +        }
> +        ret = qemu_lock_fd(new_lock_fd, RAW_LOCK_BYTE_NO_OTHER_WRITER, 1, 
> false);
> +        if (ret) {
> +            error_setg_errno(errp, errno, "Failed to lock new fd (share 
> byte)");
> +            break;
> +        }
> +        ret = qemu_unlock_fd(old_lock_fd, RAW_LOCK_BYTE_NO_OTHER_WRITER, 1);
> +        if (ret) {
> +            error_setg_errno(errp, errno, "Failed to unlock old fd (share 
> byte)");
> +            break;
> +        }
> +        ret = qemu_lock_fd(new_lock_fd, RAW_LOCK_BYTE_NO_OTHER_WRITER, 1, 
> true);
> +        if (ret) {
> +            error_setg_errno(errp, errno, "Failed to upgrade new fd (share 
> byte)");
> +            break;
> +        }

I'd suggest upgrading RAW_LOCK_BYTE_WRITE, too, to avoid race conditions.

(By the way, I don't think we can use F_OFD_GETLK here, we have to
upgrade (and then optionally downgrade) the locks because both locks
have to be held at the same time. Then again, this isn't so bad because
new_lock_fd has write access anyway.)

> +        ret = qemu_lock_fd(new_lock_fd, RAW_LOCK_BYTE_NO_OTHER_WRITER, 1, 
> false);
> +        if (ret) {
> +            error_setg_errno(errp, errno, "Failed to restore old fd (share 
> byte) b");

block/raw-posix.c:892:81: warning: trailing " b" at the end of string
[-Wpedantic]

> +            break;
> +        }
> +        ret = qemu_lock_fd(new_lock_fd, RAW_LOCK_BYTE_NO_OTHER_WRITER, 1, 
> false);
> +        if (ret) {
> +            error_setg_errno(errp, errno, "Failed to lock new fd (share 
> byte)");
> +            break;
> +        }

This is exactly the same as the qemu_lock_fd() invocation above it
(without the trailing " b" though).

> +        break;
> +    case RAW_LT_COMMIT:
> +        break;
> +    case RAW_LT_ABORT:
> +        ret = qemu_lock_fd(old_lock_fd, RAW_LOCK_BYTE_NO_OTHER_WRITER, 1, 
> false);
> +        if (ret) {
> +            error_report("Failed to restore lock on old fd (share byte)");
> +        }
> +        break;
> +    }
> +    return ret ? -errno : 0;
> +}
> +
> +static int raw_lt_write_share_rw_to_read(RawLockTransOp op,
> +                                         int old_lock_fd, int new_lock_fd,
> +                                         BDRVRawLockMode old_lock,
> +                                         BDRVRawLockMode new_lock,
> +                                         Error **errp)
> +{
> +    int ret = 0;
> +
> +    assert(old_lock == RAW_L_WRITE_SHARE_RW);
> +    assert(new_lock == RAW_L_READ);
> +    switch (op) {
> +    case RAW_LT_PREPARE:
> +        /* Make sure there are no other writers. */
> +        ret = qemu_lock_fd(old_lock_fd, RAW_LOCK_BYTE_WRITE, 1, true);
> +        if (ret) {
> +            error_setg_errno(errp, errno, "Failed to lock old fd (write 
> byte)");
> +            break;
> +        }
> +        ret = qemu_lock_fd(new_lock_fd, RAW_LOCK_BYTE_NO_OTHER_WRITER, 1, 
> false);
> +        if (ret) {
> +            error_setg_errno(errp, errno, "Failed to lock new fd (share 
> byte)");
> +            break;
> +        }
> +        break;
> +    case RAW_LT_COMMIT:
> +        ret = qemu_unlock_fd(old_lock_fd, RAW_LOCK_BYTE_WRITE, 1);
> +        if (ret) {
> +            error_report("Failed to unlock old fd (write byte)");
> +            break;
> +        }
> +        break;
> +    case RAW_LT_ABORT:
> +        break;

How about restoring the shared lock on RAW_LOCK_BYTE_WRITE in old_lock_fd?

> +    }
> +    return ret ? -errno : 0;
> +}
> +
> +static int raw_lt_write_share_rw_to_write(RawLockTransOp op,
> +                                          int old_lock_fd, int new_lock_fd,
> +                                          BDRVRawLockMode old_lock,
> +                                          BDRVRawLockMode new_lock,
> +                                          Error **errp)
> +{
> +    int ret = 0;
> +
> +    assert(old_lock == RAW_L_WRITE_SHARE_RW);
> +    assert(new_lock == RAW_L_WRITE);
> +    switch (op) {
> +    case RAW_LT_PREPARE:
> +        /* Make sure there are no other writers. */
> +        ret = qemu_lock_fd(old_lock_fd, RAW_LOCK_BYTE_WRITE, 1, true);
> +        if (ret) {
> +            error_setg_errno(errp, errno, "Failed to lock old fd (write 
> byte)");
> +            break;
> +        }
> +        ret = qemu_lock_fd(new_lock_fd, RAW_LOCK_BYTE_NO_OTHER_WRITER, 1, 
> false);
> +        if (ret) {
> +            error_setg_errno(errp, errno, "Failed to lock new fd (share 
> byte)");
> +            break;
> +        }
> +        ret = qemu_lock_fd(old_lock_fd, RAW_LOCK_BYTE_WRITE, 1, false);
> +        if (ret) {
> +            error_setg_errno(errp, errno, "Failed to downgrade old fd (write 
> byte)");
> +            break;
> +        }
> +        ret = qemu_lock_fd(new_lock_fd, RAW_LOCK_BYTE_WRITE, 1, false);
> +        if (ret) {
> +            error_setg_errno(errp, errno, "Failed to lock new fd (write 
> byte)");
> +            break;
> +        }
> +        break;
> +    case RAW_LT_COMMIT:
> +        break;
> +    case RAW_LT_ABORT:
> +        break;

As I said above, as far as I can see, the abort branch is not invoked if
the prepare branch failed. However, it should be, and then it's not
correct for this to be empty: old_lock_fd may still have an exclusive
lock on RAW_LOCK_BYTE_WRITE, this should be downgraded to a shared lock.

> +    }
> +    return ret ? -errno : 0;
> +}
> +
> +static int raw_lt_write_to_read(RawLockTransOp op,
> +                                int old_lock_fd, int new_lock_fd,
> +                                BDRVRawLockMode old_lock,
> +                                BDRVRawLockMode new_lock,
> +                                Error **errp)
> +{
> +    int ret = 0;
> +
> +    assert(old_lock == RAW_L_WRITE);
> +    assert(new_lock == RAW_L_READ);
> +    switch (op) {
> +    case RAW_LT_PREPARE:
> +        ret = qemu_lock_fd(new_lock_fd, RAW_LOCK_BYTE_NO_OTHER_WRITER, 1, 
> false);
> +        if (ret) {
> +            error_setg_errno(errp, errno, "Failed to lock new fd (share 
> byte)");
> +            break;
> +        }
> +        break;
> +    case RAW_LT_COMMIT:
> +        ret = qemu_unlock_fd(old_lock_fd, RAW_LOCK_BYTE_WRITE, 1);
> +        if (ret) {
> +            error_report("Failed to unlock old fd (write byte)");
> +            break;
> +        }

old_lock_fd is closed automatically which should take care of this.
Better safe than sorry, though, of course.

(But there are a couple of places above where you are relying on
old_lock_fd being closed automatically, and also on new_lock_fd being
closed automatically on abort.)

> +        break;
> +    case RAW_LT_ABORT:
> +        break;
> +    }
> +    return ret ? -errno : 0;
> +}
> +
> +static int raw_lt_write_to_write_share_rw(RawLockTransOp op,
> +                                          int old_lock_fd, int new_lock_fd,
> +                                          BDRVRawLockMode old_lock,
> +                                          BDRVRawLockMode new_lock,
> +                                          Error **errp)
> +{
> +    int ret = 0;
> +
> +    assert(old_lock == RAW_L_WRITE);
> +    assert(new_lock == RAW_L_WRITE_SHARE_RW);
> +    switch (op) {
> +    case RAW_LT_PREPARE:

You're not locking the write byte of new_lock here. Maybe it gets
inherited from old_lock_fd automatically, but we don't know whether
new_lock_fd is a dup() of old_lock_fd or whether it is not.

> +        break;
> +    case RAW_LT_COMMIT:
> +        ret = qemu_unlock_fd(old_lock_fd, RAW_LOCK_BYTE_NO_OTHER_WRITER, 1);
> +        if (ret) {
> +            error_report("Failed to unlock old fd (share byte)");
> +            break;
> +        }
> +        break;
> +    case RAW_LT_ABORT:
> +        break;
> +    }
> +    return ret ? -errno : 0;
> +}
> +
> +/**
> + * Transactionally moving between possible locking states is tricky and must 
> be
> + * done carefully. That is mostly because downgrading an exclusive lock to
> + * shared or unlocked is not guaranteed to be revertable. As a result, in 
> such
> + * cases we have to defer the downgraing to "commit", given that no revert 
> will
> + * happen after that point, and that downgrading a lock should never fail.
> + *
> + * On the other hand, upgrading a lock (e.g. from unlocked or shared to
> + * exclusive lock) must happen in "prepare" because it may fail.
> + *
> + * Manage the operation matrix with this state transition table to make
> + * fulfulling above conditions easier.
> + */
> +static const struct RawReopenFuncRecord {
> +    BDRVRawLockMode old_lock;
> +    BDRVRawLockMode new_lock;
> +    RawReopenFunc func;
> +    bool need_lock_fd;
> +    bool close_old_lock_fd;
> +} reopen_functions[] = {
> +
> +    {RAW_L_READ_SHARE_RW, RAW_L_READ_SHARE_RW, raw_lt_nop, false, false},
> +    {RAW_L_READ_SHARE_RW, RAW_L_READ, raw_lt_from_unlock, true},
> +    {RAW_L_READ_SHARE_RW, RAW_L_WRITE_SHARE_RW, raw_lt_from_unlock, true},
> +    {RAW_L_READ_SHARE_RW, RAW_L_WRITE, raw_lt_from_unlock, true},
> +
> +    {RAW_L_READ, RAW_L_READ_SHARE_RW, raw_lt_nop, false, true},
> +    {RAW_L_READ, RAW_L_READ, raw_lt_nop, false, false},
> +    {RAW_L_READ, RAW_L_WRITE_SHARE_RW, raw_lt_read_to_write_share_rw, true},
> +    {RAW_L_READ, RAW_L_WRITE, raw_lt_read_to_write, true},
> +
> +    {RAW_L_WRITE_SHARE_RW, RAW_L_READ_SHARE_RW, raw_lt_nop, false, true},
> +    {RAW_L_WRITE_SHARE_RW, RAW_L_READ, raw_lt_write_share_rw_to_read, true},
> +    {RAW_L_WRITE_SHARE_RW, RAW_L_WRITE_SHARE_RW, raw_lt_nop, false, false},
> +    {RAW_L_WRITE_SHARE_RW, RAW_L_WRITE, raw_lt_write_share_rw_to_write, 
> true},
> +
> +    {RAW_L_WRITE, RAW_L_READ_SHARE_RW, raw_lt_nop, false, true},
> +    {RAW_L_WRITE, RAW_L_READ, raw_lt_write_to_read, true},
> +    {RAW_L_WRITE, RAW_L_WRITE_SHARE_RW, raw_lt_write_to_write_share_rw, 
> true},
> +    {RAW_L_WRITE, RAW_L_WRITE, raw_lt_nop, false, false},
> +};
> +
> +static int raw_reopen_handle_lock(BDRVReopenState *state,
> +                                  RawLockTransOp op,
> +                                  Error **errp)
> +{
> +    BDRVRawReopenState *rs = state->opaque;
> +    BDRVRawState *s = state->bs->opaque;
> +    BDRVRawLockMode old_lock, new_lock;
> +    const struct RawReopenFuncRecord *rec;
> +    int ret;
> +
> +    old_lock = s->cur_lock_mode;
> +    if (qdict_get_try_bool(state->options, "disable-lock", false)) {

Putting a disable_lock into BDRVRawReopenState, setting that in prepare
and then copying it over to s->disable_lock on commit seems better to me.

> +        new_lock = RAW_L_READ_SHARE_RW;
> +    } else {
> +        new_lock = raw_get_lock_mode(state->flags);
> +    }
> +    qdict_del(state->options, "disable-lock");

So you're reading it on prepare, then deleting it, and on commit you're
just getting new_lock from raw_get_lock_mode() because you already
deleted it from state->options? That seems wrong to me.

> +
> +    for (rec = &reopen_functions[0];
> +         rec < &reopen_functions[ARRAY_SIZE(reopen_functions)];
> +         rec++) {
> +        if (rec->old_lock == old_lock && rec->new_lock == new_lock) {
> +            break;
> +        }
> +    }
> +    assert(rec != &reopen_functions[ARRAY_SIZE(reopen_functions)]);
> +
> +    switch (op) {
> +    case RAW_LT_PREPARE:
> +        if (rec->need_lock_fd) {
> +            int lock_flags = rs->open_flags;
> +            if (!(state->flags & BDRV_O_SHARE_RW)) {
> +                lock_flags |= O_RDWR;
> +            }

Again, I think we can and should do without this.

> +            ret = raw_dup_flags(rs->fd, s->open_flags, lock_flags,
> +                                state->bs->filename, errp);
> +            if (ret < 0) {
> +                return ret;
> +            }
> +            rs->lock_fd = ret;
> +        } else {
> +            rs->lock_fd = -1;
> +        }
> +        return rec->func(op, s->lock_fd, rs->lock_fd, old_lock, new_lock, 
> errp);

As I said above somewhere, it might make sense to fall through to
RAW_LT_ABORT if this returns an error.

> +    case RAW_LT_COMMIT:
> +        rec->func(op, s->lock_fd, rs->lock_fd, old_lock, new_lock, errp);

Since you're ignoring the return value, you can also pass &error_abort
(I know it doesn't matter because this function is invoked with errp ==
&error_abort, but it looks cleaner).

> +        if ((rec->need_lock_fd || rec->close_old_lock_fd) && s->lock_fd >= 
> 0) {
> +            qemu_close(s->lock_fd);

I'd propose setting it to -1 afterwards (relevant when
!rec->need_lock_fd && rec->close_old_lock_fd).

> +        }
> +        if (rec->need_lock_fd) {
> +            s->lock_fd = rs->lock_fd;
> +        }
> +        s->cur_lock_mode = new_lock;
> +        break;
> +    case RAW_LT_ABORT:
> +        rec->func(op, s->lock_fd, rs->lock_fd, old_lock, new_lock, errp);

Again, could pass &error_abort instead of errp.

> +        if (rec->need_lock_fd) {
> +            if (rs->lock_fd >= 0) {

Actually, rs->lock_fd >= 0 should be equivalent to rec->need_lock_fd.

> +                qemu_close(rs->lock_fd);
> +                rs->lock_fd = -1;
> +            }
> +        }
> +        break;
> +    }
> +    return 0;
> +}
> +
>  static int raw_reopen_prepare(BDRVReopenState *state,
>                                BlockReopenQueue *queue, Error **errp)
>  {
> @@ -560,61 +1177,24 @@ static int raw_reopen_prepare(BDRVReopenState *state,
>  
>      raw_parse_flags(state->flags, &rs->open_flags);
>  
> -    rs->fd = -1;
> -
> -    int fcntl_flags = O_APPEND | O_NONBLOCK;
> -#ifdef O_NOATIME
> -    fcntl_flags |= O_NOATIME;
> -#endif
> -
> -#ifdef O_ASYNC
> -    /* Not all operating systems have O_ASYNC, and those that don't
> -     * will not let us track the state into rs->open_flags (typically
> -     * you achieve the same effect with an ioctl, for example I_SETSIG
> -     * on Solaris). But we do not use O_ASYNC, so that's fine.
> -     */
> -    assert((s->open_flags & O_ASYNC) == 0);
> -#endif
> -
> -    if ((rs->open_flags & ~fcntl_flags) == (s->open_flags & ~fcntl_flags)) {
> -        /* dup the original fd */
> -        rs->fd = qemu_dup(s->fd);
> -        if (rs->fd >= 0) {
> -            ret = fcntl_setfl(rs->fd, rs->open_flags);
> -            if (ret) {
> -                qemu_close(rs->fd);
> -                rs->fd = -1;
> -            }
> -        }
> +    ret = raw_dup_flags(s->fd, s->open_flags, rs->open_flags,
> +                        state->bs->filename, errp);
> +    if (ret < 0) {
> +        return ret;
>      }
>  
> -    /* If we cannot use fcntl, or fcntl failed, fall back to qemu_open() */
> -    if (rs->fd == -1) {
> -        const char *normalized_filename = state->bs->filename;
> -        ret = raw_normalize_devicepath(&normalized_filename);
> -        if (ret < 0) {
> -            error_setg_errno(errp, -ret, "Could not normalize device path");
> -        } else {
> -            assert(!(rs->open_flags & O_CREAT));
> -            rs->fd = qemu_open(normalized_filename, rs->open_flags);
> -            if (rs->fd == -1) {
> -                error_setg_errno(errp, errno, "Could not reopen file");
> -                ret = -1;
> -            }
> -        }
> -    }
> +    rs->fd = ret;
>  
>      /* Fail already reopen_prepare() if we can't get a working O_DIRECT
>       * alignment with the new fd. */
> -    if (rs->fd != -1) {
> -        raw_probe_alignment(state->bs, rs->fd, &local_err);
> -        if (local_err) {
> -            qemu_close(rs->fd);
> -            rs->fd = -1;
> -            error_propagate(errp, local_err);
> -            ret = -EINVAL;
> -        }
> +    raw_probe_alignment(state->bs, rs->fd, &local_err);
> +    if (local_err) {
> +        qemu_close(rs->fd);
> +        rs->fd = -1;
> +        error_propagate(errp, local_err);
> +        return -EINVAL;
>      }
> +    ret = raw_reopen_handle_lock(state, RAW_LT_PREPARE, errp);

Shouldn't this have the same clean-up routine on error as above, i.e.
close the new FD?

>  
>      return ret;
>  }
> @@ -626,6 +1206,8 @@ static void raw_reopen_commit(BDRVReopenState *state)
>  
>      s->open_flags = rs->open_flags;
>  
> +    raw_reopen_handle_lock(state, RAW_LT_COMMIT, &error_abort);
> +
>      qemu_close(s->fd);
>      s->fd = rs->fd;
>  
> @@ -643,6 +1225,8 @@ static void raw_reopen_abort(BDRVReopenState *state)
>          return;
>      }
>  
> +    raw_reopen_handle_lock(state, RAW_LT_ABORT, &error_abort);
> +
>      if (rs->fd >= 0) {
>          qemu_close(rs->fd);
>          rs->fd = -1;
> @@ -1332,6 +1916,10 @@ static void raw_close(BlockDriverState *bs)
>          qemu_close(s->fd);
>          s->fd = -1;
>      }
> +    if (s->lock_fd >= 0) {
> +        qemu_close(s->lock_fd);
> +        s->lock_fd = -1;
> +    }
>  }
>  
>  static int raw_truncate(BlockDriverState *bs, int64_t offset)
> @@ -1832,6 +2420,27 @@ static int raw_get_info(BlockDriverState *bs, 
> BlockDriverInfo *bdi)

*Resumed review complete* :-)

Max
signature.asc
Description: OpenPGP digital signature
[Prev in Thread]
Current Thread
[Next in Thread]
Re: [Qemu-devel] [PATCH 13/14] raw-posix: Implement image locking, Max Reitz, 2016/12/01
- Re: [Qemu-devel] [PATCH 13/14] raw-posix: Implement image locking, Max Reitz <=
Prev by Date: Re: [Qemu-devel] QEMU Advent Calendar 2016
Next by Date: Re: [Qemu-devel] [PATCH 14/14] tests: Add test-image-lock
Previous by thread: Re: [Qemu-devel] [PATCH 13/14] raw-posix: Implement image locking
Next by thread: Re: [Qemu-devel] [PATCH 00/14] block: Image locking series
Index(es):
- Date
- Thread