From: 858585 jemmy
Subject: Re: [Qemu-devel] [PATCH v2 4/5] migration: implement bi-directional RDMA QIOChannel
Date: Sat, 28 Apr 2018 12:16:36 +0800

On Fri, Apr 27, 2018 at 5:16 PM, Daniel P. Berrangé <address@hidden> wrote:
> On Fri, Apr 27, 2018 at 03:56:38PM +0800, 858585 jemmy wrote:
>> On Fri, Apr 27, 2018 at 1:36 AM, Dr. David Alan Gilbert
>> <address@hidden> wrote:
>> > * Lidong Chen (address@hidden) wrote:
>> >> This patch implements bi-directional RDMA QIOChannel. Because different
>> >> threads may access the RDMAQIOChannel concurrently, this patch uses
>> >> RCU to protect it.
>> >>
>> >> Signed-off-by: Lidong Chen <address@hidden>
>> >
>> > I'm a bit confused by this.
>> >
>> > I can see it's adding RCU to protect the rdma structures against
>> > deletion from multiple threads; that I'm OK with in principle; is that
>> > the only locking we need? (I guess the two directions are actually
>> > separate RDMAContext's so maybe).
>>
>> qio_channel_rdma_close may be invoked by the migration thread and the
>> return path thread concurrently, so I use a mutex to protect it.
>
> Hmm, that is not good - concurrent threads calling close must not be
> allowed to happen even with non-RDMA I/O channels.
>
> For example, with the QIOChannelSocket, one thread can call close
> which sets the fd = -1, another thread can race with this and either
> end up calling close again on the same FD or calling close on -1.
> Either way the second thread will get an error from close() when
> it should have skipped the close() and returned success. Perhaps
> migration gets lucky and this doesn't result in it being marked
> as failed, but it is still not good.
>
> So only one thread should be calling close().
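
I see. To illustrate the race, here is a standalone sketch with made-up
names (not the actual QIOChannelSocket code):

#include <unistd.h>

typedef struct { int fd; } Chan;    /* stand-in for a channel object */

static int chan_close(Chan *c)
{
    int fd = c->fd;    /* two racing threads can both load the old fd */
    c->fd = -1;
    return close(fd);  /* the second thread closes an already-closed
                        * fd and gets EBADF, or worse, closes an fd
                        * that has been recycled for something else */
}

Both threads can read c->fd before either one stores -1, so the second
close() either fails or hits an unrelated descriptor.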

For live migration, the source qemu invokes qemu_fclose from several
threads: the main thread, the migration thread, and the return path
thread. The destination qemu invokes qemu_fclose from the main thread,
the listen thread, and the COLO incoming thread.

So I prefer to add a lock to the QEMUFile struct, like this:

int qemu_fclose(QEMUFile *f)
{
    int ret;
    qemu_fflush(f);
    ret = qemu_file_get_error(f);

    if (f->ops->close) {
+       qemu_mutex_lock(&f->lock);     /* serialize concurrent closers */
        int ret2 = f->ops->close(f->opaque);
+       qemu_mutex_unlock(&f->lock);
        if (ret >= 0) {
            ret = ret2;
        }
    }
    /* If any error was spotted before closing, we should report it
     * instead of the close() return value.
     */
    if (f->last_error) {
        ret = f->last_error;
    }
    g_free(f);
    trace_qemu_file_fclose();
    return ret;
}
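
The new lock would also need to be set up when the QEMUFile is created
and torn down before the g_free(). A sketch, assuming a QemuMutex field
named "lock" is added to the QEMUFile struct:

QEMUFile *qemu_fopen_ops(void *opaque, const QEMUFileOps *ops)
{
    QEMUFile *f;

    f = g_new0(QEMUFile, 1);

    f->opaque = opaque;
    f->ops = ops;
+   qemu_mutex_init(&f->lock);    /* protects f->ops->close(f->opaque) */
    return f;
}

with a matching qemu_mutex_destroy(&f->lock) in qemu_fclose() just
before the g_free(f).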

Any suggestions?

>
>> If one thread invokes qio_channel_rdma_writev and another thread
>> invokes qio_channel_rdma_readv, the two threads use separate
>> RDMAContexts, so no lock is needed.
>>
>> If two threads invoked qio_channel_rdma_writev concurrently, a lock
>> would be needed. But in the source qemu, only the migration thread
>> invokes qio_channel_rdma_writev, and only the return path thread
>> invokes qio_channel_rdma_readv.
>
> QIOChannel impls are only intended to cope with a single thread doing
> I/O in each direction. If you have two threads needing to read, or
> two threads needing to write, the layer above should provide locking
> to ensure correct ordering of I/O operations.

Yes, so I think RCU is enough; we do not need any more locking.
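
The read side then follows the usual QEMU RCU pattern; roughly this
sketch, assuming the channel's RDMAContext pointer is published with
atomic_rcu_set() and cleared by qio_channel_rdma_close():

static ssize_t qio_channel_rdma_writev(QIOChannel *ioc,
                                       const struct iovec *iov,
                                       size_t niov,
                                       int *fds, size_t nfds,
                                       Error **errp)
{
    QIOChannelRDMA *rioc = QIO_CHANNEL_RDMA(ioc);
    RDMAContext *rdma;

    rcu_read_lock();
    rdma = atomic_rcu_read(&rioc->rdma);   /* may be NULL after close */
    if (!rdma) {
        rcu_read_unlock();
        error_setg(errp, "RDMA control channel output is not set");
        return -1;
    }
    /* ... perform the RDMA write through rdma as before ... */
    rcu_read_unlock();
    return 0;
}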

>
>> The destination qemu only invokes qio_channel_rdma_readv from the
>> main thread before postcopy, or from the listen thread after
>> postcopy.
>>
>> The destination qemu already protects writes to the source qemu with
>> qemu_mutex_lock(&mis->rp_mutex).
>>
>> But should we use qemu_mutex_lock to protect qio_channel_rdma_writev
>> and qio_channel_rdma_readv anyway, to guard against some future
>> change invoking them concurrently?
>
>>
>> >
>> > But is there nothing else to make the QIOChannel bidirectional?
>> >
>> > Also, a lot seems dependent on listen_id, can you explain how that's
>> > being used.
>>
>> The destination qemu is the server side, so its listen_id is not
>> zero; the source qemu is the client side, so its listen_id is zero.
>> I use listen_id to determine whether qemu is the destination or the
>> source.
>>
>> For the destination qemu, when writing data to the source, it needs
>> to use the return_path rdma, like this:
>>     if (rdma->listen_id) {
>>         rdma = rdma->return_path;
>>     }
>>
>> For the source qemu, when reading data from the destination, it also
>> needs to use the return_path rdma:
>>     if (!rdma->listen_id) {
>>         rdma = rdma->return_path;
>>     }
>
> This feels unnecessarily complex to me. Why not just change the
> QIOChannelRDMA struct so that it has two RDMAContext pointers, e.g.
>
> struct QIOChannelRDMA {
>     QIOChannel parent;
>     RDMAContext *rdmain;
>     RDMAContext *rdmaout;
>     QEMUFile *file;
>     bool blocking; /* XXX we don't actually honour this yet */
> };
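
With that layout the direction choice would become trivial, e.g. a
sketch of the read side:

static ssize_t qio_channel_rdma_readv(QIOChannel *ioc,
                                      const struct iovec *iov,
                                      size_t niov,
                                      int **fds, size_t *nfds,
                                      Error **errp)
{
    QIOChannelRDMA *rioc = QIO_CHANNEL_RDMA(ioc);
    RDMAContext *rdma = rioc->rdmain;   /* reads always use rdmain;
                                         * writes would use rdmaout */
    /* ... the rest as before, with no listen_id check ... */
}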

But the reason I used a single RDMAContext with a return_path pointer
is that some functions, such as qemu_rdma_accept and
rdma_start_outgoing_migration, take an RDMAContext as their parameter.
That made the return path easier to implement.

So I should add some comments to the code.

>
>
>
> Regards,
> Daniel
> --
> |: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org         -o-            https://fstop138.berrange.com :|
> |: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|


