Re: [Qemu-devel] [PATCH v2 4/5] migration: implement bi-directional RDMA
From: 858585 jemmy
Subject: Re: [Qemu-devel] [PATCH v2 4/5] migration: implement bi-directional RDMA QIOChannel
Date: Sat, 28 Apr 2018 12:16:36 +0800
On Fri, Apr 27, 2018 at 5:16 PM, Daniel P. Berrangé <address@hidden> wrote:
> On Fri, Apr 27, 2018 at 03:56:38PM +0800, 858585 jemmy wrote:
>> On Fri, Apr 27, 2018 at 1:36 AM, Dr. David Alan Gilbert
>> <address@hidden> wrote:
>> > * Lidong Chen (address@hidden) wrote:
>> >> This patch implements bi-directional RDMA QIOChannel. Because different
>> >> threads may access RDMAQIOChannel concurrently, this patch use RCU to
>> >> protect it.
>> >>
>> >> Signed-off-by: Lidong Chen <address@hidden>
>> >
>> > I'm a bit confused by this.
>> >
>> > I can see it's adding RCU to protect the rdma structures against
>> > deletion from multiple threads; that I'm OK with in principal; is that
>> > the only locking we need? (I guess the two directions are actually
>> > separate RDMAContext's so maybe).
>>
>> The qio_channel_rdma_close may be invoked by the migration thread and
>> the return path thread concurrently, so I use a mutex to protect it.
>
> Hmm, that is not good - concurrent threads calling close must not be
> allowed to happen even with non-RDMA I/O channels.
>
> For example, with the QIOChannelSocket, one thread can call close
> which sets the fd = -1, another thread can race with this and either
> end up calling close again on the same FD or calling close on -1.
> Either way the second thread will get an error from close() when
> it should have skipped the close() and returned success. Perhaps
> migration gets lucky and this doesn't result in it being marked
> as failed, but it is still not good.
>
> So only one thread should be calling close().
For live migration, the source QEMU invokes qemu_fclose from several
threads: the main thread, the migration thread, and the return path
thread. The destination QEMU invokes qemu_fclose from the main thread,
the listen thread, and the COLO incoming thread.
So I prefer to add a lock to the QEMUFile struct, like this:
int qemu_fclose(QEMUFile *f)
{
    int ret;
    qemu_fflush(f);
    ret = qemu_file_get_error(f);

    if (f->ops->close) {
+       qemu_mutex_lock(&f->lock);
        int ret2 = f->ops->close(f->opaque);
+       qemu_mutex_unlock(&f->lock);
        if (ret >= 0) {
            ret = ret2;
        }
    }
    /* If any error was spotted before closing, we should report it
     * instead of the close() return value.
     */
    if (f->last_error) {
        ret = f->last_error;
    }
    g_free(f);
    trace_qemu_file_fclose();
    return ret;
}
Any suggestion?
>
>> If one thread invokes qio_channel_rdma_writev while another thread
>> invokes qio_channel_rdma_readv, the two threads use separate
>> RDMAContexts, so no lock is needed.
>>
>> If two threads invoked qio_channel_rdma_writev concurrently, a lock
>> would be needed to protect it, but I find the source QEMU migration
>> thread only invokes qio_channel_rdma_writev, and the return path
>> thread only invokes qio_channel_rdma_readv.
>
> QIOChannel impls are only intended to cope with a single thread doing
> I/O in each direction. If you have two threads needing to read, or
> two threads needing to write, the layer above should provide locking
> to ensure correct ordering of I/O operations.
Yes, so I think RCU is enough; we do not need another lock.
>
>> The destination QEMU only invokes qio_channel_rdma_readv from the
>> main thread before postcopy, or from the listen thread after
>> postcopy.
>>
>> The destination QEMU already protects writes to the source QEMU by
>> taking qemu_mutex_lock(&mis->rp_mutex).
>>
>> But should we use qemu_mutex_lock to protect qio_channel_rdma_writev
>> and qio_channel_rdma_readv anyway, in case some future change invokes
>> them concurrently?
>
>>
>> >
>> > But is there nothing else to make the QIOChannel bidirectional?
>> >
>> > Also, a lot seems dependent on listen_id, can you explain how that's
>> > being used.
>>
>> The destination QEMU is the server side, so its listen_id is
>> non-zero; the source QEMU is the client side, so its listen_id is
>> zero. I use listen_id to determine whether this QEMU is the
>> destination or the source.
>>
>> For the destination QEMU, writing data to the source needs the
>> return_path rdma, like this:
>>
>>     if (rdma->listen_id) {
>>         rdma = rdma->return_path;
>>     }
>>
>> For the source QEMU, reading data from the destination also needs
>> the return_path rdma:
>>
>>     if (!rdma->listen_id) {
>>         rdma = rdma->return_path;
>>     }
>
> This feels unnecessarily complex to me. Why not just change the
> QIOChannelRDMA struct so that it has 2 RDMA context pointers, eg
>
> struct QIOChannelRDMA {
>     QIOChannel parent;
>     RDMAContext *rdmain;
>     RDMAContext *rdmaout;
>     QEMUFile *file;
>     bool blocking; /* XXX we don't actually honour this yet */
> };
The reason is that some functions take an RDMAContext parameter, like
qemu_rdma_accept and rdma_start_outgoing_migration, so keeping a single
RDMAContext with a return_path pointer made the return path easier to
implement. I should add some comments to the code.
>
>
>
> Regards,
> Daniel
> --
> |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org -o- https://fstop138.berrange.com :|
> |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|