qemu-devel
From: Zheng Chuan
Subject: Re: [PATCH v2] migration/multifd: close TLS channel before socket finalize
Date: Wed, 11 Nov 2020 15:07:40 +0800
User-agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:68.0) Gecko/20100101 Thunderbird/68.6.0

I think I have found why.

When we create the TLS client in migration_tls_client_create(), we take a
reference on tioc->master.
For the main migration thread, that reference is dropped after
migration_channel_connect() in socket_outgoing_migration().
For non-TLS migration, another reference is taken in
qemu_fopen_channel_output(ioc) within migration_channel_connect().

In conclusion, we need to drop the reference on the underlying QIOChannelSocket
after the TLS handshake for the multifd TLS channel.
The fix patch has been sent and is awaiting review:
https://www.mail-archive.com/qemu-devel@nongnu.org/msg759110.html

On 2020/11/10 19:56, Zheng Chuan wrote:
> 
> 
> On 2020/11/10 19:01, Daniel P. Berrangé wrote:
>> On Tue, Nov 10, 2020 at 06:45:45PM +0800, Zheng Chuan wrote:
>>>
>>>
>>> On 2020/11/10 18:12, Daniel P. Berrangé wrote:
>>>> On Fri, Nov 06, 2020 at 06:54:54PM +0800, Chuan Zheng wrote:
>>>>> Since we now support TLS multifd, when we cancel migration, the TLS
>>>>> sockets will be left in CLOSE-WAIT state on the source side, which
>>>>> results in a socket leak.
>>>>> Fix it by closing the TLS channel before socket finalize.
>>>>>
>>>>> Signed-off-by: Chuan Zheng <zhengchuan@huawei.com>
>>>>> ---
>>>>>  migration/multifd.c | 14 ++++++++++++++
>>>>>  1 file changed, 14 insertions(+)
>>>>>
>>>>> diff --git a/migration/multifd.c b/migration/multifd.c
>>>>> index 68b171f..a6838dc 100644
>>>>> --- a/migration/multifd.c
>>>>> +++ b/migration/multifd.c
>>>>> @@ -523,6 +523,19 @@ static void multifd_send_terminate_threads(Error *err)
>>>>>      }
>>>>>  }
>>>>>  
>>>>> +static void multifd_tls_socket_close(QIOChannel *ioc, Error *err)
>>>>> +{
>>>>> +    if (ioc &&
>>>>> +        object_dynamic_cast(OBJECT(ioc),
>>>>> +                            TYPE_QIO_CHANNEL_TLS)) {
>>>>> +        /*
>>>>> +         * The TLS channel is special: we need to close it before
>>>>> +         * socket finalize.
>>>>> +         */
>>>>> +        qio_channel_close(ioc, &err);
>>>>> +    }
>>>>> +}
>>>>
>>>> This doesn't feel quite right to me. Calling qio_channel_close will close
>>>> both the TLS layer and the underlying QIOChannelSocket. If the latter
>>>> is safe to do, then we don't need the object_dynamic_cast() check; we can
>>>> do it unconditionally whether we're using TLS or not.
>>>>
>>>> Having said that, I'm not sure whether we actually want to be using
>>>> qio_channel_close or not?
>>>>
>>>> I would have expected that there is already code somewhere else in the
>>>> migration layer that is closing these multifd channels, but I can't
>>>> actually find where that happens right now.  Assuming that code does
>>>> exist though, qio_channel_shutdown(ioc, BOTH) feels like the right
>>>> answer to unblock waiting I/O ops.
>>>>
>>> Hi, Daniel.
>>> Actually, I tried using qio_channel_shutdown in the same place,
>>> but it does not seem to work correctly:
>>> the socket connection appears closed when observed with the 'ss' command,
>>> but the socket fds in /proc/$(qemu pid)/fd are still left behind.
>>>
>>> The underlying QIOChannelSocket will be closed by
>>> qio_channel_socket_finalize() through object_unref(QIOChannel) later
>>> in socket_send_channel_destroy();
>>> does that mean it is safe to close both the TLS and TCP sockets?
>>
>> Hmm, that makes me even more confused, because the object_unref
>> should be calling qio_channel_close() already.
>>
>> eg with your patch we have:
>>
>>        multifd_tls_socket_close(p->c, NULL);
>>            -> qio_channel_close(p->c)
>>              -> qio_channel_tls_close(p->c)
>>                      -> qio_channel_close(master)
>>
>>        socket_send_channel_destroy(p->c)
>>            -> object_unref(p->c)
>>               -> qio_channel_tls_socket_finalize(p->c)
>>                    -> object_unref(master)
>>                            -> qio_channel_close(master)
>>
>> so multifd_tls_socket_close should not be doing anything
>> at all, *assuming* we release the last reference in our
>> object_unref call.
>>
>> Given what you describe, I think we are *not* releasing the
>> last reference. There is an active reference being held
>> somewhere else, and that is preventing the QIOChannelSocket
>> from being freed and thus the socket remains open.
>>
>> If that extra active reference is a bug, then we have a memory
>> leak of the QIOChannelSocket object, that needs fixing somewhere.
>>
>> If that extra active reference is intentional, then we do indeed
>> need to explicitly close the socket. That is possibly better
>> handled by putting a qio_channel_close() call into the
>> socket_send_channel_destroy() method.
>>
>> I wonder if we're leaking a reference to the underlying QIOChannelSocket,
>> when we create the QIOChannelTLS wrapper ? That could explain a problem
>> that only happens when using TLS.
>>
> Aha, you are right!
> The QIOChannelSocket is holding an extra reference.
> 
> Thread 1 "qemu-system-aar" hit Breakpoint 1, socket_send_channel_destroy (
>     send=0xaaaabea527f0) at migration/socket.c:44
> 44    migration/socket.c: No such file or directory.
> (gdb) p *((QIOChannelTLS)*send)->master
> $5 = {parent = {class = 0xaaaabc690c90, free = 0xffff9bd16c40 <g_free>,
>     properties = 0xffff61a04de0, ref = 2, parent = 0x0}, features = 2, name = 0x0,
>   ctx = 0x0, read_coroutine = 0x0, write_coroutine = 0x0}
> (gdb) p (QIOChannelTLS)*send
> $6 = {parent = {parent = {class = 0xaaaabc6430c0, free = 0xffff9bd16c40 <g_free>,
>       properties = 0xffff61a04f00, ref = 1, parent = 0x0}, features = 2,
>     name = 0xaaaabdd81290 "multifd-tls-outgoing", ctx = 0x0, read_coroutine = 0x0,
>     write_coroutine = 0x0}, master = 0xaaaabe350e90, session = 0xaaaabcf1ead0, shutdown = 0}
> (gdb) p *((QIOChannelTLS)*send)->master
> $7 = {parent = {class = 0xaaaabc690c90, free = 0xffff9bd16c40 <g_free>,
>     properties = 0xffff61a04de0, ref = 2, parent = 0x0}, features = 2, name = 0x0,
>   ctx = 0x0, read_coroutine = 0x0, write_coroutine = 0x0}
> 
> I'll dig further to see where that extra reference is taken...
> 
>> Regards,
>> Daniel
>>
> 

-- 
Regards.
Chuan


