[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [PATCH v2] migration: hold the BQL during setup
From: |
Fiona Ebner |
Subject: |
Re: [PATCH v2] migration: hold the BQL during setup |
Date: |
Fri, 26 May 2023 17:54:20 +0200 |
User-agent: |
Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.11.0 |
Am 26.05.23 um 15:47 schrieb Fiona Ebner:
> Am 26.05.23 um 12:16 schrieb Juan Quintela:
>> Nak
>>
>> Sometimes it works, and sometimes it hangs.
>
> Sorry, I originally only ran the tests for x86_64 (native for me). I now
> ran into the hang too, with qtest-aarch64/migration-test and
> qtest-i386/migration-test.
>
>> Can you take a look?
>
> Will do!
>
So I took a look at the multifd_send_state->params[$i] and noticed that
the IOChannel c is still NULL and running is still false, while the
name, page_size, etc. have already been initialized.
So it seems like that socket_send_channel_create() did not manage to
execute multifd_new_send_channel_async() yet, telling from the following
in multifd_save_setup():
> p->page_size = qemu_target_page_size();
> p->page_count = page_count;
>
> if (migrate_zero_copy_send()) {
> p->write_flags = QIO_CHANNEL_WRITE_FLAG_ZERO_COPY;
> } else {
> p->write_flags = 0;
> }
>
> socket_send_channel_create(multifd_new_send_channel_async, p);
> }
I guess the execution of multifd_new_send_channel_async() after
socket_send_channel_create() somehow depends on the main thread doing
something? But the main thread is waiting on the BQL.
Should I introduce an unlocked section around multifd_send_sync_main()
in ram_save_setup() and hope for the best? The tests ran two times
without errors for me afterwards. Is there an easy way to only run the
single problematic test cases again?
There's still the risk there's something else that needs an unlocked
section. In that regard, v1 of the patch is safer, because it doesn't
change which sections are inside or outside the BQL for migration, just
for snapshot.
Best Regards,
Fiona