From: Michael S. Tsirkin
Subject: Re: [Qemu-devel] [PULL 3/3] vhost-user: Attempt to fix a race with set_mem_table.
Date: Sun, 14 Aug 2016 05:51:06 +0300

On Fri, Aug 12, 2016 at 07:16:34AM +0000, Prerna Saxena wrote:
> 
> On 12/08/16 12:08 pm, "Fam Zheng" <address@hidden> wrote:
> 
> >On Wed, 08/10 18:30, Michael S. Tsirkin wrote:
> >> From: Prerna Saxena <address@hidden>
> >> 
> >> The set_mem_table command currently does not seek a reply. Hence, there is
> >> no easy way for a remote application to notify QEMU when it has finished
> >> setting up memory, or if there were errors doing so.
> >> 
> >> As an example:
> >> (1) QEMU sends a SET_MEM_TABLE to the backend (e.g., a vhost-user net
> >> application). SET_MEM_TABLE does not require a reply according to the spec.
> >> (2) QEMU commits the memory to the guest.
> >> (3) The guest issues an I/O operation over a new memory region which was
> >> configured in (1).
> >> (4) The application has not yet remapped the memory, but it sees the I/O
> >> request.
> >> (5) The application cannot satisfy the request because it does not know
> >> about those GPAs.
> >> 
> >> While a guaranteed fix would require a protocol extension (committed
> >> separately), a best-effort workaround for existing applications is to
> >> send a GET_FEATURES message before completing the
> >> vhost_user_set_mem_table() call. Since GET_FEATURES requires a reply,
> >> an application that processes vhost-user messages synchronously would
> >> probably have completed the SET_MEM_TABLE before replying.
> >> 
> >> Signed-off-by: Prerna Saxena <address@hidden>
> >> Reviewed-by: Michael S. Tsirkin <address@hidden>
> >> Signed-off-by: Michael S. Tsirkin <address@hidden>
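For reference, the workaround amounts to the following shape in
hw/virtio/vhost-user.c. This is a minimal sketch only: the
vhost_user_write()/vhost_user_get_features() helper names match QEMU's
internal vhost-user code of this period, but the exact signatures and the
fd bookkeeping are illustrative.

static int vhost_user_set_mem_table(struct vhost_dev *dev,
                                    struct vhost_memory *mem)
{
    int fds[VHOST_MEMORY_MAX_NREGIONS];
    size_t fd_num = 0;
    uint64_t features;
    VhostUserMsg msg = {
        .request = VHOST_USER_SET_MEM_TABLE,
        .flags = VHOST_USER_VERSION,
    };

    /* ... fill msg.payload.memory and fds[] from 'mem' (elided) ... */

    if (vhost_user_write(dev, &msg, fds, fd_num) < 0) {
        return -1;
    }

    /*
     * Best-effort barrier: GET_FEATURES demands a reply, so a backend that
     * processes messages in order will answer it only after it has finished
     * handling the SET_MEM_TABLE above.
     */
    if (vhost_user_get_features(dev, &features) < 0) {
        return -1;
    }

    return 0;
}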
> >
> >Sporadic hangs are seen with test-vhost-user after this patch:
> >
> >https://travis-ci.org/qemu/qemu/builds
> >
> >Reverting seems to fix it for me.
> >
> >Is this a known problem?
> >
> >Fam
> 
> Hi Fam,
> Thanks for reporting the sporadic hangs. I had seen ‘make check’ pass on my
> CentOS 6 environment, so I missed this.
> I am setting up the Docker test env to reproduce this, but I think I can
> guess the problem:
> 
> In tests/vhost-user-test.c: 
> 
> static void chr_read(void *opaque, const uint8_t *buf, int size)
> {
> ..[snip]..
> 
>     case VHOST_USER_SET_MEM_TABLE:
>         /* received the mem table */
>         memcpy(&s->memory, &msg.payload.memory, sizeof(msg.payload.memory));
>         s->fds_num = qemu_chr_fe_get_msgfds(chr, s->fds, G_N_ELEMENTS(s->fds));
> 
>         /* signal the test that it can continue */
>         g_cond_signal(&s->data_cond);
>         break;
> ..[snip]..
> }
> 
> 
> The test seems to be marked complete as soon as the mem table is copied.
> However, this patch 3/3 changes the behaviour of the SET_MEM_TABLE vhost
> command implementation in QEMU. SET_MEM_TABLE now sends out a new message,
> GET_FEATURES, and the call is only completed once it receives features from
> the remote application (or the test framework, as is the case here).

Hmm, but why does it matter that data_cond is woken up?


> While the test itself can be modified (do not signal completion until we’ve
> sent a follow-up response to GET_FEATURES; see the sketch below), I am now
> wondering if this patch may break existing vhost applications too? If so,
> reverting it is possibly better.
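A rough sketch of that test modification, for concreteness. The
mem_table_seen flag is hypothetical (the real TestServer has no such field),
and the GET_FEATURES reply path mirrors what tests/vhost-user-test.c already
does when features are negotiated at init; the exact feature bits are
illustrative.

static void chr_read(void *opaque, const uint8_t *buf, int size)
{
    TestServer *s = opaque;
..[snip]..

    case VHOST_USER_SET_MEM_TABLE:
        /* received the mem table */
        memcpy(&s->memory, &msg.payload.memory, sizeof(msg.payload.memory));
        s->fds_num = qemu_chr_fe_get_msgfds(chr, s->fds, G_N_ELEMENTS(s->fds));

        /*
         * Do not signal data_cond here any more: with the workaround in
         * place, QEMU immediately follows up with a GET_FEATURES and blocks
         * until it gets the reply.
         */
        s->mem_table_seen = true;       /* hypothetical flag */
        break;

    case VHOST_USER_GET_FEATURES:
        /* send back features to qemu */
        msg.flags |= VHOST_USER_REPLY_MASK;
        msg.size = sizeof(msg.payload.u64);
        msg.payload.u64 = 0x1ULL << VHOST_USER_F_PROTOCOL_FEATURES;
        qemu_chr_fe_write_all(chr, (uint8_t *)&msg,
                              VHOST_USER_HDR_SIZE + msg.size);
        if (s->mem_table_seen) {
            /* the SET_MEM_TABLE round-trip is only now complete */
            g_cond_signal(&s->data_cond);
        }
        break;
..[snip]..
}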

What bothers me is that the new feature might cause the same
issue once we enable it in the test.

How about a patch to tests/vhost-user-test.c adding the new
protocol feature? I would be quite interested to see what
is going on with it.
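For context, the protocol extension mentioned in the commit message is
VHOST_USER_PROTOCOL_F_REPLY_ACK: once negotiated, the master can set a
need-reply flag in a message header, and the slave must answer with a u64
status (0 on success). Test-side handling could look roughly like the sketch
below; the VHOST_USER_NEED_REPLY_MASK / VHOST_USER_REPLY_MASK names follow
QEMU's vhost-user code, but treat the details as illustrative.

    case VHOST_USER_SET_MEM_TABLE:
        /* received the mem table */
        memcpy(&s->memory, &msg.payload.memory, sizeof(msg.payload.memory));
        s->fds_num = qemu_chr_fe_get_msgfds(chr, s->fds, G_N_ELEMENTS(s->fds));

        if (msg.flags & VHOST_USER_NEED_REPLY_MASK) {
            /* REPLY_ACK negotiated: acknowledge with a u64 status, 0 == ok */
            msg.flags &= ~VHOST_USER_NEED_REPLY_MASK;
            msg.flags |= VHOST_USER_REPLY_MASK;
            msg.size = sizeof(msg.payload.u64);
            msg.payload.u64 = 0;
            qemu_chr_fe_write_all(chr, (uint8_t *)&msg,
                                  VHOST_USER_HDR_SIZE + msg.size);
        }

        /* signal the test that it can continue */
        g_cond_signal(&s->data_cond);
        break;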


> What confuses me is why it doesn’t fail all the time, but only about 20% to
> 30% of the time, as Fam reports.

And succeeds every time on my systems :(

> 
> Thoughts: Michael, Fam, Marc-André?
> 
> Regards,
> Prerna 


