qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] Segfault using qemu-system-arm in smc91c111


From: Peter Crosthwaite
Subject: Re: [Qemu-devel] Segfault using qemu-system-arm in smc91c111
Date: Sun, 6 Sep 2015 11:37:05 -0700

On Sun, Sep 6, 2015 at 7:21 AM, Richard Purdie
<address@hidden> wrote:
> On Sat, 2015-09-05 at 13:30 -0700, Peter Crosthwaite wrote:
>> On Fri, Sep 4, 2015 at 10:30 AM, Peter Maydell <address@hidden> wrote:
>> > On 4 September 2015 at 18:20, Richard Purdie
>> > <address@hidden> wrote:
>> >> On Fri, 2015-09-04 at 13:43 +0100, Richard Purdie wrote:
>> >>> On Fri, 2015-09-04 at 12:31 +0100, Peter Maydell wrote:
>> >>> > On 4 September 2015 at 12:24, Richard Purdie
>> >>> > <address@hidden> wrote:
>> >>> > > So just based on that, yes, seems that the rx_fifo looks to be
>> >>> > > overrunning. I can add the asserts but I think it would just confirm
>> >>> > > this.
>> >>> >
>> >>> > Yes, the point of adding assertions is to confirm a hypothesis.
>> >>>
>> >>> I've now confirmed that it does indeed trigger the assert in
>> >>> smc91c111_receive().
>> >>
>> >> I just tried an experiment where I put:
>> >>
>> >>     if (s->rx_fifo_len >= NUM_PACKETS)
>> >>         return -1;
>> >>
>> >> into smc91c111_receive() and my reproducer stops reproducing the
>> >> problem.
>>
>> Does it just stop the crash or does it eliminate the problem
>> completely with a fully now-working network?
>
> It stops the crash, the network works great.
>
>> >> I also noticed can_receive() could also have a check on buffer
>> >> availability. Would one of these changes be the correct fix here?
>> >
>> > The interesting question is why smc91c111_allocate_packet() doesn't
>> > fail in this situation. We only have NUM_PACKETS worth of storage,
>> > shared between the tx and rx buffers, so how could we both have
>> > already filled the rx_fifo and have a spare packet for the allocate
>> > function to return?
>>
>> Maybe this:
>>
>>             case 5: /* Release.  */
>>                 smc91c111_release_packet(s, s->packet_num);
>>                 break;
>>
>> The guest is able to free an allocated packet without the accompanying
>> pop of tx/rx fifo. This may suggest some sort of guest error?
>>
>> The fix depends on the behaviour of the real hardware. If that MMIO op
>> is supposed to dequeue the corresponding queue entry then we may need
>> to patch that logic to do search the queues and dequeue it. Otherwise
>> we need to find out the genuine length of the rx queue, and clamp it
>> without something like Richards patch. There are a few other bits and
>> pieces that suggest the guest can have independent control of the
>> queues and allocated buffers but i'm confused as to how the rx fifo
>> length can get up to 10 in any case.
>
> I think I have a handle on what is going on. smc91c111_release_packet()
> changes s->allocated() but not rx_fifo. can_receive() only looks at
> s->allocated. We can trigger new network packets to arrive from
> smc91c111_release_packet() which calls qemu_flush_queued_packets()
> *before* we change rx_fifo and this can loop.
>
> The patch below which explicitly orders the qemu_flush_queued_packets()
> call resolved the test case I was able to reproduce this problem in.
>
> So there are three ways to fix this, either can_receive() needs to check
> both s->allocated() and rx_fifo,

This is probably the winner for me.

> or the code is more explicit about when
> qemu_flush_queued_packets() is called (as per my patch below), or the
> case 4 where smc91c111_release_packet() and then
> smc91c111_pop_rx_fifo(s) is called is reversed. I also tested the latter
> which also works, albeit with more ugly code.
>
> The problem is much more reproducible with the assert btw, booting a
> qemu image with this and hitting the network interface with scp of a few
> large files is usually enough.
>
> So which patch would be preferred? :)
>
> Cheers,
>
> Richard
>
>
>
> Index: qemu-2.4.0/hw/net/smc91c111.c
> ===================================================================
> --- qemu-2.4.0.orig/hw/net/smc91c111.c
> +++ qemu-2.4.0/hw/net/smc91c111.c
> @@ -185,7 +185,6 @@ static void smc91c111_release_packet(smc
>      s->allocated &= ~(1 << packet);
>      if (s->tx_alloc == 0x80)
>          smc91c111_tx_alloc(s);
> -    qemu_flush_queued_packets(qemu_get_queue(s->nic));
>  }
>
>  /* Flush the TX FIFO.  */
> @@ -237,9 +236,11 @@ static void smc91c111_do_tx(smc91c111_st
>              }
>          }
>  #endif
> -        if (s->ctr & CTR_AUTO_RELEASE)
> +        if (s->ctr & CTR_AUTO_RELEASE) {
>              /* Race?  */
>              smc91c111_release_packet(s, packetnum);
> +            qemu_flush_queued_packets(qemu_get_queue(s->nic));
> +        }
>          else if (s->tx_fifo_done_len < NUM_PACKETS)
>              s->tx_fifo_done[s->tx_fifo_done_len++] = packetnum;
>          qemu_send_packet(qemu_get_queue(s->nic), p, len);
> @@ -379,9 +380,11 @@ static void smc91c111_writeb(void *opaqu
>                      smc91c111_release_packet(s, s->rx_fifo[0]);
>                  }
>                  smc91c111_pop_rx_fifo(s);
> +                qemu_flush_queued_packets(qemu_get_queue(s->nic));
>                  break;
>              case 5: /* Release.  */
>                  smc91c111_release_packet(s, s->packet_num);
> +                qemu_flush_queued_packets(qemu_get_queue(s->nic));

This could still cause the same issue here though I think. The guest
can release first (case 5) without a corresponding fifo pop (case 3),
which actually means that the first patch idea might be the winner as
it catches this issue too. What is supposed to happen if the guest
does a case 5 while the rx_fifo is full without a matching case 3, and
then a packet arrives on the wire?

Regards,
Peter

>                  break;
>              case 6: /* Add to TX FIFO.  */
>                  smc91c111_queue_tx(s, s->packet_num);
>



reply via email to

[Prev in Thread] Current Thread [Next in Thread]