Re: [Qemu-devel] Proposed patch: huge RX speedup for hw/e1000.c


From: Stefano Stabellini
Subject: Re: [Qemu-devel] Proposed patch: huge RX speedup for hw/e1000.c
Date: Thu, 31 May 2012 12:06:24 +0100
User-agent: Alpine 2.00 (DEB 1167 2008-08-23)

On Thu, 31 May 2012, Paolo Bonzini wrote:
> On 31/05/2012 00:53, Luigi Rizzo wrote:
> > The image contains my fast packet generator "pkt-gen" (a stock
> > traffic generator such as netperf etc. is too slow to show the
> > problem). pkt-gen can send about 1Mpps in this configuration using
> > -net netmap in the backend. The qemu process in this case takes 100%
> > CPU. On the receive side, I cannot receive more than 50Kpps, even if I
> > flood the bridge with a huge amount of traffic. The qemu process stays
> > at 5% CPU or less.
> > 
> > Then I read the docs in main-loop.h, which say that one case where
> > qemu_notify_event() is needed is when using
> > qemu_set_fd_handler2(), which is exactly what my backend uses
> > (similar to tap.c).
> 
> The path is a bit involved, but I think Luigi is right.  The docs say
> "Remember to call qemu_notify_event whenever the [return value of the
> fd_read_poll callback] may change from false to true."  Now net/tap.c has
> 
>     static int tap_can_send(void *opaque)
>     {
>         TAPState *s = opaque;
> 
>         return qemu_can_send_packet(&s->nc);
>     }
> 
> and (ignoring VLANs) qemu_can_send_packet is
> 
>     int qemu_can_send_packet(VLANClientState *sender)
>     {
>         if (sender->peer->receive_disabled) {
>             return 0;
>         } else if (sender->peer->info->can_receive &&
>                    !sender->peer->info->can_receive(sender->peer)) {
>             return 0;
>         } else {
>             return 1;
>         }
>     }
> 
> So whenever receive_disabled goes from 1 to 0 or can_receive goes from 0 to 1,
> the _peer_ has to call qemu_notify_event.  In e1000.c we have
> 
>     static bool e1000_has_rxbufs(E1000State *s, size_t total_size)
>     {
>         int bufs;
>         /* Fast-path short packets */
>         if (total_size <= s->rxbuf_size) {
>             return s->mac_reg[RDH] != s->mac_reg[RDT] || !s->check_rxov;
>         }
>         if (s->mac_reg[RDH] < s->mac_reg[RDT]) {
>             bufs = s->mac_reg[RDT] - s->mac_reg[RDH];
>         } else if (s->mac_reg[RDH] > s->mac_reg[RDT] || !s->check_rxov) {
>             bufs = s->mac_reg[RDLEN] /  sizeof(struct e1000_rx_desc) +
>                 s->mac_reg[RDT] - s->mac_reg[RDH];
>         } else {
>             return false;
>         }
>         return total_size <= bufs * s->rxbuf_size;
>     }
> 
>     static int
>     e1000_can_receive(VLANClientState *nc)
>     {
>         E1000State *s = DO_UPCAST(NICState, nc, nc)->opaque;
>     
>         return (s->mac_reg[RCTL] & E1000_RCTL_EN) && e1000_has_rxbufs(s, 1);
>     }
> 
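The ring arithmetic in e1000_has_rxbufs can be tried out in isolation. What follows is only a sketch under stated assumptions: struct rx_ring_model and model_has_rxbufs are hypothetical names, not part of hw/e1000.c, and ndesc stands in for RDLEN / sizeof(struct e1000_rx_desc). It mirrors the quoted logic so that the effect of an RDT write (tail moves, check_rxov cleared) on the result can be checked by hand.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the RX ring state kept in E1000State. */
struct rx_ring_model {
    uint32_t rdh;        /* head: next descriptor the device will fill */
    uint32_t rdt;        /* tail: last value written by the guest driver */
    uint32_t ndesc;      /* ring size in descriptors (assumed RDLEN / desc size) */
    size_t   rxbuf_size; /* bytes per receive buffer */
    bool     check_rxov; /* set after an overrun, cleared by an RDT write */
};

/* Same decision structure as the quoted e1000_has_rxbufs(). */
static bool model_has_rxbufs(const struct rx_ring_model *r, size_t total_size)
{
    uint32_t bufs;

    /* Fast path: a packet that fits in one buffer needs one free descriptor. */
    if (total_size <= r->rxbuf_size) {
        return r->rdh != r->rdt || !r->check_rxov;
    }
    if (r->rdh < r->rdt) {
        /* Free descriptors lie between head and tail, no wrap-around. */
        bufs = r->rdt - r->rdh;
    } else if (r->rdh > r->rdt || !r->check_rxov) {
        /* Tail has wrapped past the end of the ring (or the ring is full of
         * free descriptors when head == tail and no overrun is pending). */
        bufs = r->ndesc + r->rdt - r->rdh;
    } else {
        /* head == tail after an overrun: nothing available. */
        return false;
    }
    return total_size <= (size_t)bufs * r->rxbuf_size;
}
```

With rdh == rdt and check_rxov set, the model reports no buffers; once the tail is advanced and check_rxov cleared, as set_rdt does, it reports buffers again. That transition is the false-to-true change the main loop has to be told about.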
> So as a conservative approximation, you need to fire qemu_notify_event
> whenever you write to RDH, RDT, RDLEN and RCTL, or when check_rxov becomes
> zero.  In practice, only RDT, RCTL and check_rxov matter.  Luigi, does this
> patch work for you?
> 
> diff --git a/hw/e1000.c b/hw/e1000.c
> index 4573f13..0069103 100644
> --- a/hw/e1000.c
> +++ b/hw/e1000.c
> @@ -295,6 +295,7 @@ set_rx_control(E1000State *s, int index, uint32_t val)
>      s->rxbuf_min_shift = ((val / E1000_RCTL_RDMTS_QUAT) & 3) + 1;
>      DBGOUT(RX, "RCTL: %d, mac_reg[RCTL] = 0x%x\n", s->mac_reg[RDT],
>             s->mac_reg[RCTL]);
> +    qemu_notify_event();
>  }
>  
>  static void
> @@ -922,6 +923,7 @@ set_rdt(E1000State *s, int index, uint32_t val)
>  {
>      s->check_rxov = 0;
>      s->mac_reg[index] = val & 0xffff;
> +    qemu_notify_event();
>  }
>  
>  static void
> 
> 
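To make the stall concrete, here is a toy model of the situation the patch addresses. It is a sketch only: every name in it (toy_nic, toy_loop, toy_notify_event and so on) is hypothetical and none of it is QEMU code. The assumption is that the main loop consults the can-receive callback once when it builds its poll set, and then blocks until either a watched fd fires or something explicitly notifies it.

```c
#include <stdbool.h>

/* Hypothetical toy NIC: receive is possible when enabled and buffers exist. */
struct toy_nic {
    bool rx_enabled;
    int  free_bufs;
};

/* Hypothetical toy main loop state. */
struct toy_loop {
    struct toy_nic *nic;
    bool fd_polled;  /* was the backend fd included in the current poll set? */
    bool notified;   /* set by toy_notify_event() to interrupt the wait */
};

static bool toy_can_receive(const struct toy_nic *n)
{
    return n->rx_enabled && n->free_bufs > 0;
}

/* Rebuild the poll set: the backend fd is watched only if the NIC can receive. */
static void toy_loop_prepare(struct toy_loop *l)
{
    l->fd_polled = toy_can_receive(l->nic);
    l->notified = false;
}

/* Stand-in for the role qemu_notify_event() plays in the patch above. */
static void toy_notify_event(struct toy_loop *l)
{
    l->notified = true;
}

/* Would a packet arriving on the backend fd right now wake the loop?
 * A pending notification forces the poll set to be rebuilt first. */
static bool toy_packet_would_wake(struct toy_loop *l)
{
    if (l->notified) {
        toy_loop_prepare(l);
    }
    return l->fd_polled;
}

/* Guest driver hands new buffers to the device, with or without a notify. */
static void toy_guest_refill(struct toy_loop *l, int nbufs, bool notify)
{
    l->nic->free_bufs += nbufs;
    if (notify) {
        toy_notify_event(l);
    }
}
```

In this model, refilling buffers without a notify leaves the backend fd out of the poll set, so incoming packets sit unread until some unrelated event happens to wake the loop. Refilling with a notify makes the loop re-evaluate the callback immediately.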
> RDT is indeed written in the ISR.  In the Linux driver, e1000_clean_rx_irq
> calls adapter->alloc_rx_buf which is e1000_alloc_rx_buffers.  There you
> see this:
> 
>         if (likely(rx_ring->next_to_use != i)) {
>                 rx_ring->next_to_use = i;
>                 if (unlikely(i-- == 0))
>                         i = (rx_ring->count - 1);
> 
>                 /* Force memory writes to complete before letting h/w
>                  * know there are new descriptors to fetch.  (Only
>                  * applicable for weak-ordered memory model archs,
>                  * such as IA-64). */
>                 wmb();
>                 writel(i, hw->hw_addr + rx_ring->rdt);
>         }
> 
> Similarly for all other devices:
> - cadence_gem -> GEM_NWCTRL
> - dp8393x -> SONIC_CR, SONIC_ISR
> - eepro100 -> set_ru_state
> - mcf_fec -> mcf_fec_enable_rx
> - milkymist-minimax2 -> R_STATE0, R_STATE1
> - mipsnet -> MIPSNET_INT_CTL, MIPSNET_RX_DATA_BUFFER
> - ne2000 -> EN0_STARTPG, EN0_STOPPG, E8390_CMD
> - opencores_eth -> TX_BD_NUM, MODER, rx_desc
> - pcnet -> pcnet_start, csr[5]
> - rtl8139 -> RxBufPtr and Cfg9346
> - smc91c111 -> RCR, smc91c111_release_packet
> - spapr_llan -> h_add_logical_lan_buffer
> - stellaris_enet -> RCTL, DATA
> - xgmac -> DMA_CONTROL
> - xilinx_axienet -> rcw[1]
> - xilinx_ethlite -> R_RX_CTRL0
> 
> For Xen I think this is not possible at the moment because it doesn't
> implement rx notification.
 
Why do you say that?
Xen supports the iothread and CONFIG_EVENTFD.


