

Re: [Qemu-devel] Evaluating Disk IO and Snapshots

From: Juergen Pfennig
Subject: Re: [Qemu-devel] Evaluating Disk IO and Snapshots
Date: Fri, 20 Jan 2006 23:53:34 +0100
User-agent: KMail/1.7.2

Hi Andre
you suggested ...

  While you are at it, have you considered using the LZO libraries
  instead of zlib for compression/decompression speed? Sure, it won't
  compress as much as zlib, but speed improvements should be noticeable.

... sorry. This is a misunderstanding. 

(1) I will not modify qcow and friends. Beware!
(2) The thing works only for the -snapshot file.
(3) The snapshot file uses no compression.
(4) Non-Linux/BSD hosts would fall back to qcow.
(5) Yes, a Windows implementation would be possible.

Here are more details:

The storage for temp data will not rely on sparse files. It will use
two memory mapped temp files, one for index info and one for real
data. I have implemented a simple version of it and am testing it
currently. Speed improvements (IO time) are significant (about 20%).

The zero-memory copy thing ...

There will be a new function for use by ne2000.c, ide.c and friends:

    ptr = bdrv_memory( ...disk..., sector, [read|write|cancel|commit])

In many situations the function can return a pointer into a
mem-mapped region (the Windows swap file would be a good example).
This helps to avoid copying data around in user-space or between
user-space and kernel. The cancel/commit can be implemented via
aliasing. The code also helps to combine disk sectors back into pages
without extra cost (Windows usually writes 4k blocks or larger).

THE PROBLEM: avoiding read before write. I will have a look at the
kernel sources.

Whereas I expect only a 1% win from the zero-copy stuff, my tests for
another little thing promise a 4% improvement (measured in CPU
cycles), or 12.5 ns per IO byte. This is how it works:

OLD CODE (vl.c):
  void *ioport_opaque[MAX_IOPORTS];
  IOPortWriteFunc *ioport_write_table[3][MAX_IOPORTS];
  IOPortReadFunc *ioport_read_table[3][MAX_IOPORTS];

  void cpu_outl(CPUState *env, int addr, int val)
  {
      ioport_write_table[2][addr](ioport_opaque[addr], addr, val);
  }

OLD CODE (ide.c, and even worse in ne2000.c):
  void writeFunction(void *opaque, unsigned int addr, unsigned int data)
  {
      IDEState *s = ((IDEState *)opaque)->curr;
      char *p = s->data_ptr;
      *(unsigned int *)p = data;
      p += 4;
      s->data_ptr = p;
      if (p >= s->data_end)
          s->end_function();
  }

As you can see, repeated port IO produces a lot of overhead: about
115 ns per 32-bit word (P4 2.4 GHz CPU).

NEW CODE (vl.c):
  typedef struct PIOInfo {
    /* ... more fields ... */
    IOPortWriteFunc* write;
    void*            opaque;
    char*            data_ptr;
    char*            data_end;
  } PIOInfo;

  PIOInfo*    pio_info_table[MAX_IOPORTS];

  void cpu_outx(CPUState *env, int addr, int val)
  {
      PIOInfo *i = pio_info_table[addr];
      if (i->data_ptr >= i->data_end) {  /* simple call */
          i->write(i->opaque, addr, val);
      } else {                           /* copy call */
          *(int *)(i->data_ptr) = val;
          i->data_ptr += 4;
      }
  }

The new code moves the data copying (from ide.c and ne2000.c) into
vl.c. This saves 60 ns per 32-bit word. Some memory is saved, and
cache locality is increased. Async IO implementation gets easier.


(1) For a simple call there is a 7ns penalty compared to the
    current solution.
(2) Until now the ide.c and ne2000.c drivers are very closely
    modelled on the hardware. The C code looks a bit like a
    circuit diagram (1:1 relation). My proposal adds some
    abstraction. The ide.c driver would give up the "drive
    cache" memory, and the ne2000.c driver would first fetch
    the (raw) data and then process it.


Yes, it's a bit ugly. For modest speed enhancements a lot of code
is needed. But on the other hand, many small things taken together
can add up to big progress (Paul's code generator, DMA, async IO...).

I have attached my timing test. Compile it with -O3 (-O4 makes no
sense unless you split the code into different files).

Yours Jürgen

Attachment: test.c
Description: Text Data
