qemu-devel
Re: [Qemu-devel] [v2 0/2] add avx2 instruction optimization


From: Li, Liang Z
Subject: Re: [Qemu-devel] [v2 0/2] add avx2 instruction optimization
Date: Thu, 12 Nov 2015 09:40:18 +0000

> >>> I am very surprised by the live migration performance result
> >>> when I use your 'memeqzero4_paolo' instead of these SSE2 intrinsics
> >>> to check for zero pages.
> >>
> >> What code were you using?  Remember I suggested using only unsigned
> >> long checks, like
> >>
> >>    unsigned long *p = ...
> >>    if (p[0] || p[1] || p[2] || p[3]
> >>        || memcmp(p+4, p, size - 4 * sizeof(unsigned long)) != 0)
> >>            return BUFFER_NOT_ZERO;
> >>    else
> >>            return BUFFER_ZERO;
> >>
> >
> > I use the following code:
> >
> >
> > bool memeqzero4_paolo(const void *data, size_t length) {
> >      ...
> > }
> 
> The code you used is very generic and not optimized for the kind of data you
> see during migration, hence the existing code in QEMU fares better.
> 

I migrated an idle guest with 8 GB of RAM; I think most of its pages are zero pages.

I used your new code:
-------------------------------------------------
        unsigned long *p = ...
        if (p[0] || p[1] || p[2] || p[3]
            || memcmp(p+4, p, size - 4 * sizeof(unsigned long)) != 0)
                return BUFFER_NOT_ZERO;
        else
                return BUFFER_ZERO;
---------------------------------------------------
and the result is almost the same. I also tried checking the first 8 and 16
unsigned longs at the beginning instead of 4, with the same result.
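
For reference, the fragment above can be wrapped into a complete function roughly as follows. This is only a sketch under stated assumptions, not the code in QEMU: the name page_is_zero_sketch is hypothetical, the buffer is assumed to be aligned for unsigned long, and size is assumed to be at least 4 * sizeof(unsigned long).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not QEMU's own zero check): returns true when
 * the `size` bytes at `buf` are all zero.  Assumes `buf` is aligned
 * for unsigned long and `size` >= 4 * sizeof(unsigned long). */
static bool page_is_zero_sketch(const void *buf, size_t size)
{
    const unsigned long *p = buf;
    const size_t head = 4 * sizeof(unsigned long);

    assert(size >= head);

    /* Initial check: a page with data near its start is rejected
     * here, without calling memcmp at all. */
    if (p[0] || p[1] || p[2] || p[3]) {
        return false;
    }

    /* The first `head` bytes are zero.  Comparing the buffer with
     * itself shifted by `head` bytes shows that byte i + head equals
     * byte i for every i, so all remaining bytes are zero as well.
     * memcmp only reads, so the overlapping ranges are fine. */
    return memcmp(p + 4, p, size - head) == 0;
}
```

A caller would map true to BUFFER_ZERO and false to BUFFER_NOT_ZERO. Changing the number of words tested up front (8 or 16 instead of 4) only means changing the initial condition and `head` together.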

> >>> The total live migration time increased by about 8%; it did not
> >>> decrease, although in the unit test your 'memeqzero4_paolo'
> >>> performs better. Any idea why?
> >>
> >> You only tested the case of zero pages.  But real pages usually are
> >> not zero, even if they have a few zero bytes at the beginning.  It's
> >> very important to optimize the initial check before the memcmp call.
> >>
> >
> > In the unit test, I only test zero pages too, and the performance of
> > 'memeqzero4_paolo' is better.
> > But when merged into QEMU, it caused a performance drop. Why?
> 
> Because QEMU is not migrating zero pages only.
> 
> Paolo
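
One way to make the unit test reflect this is to feed it a mix of pages and record where the check exits for each one. The following is a rough sketch only: classify_page, tally_exits and the enum names are hypothetical, and the pages are assumed to be stored back to back, aligned for unsigned long, each at least 4 * sizeof(unsigned long) bytes.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical labels for where the zero check stops. */
enum zero_check_exit {
    EXIT_HEAD_NONZERO = 0,  /* rejected by the initial word checks */
    EXIT_TAIL_NONZERO = 1,  /* rejected by the memcmp              */
    EXIT_ALL_ZERO     = 2   /* whole page scanned, all zero        */
};

static enum zero_check_exit classify_page(const unsigned long *p,
                                          size_t size)
{
    if (p[0] || p[1] || p[2] || p[3]) {
        return EXIT_HEAD_NONZERO;
    }
    if (memcmp(p + 4, p, size - 4 * sizeof(unsigned long)) != 0) {
        return EXIT_TAIL_NONZERO;
    }
    return EXIT_ALL_ZERO;
}

/* Count the exits over `count` pages of `size` bytes each, stored
 * contiguously starting at `pages`.  `size` must be a multiple of
 * sizeof(unsigned long). */
static void tally_exits(const unsigned long *pages, size_t count,
                        size_t size, size_t tally[3])
{
    const size_t words = size / sizeof(unsigned long);

    tally[0] = tally[1] = tally[2] = 0;
    for (size_t i = 0; i < count; i++) {
        tally[classify_page(pages + i * words, size)]++;
    }
}
```

A benchmark that only ever produces EXIT_ALL_ZERO measures the memcmp path alone; timing a data set with a realistic share of the other two exits says more about migration behaviour.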
