Re: [Qemu-devel] [PATCH] migration: vectorize is_dup_page

From: Avi Kivity
Date: Tue, 20 Dec 2011 17:24:29 +0200
On 12/06/2011 07:25 PM, Paolo Bonzini wrote:
> is_dup_page already processes the page in 32-bit chunks.  Changing it to
> use 16-byte chunks with Altivec or SSE is easy, and provides a noticeable
> improvement.  Pierre Riteau measured 30->25 seconds on a 16GB guest; I
> measured 4.6->3.9 seconds on a 6GB guest (best of three runs for me; I
> don't know about Pierre's).  Both are roughly a 15% improvement.
> I tried experimenting with non-temporal prefetches, but did not get any
> speedup (though I did get fewer cache misses, so the prefetching was doing
> its job).

The non-temporal prefetches are worthwhile anyway IMO, even without a
measurable speedup.
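For reference, the two measurements quoted above work out as follows, both close to the ~15% figure:

```latex
\frac{30 - 25}{30} \approx 16.7\%, \qquad \frac{4.6 - 3.9}{4.6} \approx 15.2\%
```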

> +static int is_dup_page(uint8_t *page)
>  {
> -    uint32_t val = ch << 24 | ch << 16 | ch << 8 | ch;
> -    uint32_t *array = (uint32_t *)page;
> +    VECTYPE *p = (VECTYPE *)page;
> +    VECTYPE val = SPLAT(p);
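For context, a complete splat-based is_dup_page might look roughly like the sketch below. VECTYPE and SPLAT mirror the names in the hunk, but their definitions here are assumptions, not the patch's code; ALL_EQ, LOAD and load_word are hypothetical helpers, and TARGET_PAGE_SIZE is a stand-in fixed at 4096 bytes. It uses SSE2 intrinsics when available and falls back to word-sized compares otherwise, and it assumes the caller passes a page-aligned buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumption: stand-in for QEMU's TARGET_PAGE_SIZE, fixed at 4 KiB here. */
#define TARGET_PAGE_SIZE 4096

#if defined(__SSE2__)
#include <emmintrin.h>
typedef __m128i VECTYPE;
/* Broadcast the page's first byte into all 16 byte lanes. */
#define SPLAT(p)       _mm_set1_epi8(*(const char *)(p))
/* Aligned 16-byte load (hypothetical helper). */
#define LOAD(p)        _mm_load_si128((const __m128i *)(p))
/* Nonzero when every byte lane of v1 equals the matching lane of v2. */
#define ALL_EQ(v1, v2) (_mm_movemask_epi8(_mm_cmpeq_epi8(v1, v2)) == 0xFFFF)
#else
typedef unsigned long VECTYPE;
/* Portable fallback: replicate the first byte into every byte of a word. */
#define SPLAT(p)       ((unsigned long)*(const uint8_t *)(p) * (~0UL / 0xFF))
/* memcpy avoids strict-aliasing problems when reading words from bytes. */
static inline VECTYPE load_word(const uint8_t *p)
{
    VECTYPE v;
    memcpy(&v, p, sizeof v);
    return v;
}
#define LOAD(p)        load_word(p)
#define ALL_EQ(v1, v2) ((v1) == (v2))
#endif

/* Returns 1 if every byte of the page equals its first byte, else 0. */
static int is_dup_page(const uint8_t *page)
{
    VECTYPE val = SPLAT(page);
    size_t off;

    /* The SSE2 path uses aligned loads; the buffer is assumed page-aligned. */
    assert((uintptr_t)page % sizeof(VECTYPE) == 0);

    for (off = 0; off < TARGET_PAGE_SIZE; off += sizeof(VECTYPE)) {
        if (!ALL_EQ(val, LOAD(page + off))) {
            return 0;   /* stop at the first chunk that differs */
        }
    }
    return 1;
}
```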

I think you can drop the SPLAT and just compare against zero.  Pages
filled entirely with a repeated nonzero byte are unlikely, so we can
simplify the code a bit here.  If we do go with non-temporal loads,
comparing against a constant also saves an additional cache miss.
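Dropping the SPLAT as suggested could look something like this sketch. The name is_zero_page is hypothetical, as are the fixed 4096-byte TARGET_PAGE_SIZE stand-in and the SSE2-or-word-sized structure; the point is only that each chunk is compared against a constant zero vector instead of a value derived from the page's first byte.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumption: stand-in for QEMU's TARGET_PAGE_SIZE, fixed at 4 KiB here. */
#define TARGET_PAGE_SIZE 4096

#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Returns 1 if the whole page is zero, else 0 (hypothetical name). */
static int is_zero_page(const uint8_t *page)
{
    size_t off;

    /* The SSE2 path uses aligned loads; the buffer is assumed page-aligned. */
    assert((uintptr_t)page % 16 == 0);

#if defined(__SSE2__)
    /* Constant comparand: no need to read the page to build it. */
    const __m128i zero = _mm_setzero_si128();
    for (off = 0; off < TARGET_PAGE_SIZE; off += sizeof(__m128i)) {
        __m128i v = _mm_load_si128((const __m128i *)(page + off));
        if (_mm_movemask_epi8(_mm_cmpeq_epi8(v, zero)) != 0xFFFF) {
            return 0;   /* some byte in this chunk is nonzero */
        }
    }
#else
    /* Portable fallback: test one machine word at a time. */
    for (off = 0; off < TARGET_PAGE_SIZE; off += sizeof(unsigned long)) {
        unsigned long w;
        memcpy(&w, page + off, sizeof w);
        if (w != 0) {
            return 0;
        }
    }
#endif
    return 1;
}
```

With this variant, a page filled with some repeated nonzero byte is simply treated as an ordinary page and sent in full, which is the trade-off the argument above accepts as rare.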

