From: Avi Kivity
Subject: Re: [Qemu-devel] [PATCH] migration: vectorize is_dup_page
Date: Tue, 20 Dec 2011 17:24:29 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:8.0) Gecko/20111115 Thunderbird/8.0

On 12/06/2011 07:25 PM, Paolo Bonzini wrote:
> is_dup_page is already proceeding in 32-bit chunks. Changing it to 16
> bytes using Altivec or SSE is easy, and provides a noticeable improvement.
> Pierre Riteau measured 30->25 seconds on a 16GB guest, I measured 4.6->3.9
> seconds on a 6GB guest (best of three times for me; dunno for Pierre).
> Both of them are approximately a 15% improvement.
>
> I tried playing with non-temporal prefetches, but I did not get any
> improvement (though I did get less cache misses, so the patch was doing
> its job).
It's worthwhile anyway IMO.
>
> +static int is_dup_page(uint8_t *page)
> {
> - uint32_t val = ch << 24 | ch << 16 | ch << 8 | ch;
> - uint32_t *array = (uint32_t *)page;
> + VECTYPE *p = (VECTYPE *)page;
> + VECTYPE val = SPLAT(p);
>
I think you can drop the SPLAT and just compare against zero. Full page
repeats of anything but zero are unlikely, so we can simplify the code a
bit here. If we do go with non-temporal loads, it saves an additional miss.
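
To make the suggestion concrete, here is a rough sketch of a zero-only check that walks the page in 16-byte chunks. It is not the code from the patch: the name is_zero_page, the PAGE_SIZE_BYTES constant (assumed to be 4096) and the use of raw SSE2 intrinsics in place of the patch's VECTYPE/SPLAT macros are all assumptions made for illustration. A plain 64-bit fallback is included for builds without SSE2.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#ifdef __SSE2__
#include <emmintrin.h>
#endif

/* Assumed page size; hypothetical name, not taken from the patch. */
#define PAGE_SIZE_BYTES 4096

/* Returns 1 if every byte of the page is zero, 0 otherwise.
 * Hypothetical helper: no SPLAT of the first element is needed,
 * because the comparison value is always zero. */
static int is_zero_page(const uint8_t *page)
{
    size_t i;

    assert(page != NULL);
#ifdef __SSE2__
    const __m128i zero = _mm_setzero_si128();

    for (i = 0; i < PAGE_SIZE_BYTES; i += 16) {
        /* Unaligned load, so the sketch does not assume page alignment. */
        __m128i chunk = _mm_loadu_si128((const __m128i *)(page + i));

        /* cmpeq sets a byte lane to 0xFF where the chunk byte is zero;
         * movemask collects the top bit of each of the 16 lanes. */
        if (_mm_movemask_epi8(_mm_cmpeq_epi8(chunk, zero)) != 0xFFFF) {
            return 0;
        }
    }
#else
    for (i = 0; i < PAGE_SIZE_BYTES; i += 16) {
        uint64_t lo, hi;

        /* memcpy avoids alignment and aliasing problems. */
        memcpy(&lo, page + i, sizeof(lo));
        memcpy(&hi, page + i + 8, sizeof(hi));
        if ((lo | hi) != 0) {
            return 0;
        }
    }
#endif
    return 1;
}
```

With this shape, a page filled with a repeated non-zero byte is no longer reported as a duplicate, which is the trade-off being proposed.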
--
error compiling committee.c: too many arguments to function