[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Qemu-devel] [PATCH] target/i386: fix packusdw in-place operation
From: |
Paolo Bonzini |
Subject: |
Re: [Qemu-devel] [PATCH] target/i386: fix packusdw in-place operation |
Date: |
Thu, 10 Aug 2017 11:30:19 +0200 |
User-agent: |
Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.2.1 |
On 10/08/2017 02:24, Joseph Myers wrote:
> The SSE4.1 packusdw instruction combines source and destination
> vectors of signed 32-bit integers into a single vector of unsigned
> 16-bit integers, with unsigned saturation. When the source and
> destination are the same register, this means each 32-bit element of
> that register is used twice as an input, to produce two of the 16-bit
> output elements, and so if the operation is carried out
> element-by-element in-place, no matter what the order in which it is
> applied to the elements, the first element's operation will overwrite
> some future input. The helper for packssdw avoids this issue by
> computing the result in a local temporary and copying it to the
> destination at the end; this patch fixes the packusdw helper to do
> likewise. This fixes three gcc test failures in my GCC 6-based
> testing.
>
> Signed-off-by: Joseph Myers <address@hidden>
>
> ---
>
> diff --git a/target/i386/ops_sse.h b/target/i386/ops_sse.h
> index 16509d0..05b1701 100644
> --- a/target/i386/ops_sse.h
> +++ b/target/i386/ops_sse.h
> @@ -1655,14 +1655,17 @@ SSE_HELPER_Q(helper_pcmpeqq, FCMPEQQ)
>
> void glue(helper_packusdw, SUFFIX)(CPUX86State *env, Reg *d, Reg *s)
> {
> - d->W(0) = satuw((int32_t) d->L(0));
> - d->W(1) = satuw((int32_t) d->L(1));
> - d->W(2) = satuw((int32_t) d->L(2));
> - d->W(3) = satuw((int32_t) d->L(3));
> - d->W(4) = satuw((int32_t) s->L(0));
> - d->W(5) = satuw((int32_t) s->L(1));
> - d->W(6) = satuw((int32_t) s->L(2));
> - d->W(7) = satuw((int32_t) s->L(3));
> + Reg r;
> +
> + r.W(0) = satuw((int32_t) d->L(0));
> + r.W(1) = satuw((int32_t) d->L(1));
> + r.W(2) = satuw((int32_t) d->L(2));
> + r.W(3) = satuw((int32_t) d->L(3));
> + r.W(4) = satuw((int32_t) s->L(0));
> + r.W(5) = satuw((int32_t) s->L(1));
> + r.W(6) = satuw((int32_t) s->L(2));
> + r.W(7) = satuw((int32_t) s->L(3));
> + *d = r;
> }
>
> #define FMINSB(d, s) MIN((int8_t)d, (int8_t)s)
>
Queued, thanks.
Paolo