Re: [Qemu-devel] [RFC PATCH v2 0/9] target/ppc: convert VMX instructions

qemu-devel

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [RFC PATCH v2 0/9] target/ppc: convert VMX instructions

From:	Mark Cave-Ayland
Subject:	Re: [Qemu-devel] [RFC PATCH v2 0/9] target/ppc: convert VMX instructions to use TCG vector operations
Date:	Mon, 17 Dec 2018 18:49:47 +0000
User-agent:	Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.3.0

On 17/12/2018 17:39, Richard Henderson wrote:

> On 12/17/18 4:23 AM, Mark Cave-Ayland wrote:
>> NOTE: there are a lot of instructions that cannot (yet) be optimised to use 
>> TCG vector
>> operations, however it struck me that there may be some potential for 
>> converting
>> saturating add/sub and cmp instructions if there were a mechanism to return 
>> a set of
>> flags indicating the result of the saturation/comparison.
> 
> There are also a lot of instructions that can be converted, but aren't:
> 
> * vspltis[bhw] can use tcg_gen_gvec_dup{8,16,32}i.
> 
> * vsplt{b,h,w} can use tcg_gen_gvec_dup_mem.
> 
>   Note that you'll need something like vec_reg_offset from
>   target/arm/translate-a64.h to compute the offset of the
>   specific byte/word/long from which we are to splat.

Oh okay, thanks for the hints - I remember thinking that I couldn't much with 
splat,
but I'll go and look again.

> * vmr should be handled by having tcg_gen_gvec_or notice aofs == bofs.
>   For ARM, we do special case this during translation.
>   But since tcg/tcg-op.c does these things for tcg_gen_or_i64,
>   we should probably handle the same set of transformations.
> 
> * vnot would need to be handled by actually adding a tcg_gen_gvec_nor
>   and then also noticing aofs == bofs.

And I'll revisit these ones too.

> For saturation, I think the easiest thing to do is represent SAT as a
> ppc_avr_t.  We notice saturation by also computing normal arithmetic and
> comparing to see if they differ.  E.g.
> 
>     tcg_gen_gvec_add(vece, offsetof_avr_tmp,
>                      offsetof(ra), offsetof(rb), 16, 16);
>     tcg_gen_gvec_ssadd(vece, offsetof(rt),
>                        offsetof(ra), offsetof(rb), 16, 16);
>     tcg_gen_gvec_cmp(TCG_COND_NE, vece, offsetof_avr_tmp,
>                      offsetof_avr_tmp, offsetof(rt), 16, 16);
>     tcg_gen_gvec_or(vece, offsetof_avr_sat, offsetof_avr_sat,
>                     offsetof_avr_tmp, 16, 16);
> 
> You only need to convert the ppc_avr_t to a single bit when reading VSCR.

I actually had a PoC that looked somewhat similar to this, except that I 
abandoned
the idea as I thought that the penalty of doing the add twice (plus comparison) 
would
slow down everything by several orders of magnitude for backends that didn't 
support
vector instructions. What's the best way to handle this?

> For comparisons... that's tricky.  I wonder if there's anything better than
> 
>     tcg_gen_gvec_cmp(TCG_COND_FOO, vece, offsetof(rt),
>                      offsetof(ra), offsetof(rb), 16, 16);
>     if (rc) {
>         TCGv_i64 hi, lo, t, f;
> 
>         tcg_gen_ld_i64(hi, cpu_env, offsetof(rt));
>         tcg_gen_ld_i64(lo, cpu_env, offsetof(rt) + 8);
> 
>         tcg_gen_and_i64(t, hi, lo);
>         tcg_gen_or_i64(f, hi, lo);
>         tcg_gen_setcondi_i64(TCG_COND_EQ, t, t, -1);
>         tcg_gen_setcondi_i64(TCG_COND_EQ, f, f, 0);
> 
>         // truncate to i32, shift, or, and set to cr6.
>     }

Certainly I can look at this approach, but again my concern is that we end up 
heavily
penalising the backends without vector instruction support :/


ATB,

Mark.

[Prev in Thread]

Current Thread

[Next in Thread]

Re: [Qemu-devel] [RFC PATCH v2 4/9] target/ppc: delay writeback of avr{l, h} during lvx instruction, (continued)
- [Qemu-devel] [RFC PATCH v2 6/9] target/ppc: merge ppc_vsr_t and ppc_avr_t union types, Mark Cave-Ayland, 2018/12/17
  - Re: [Qemu-devel] [RFC PATCH v2 6/9] target/ppc: merge ppc_vsr_t and ppc_avr_t union types, Richard Henderson, 2018/12/17
- [Qemu-devel] [RFC PATCH v2 3/9] target/ppc: introduce get_cpu_vsr{l, h}() and set_cpu_vsr{l, h}() helpers for VSR register access, Mark Cave-Ayland, 2018/12/17
  - Re: [Qemu-devel] [RFC PATCH v2 3/9] target/ppc: introduce get_cpu_vsr{l, h}() and set_cpu_vsr{l, h}() helpers for VSR register access, Richard Henderson, 2018/12/17
- [Qemu-devel] [RFC PATCH v2 7/9] target/ppc: move FP and VMX registers into aligned vsr register array, Mark Cave-Ayland, 2018/12/17
  - Re: [Qemu-devel] [RFC PATCH v2 7/9] target/ppc: move FP and VMX registers into aligned vsr register array, Richard Henderson, 2018/12/17
- [Qemu-devel] [RFC PATCH v2 9/9] target/ppc: convert vaddu[b, h, w, d] and vsubu[b, h, w, d] over to use vector operations, Mark Cave-Ayland, 2018/12/17
- [Qemu-devel] [RFC PATCH v2 8/9] target/ppc: convert VMX logical instructions to use vector operations, Mark Cave-Ayland, 2018/12/17
- Re: [Qemu-devel] [RFC PATCH v2 0/9] target/ppc: convert VMX instructions to use TCG vector operations, Richard Henderson, 2018/12/17
  - Re: [Qemu-devel] [RFC PATCH v2 0/9] target/ppc: convert VMX instructions to use TCG vector operations, Mark Cave-Ayland <=

Prev by Date: Re: [Qemu-devel] [PATCH v2 3/6] pvrdma: check number of pages when creating rings
Next by Date: Re: [Qemu-devel] [PATCH v2 0/4] Support separate -firmware and -kernel
Previous by thread: Re: [Qemu-devel] [RFC PATCH v2 0/9] target/ppc: convert VMX instructions to use TCG vector operations
Next by thread: Re: [Qemu-devel] [PATCH v3 1/2] hw: pc: use TYPE_XXX instead of constant strings
Index(es):
- Date
- Thread