[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Qemu-devel] [PATCH v2 1/7] target/mips: Fix <ld|st>.<b|h|w|d> MSA i
From: |
Aleksandar Markovic |
Subject: |
Re: [Qemu-devel] [PATCH v2 1/7] target/mips: Fix <ld|st>.<b|h|w|d> MSA instructions for MIPS big endian host |
Date: |
Tue, 26 Mar 2019 11:26:55 +0000 |
> From: Aleksandar Markovic
> Subject: Re: [PATCH v2 1/7] target/mips: Fix <ld|st>.<b|h|w|d> MSA
> instructions for MIPS big endian host
>
> > From: Mateja Marjanovic <address@hidden>
> > Subject: Re: [PATCH v2 1/7] target/mips: Fix <ld|st>.<b|h|w|d> MSA
> > instructions for MIPS big endian host
> > On 25.3.19. 22:21, Aleksandar Markovic wrote:
> >> From: Mateja Marjanovic <address@hidden>
> >> Subject: [PATCH v2 1/7] target/mips: Fix <ld|st>.<b|h|w|d> MSA
> >> instructions for MIPS big endian host
> >> Please split this patch into one for load instructions and another for
> >> store instructions.
> >
> > I will do that.
> >
> >> I don't think the variable "big_endian" is needed, you should better
> >> resolve
> >> all differences wrt endian via "#if defined(HOST_WORDS_BIGENDIAN)",
> >> like you did in other patches. It is important to be consistent.
> >
> > I think you can't have a preprocessing statement inside a macro,
> > so I did the thing that seemed most similar to that.
> >
>
> There should be some equivalent way of doing that involving an argument
> to the macro, however, in the light of your recent work on MSA optimization,
> let's split/demacro and unroll loops in load and store helpers.
>
> After splitting helper for LD.B should look something like this:
> (the code is just for demonstration purposes, doublecheck it)
>
> void helper_msa_ld_b(CPUMIPSState *env, uint32_t wd, target_ulong addr)
> {
> wr_t *pwd = &(env->active_fpu.fpr[wd].wr);
> wr_t wx;
> int i;
>
> MEMOP_IDX(DF_BYTE)
> for (i = 0; i < DF_ELEMENTS(DF_BYTE); i++) {
> #if !defined(CONFIG_USER_ONLY)
> wx.b[i] = helper_ret_ldub_mmu(env, addr + (i << DF_BYTE), oi,
> GETPC());
> #else
> wx.b[i] = cpu_ldub_data(env, addr + (i << DF_BYTE));
> #endif
> }
> memcpy(pwd, &wx, sizeof(wr_t));
> }
>
> After unrolling:
>
> void helper_msa_ld_b(CPUMIPSState *env, uint32_t wd, target_ulong addr)
> {
> wr_t *pwd = &(env->active_fpu.fpr[wd].wr);
> wr_t wx;
> int i;
>
> MEMOP_IDX(DF_BYTE)
>
> #if !defined(CONFIG_USER_ONLY)
> wx.b[0] = helper_ret_ldub_mmu(env, addr, oi, GETPC());
> wx.b[1] = helper_ret_ldub_mmu(env, addr + 1, oi, GETPC());
> wx.b[2] = helper_ret_ldub_mmu(env, addr + 2, oi, GETPC());
> wx.b[3] = helper_ret_ldub_mmu(env, addr + 3, oi, GETPC());
> wx.b[4] = helper_ret_ldub_mmu(env, addr + 4, oi, GETPC());
> wx.b[5] = helper_ret_ldub_mmu(env, addr + 5, oi, GETPC());
> wx.b[6] = helper_ret_ldub_mmu(env, addr + 6, oi, GETPC());
> wx.b[7] = helper_ret_ldub_mmu(env, addr + 7, oi, GETPC());
> wx.b[8] = helper_ret_ldub_mmu(env, addr + 8, oi, GETPC());
> wx.b[9] = helper_ret_ldub_mmu(env, addr + 9, oi, GETPC());
> wx.b[10] = helper_ret_ldub_mmu(env, addr + 10, oi, GETPC());
> wx.b[11] = helper_ret_ldub_mmu(env, addr + 11, oi, GETPC());
> wx.b[12] = helper_ret_ldub_mmu(env, addr + 12, oi, GETPC());
> wx.b[13] = helper_ret_ldub_mmu(env, addr + 13, oi, GETPC());
> wx.b[14] = helper_ret_ldub_mmu(env, addr + 14, oi, GETPC());
> wx.b[15] = helper_ret_ldub_mmu(env, addr + 15, oi, GETPC());
> #else
> wx.b[0] = cpu_ldub_data(env, addr);
> wx.b[1] = cpu_ldub_data(env, addr + 1);
> wx.b[2] = cpu_ldub_data(env, addr + 2);
> wx.b[3] = cpu_ldub_data(env, addr + 3);
> wx.b[4] = cpu_ldub_data(env, addr + 4);
> wx.b[5] = cpu_ldub_data(env, addr + 5);
> wx.b[6] = cpu_ldub_data(env, addr + 6);
> wx.b[7] = cpu_ldub_data(env, addr + 7);
> wx.b[8] = cpu_ldub_data(env, addr + 8);
> wx.b[9] = cpu_ldub_data(env, addr + 9);
> wx.b[10] = cpu_ldub_data(env, addr + 10);
> wx.b[11] = cpu_ldub_data(env, addr + 11);
> wx.b[12] = cpu_ldub_data(env, addr + 12);
> wx.b[13] = cpu_ldub_data(env, addr + 13);
> wx.b[14] = cpu_ldub_data(env, addr + 14);
> wx.b[15] = cpu_ldub_data(env, addr + 15);
> #endif
> }
> memcpy(pwd, &wx, sizeof(wr_t));
> }
>
>
> And then apply big-endian-related changes.
>
> I think that would be more in the spirit of other MSA work.
The same approach (splitting helpers/unrolling loops) should be
good for COPY_S, COPY_U, INSERT as well.
Thanks,
Aleksandar
[Qemu-devel] [PATCH v2 3/7] target/mips: Fix copy_u.<b|h|w> for MIPS big endian host, Mateja Marjanovic, 2019/03/25
[Qemu-devel] [PATCH v2 6/7] target/mips: Eliminate unreachable case for MSA instructions insert, Mateja Marjanovic, 2019/03/25
[Qemu-devel] [PATCH v2 7/7] target/mips: Eliminate unreachable case for MSA instructions fill, Mateja Marjanovic, 2019/03/25
Re: [Qemu-devel] [PATCH v2 0/7] target/mips: Add support for MSA instructions on a big endian host, Aleksandar Markovic, 2019/03/25