qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH 2/5] Blackfin: initial port


From: Mike Frysinger
Subject: Re: [Qemu-devel] [PATCH 2/5] Blackfin: initial port
Date: Tue, 25 Jun 2013 19:14:30 -0400
User-agent: KMail/1.13.7 (Linux/3.8.3; KDE/4.6.5; x86_64; ; )

On Tuesday 25 June 2013 17:23:57 Richard Henderson wrote:

whee, got a review! :)  i've snipped items that were obvious in the "i'll go 
do this" category rather than just copying & pasting "OK" many times.  got a 
long flight coming up soon, so hopefully i can tackle the majority of this work 
then.

> > diff --git a/target-bfin/bfin-sim.c b/target-bfin/bfin-sim.c
> 
> Why this separate file from translate.c?

because this port is based on the GNU/sim Blackfin port.  bfin-sim.c focuses on 
the actual opcode translation while the higher level file (translate.c in QEMU 
and interp.c in GNU/sim) takes care of the higher layers (like clock ticking).  
i like keeping the core structure the same between the two sims so that i can 
more easily merge changes between them.

> > +#include <stdbool.h>
> > +#include <stdint.h>
> > +#include <stdio.h>
> > +#include <stdlib.h>
> > +#include <string.h>
> > +#include <inttypes.h>
> 
> Certainly you shouldn't need these, since this isn't a separately
> compiled object -- you're included from translate.c.

yes, this is most likely true.  it's due to the previous reason.  maybe i 
should make it a sep compiled file then there won't be any confusion ...

> > +#define HOST_LONG_WORD_SIZE (sizeof(long) * 8)
> 
> You mean TCG_TARGET_REG_BITS?

maybe?  this is for extracting target encoded immediates and properly 
extending them up to the system that is running the code (e.g. x86_64) so that 
it can then be checked naturally (like comparing it to a -1).  but this is 
done in the decode logic, not in the tcg output logic, so i'm not sure TCG is 
the right abstraction.

> > +static TCGv
> > +get_allreg(DisasContext *dc, int grp, int reg)
> > +{
> > +    TCGv *ret = cpu_regs[(grp << 3) | reg];
> > +    if (ret) {
> > +       return *ret;
> > +    }
> > +    abort();
> > +    illegal_instruction(dc);
> > +}
> 
> Well, which is it?  abort or illegal_instruction.

i had this in there while doing the initial port to track down bad things.  i 
can probably cut it over to illegal_instruction() now.

> And come to that, how is
> abort any better than SEGV from dereferecing the null?  Certainly the later
> will generate a faster translator...

QEMU doesn't need any help segfaulting :p.  an abort() is much better at 
showing the source of the problem.

> > +decode_multfunc_tl(DisasContext *dc, int h0, int h1, int src0, int src1,
> > +                   int mmod, int MM, TCGv psat)
> > +{
> > +    TCGv s0, s1, val;
> > +
> > +    s0 = tcg_temp_local_new();
> 
> You'll really want to avoid local temps and branches, if at all possible. 
> For some of the more complex stuff that you're open-coding, you may be
> better off with helper functions instead.

it seemed like having generated (and cached) opcodes was better than relying 
on helpers since helpers requires interrupting the native code flow and muck 
around with state ?  is there a good (or even semi-decent) rule of thumb i can 
use to decide when to use one over the other ?

> > +static void
> > +saturate_s32(TCGv_i64 val, TCGv overflow)
> 
> I shall now stop mentioning movcond.  I sense there are many locations to
> come.

looks like movcond was introduced after i did the initial (bulk) port.  so 
there are probably many locations that can take advantage of it.  i'll have to 
go through the code top-to-bottom looking for things.  and probably look at 
the history of tcg/README to see what other interesting opcodes have been 
added since.

> > +    } else if (prgfunc == 11 && poprnd < 6) {
> > +        /* TESTSET (Preg{poprnd}); */
> > +        TCGv tmp = tcg_temp_new();
> > +        tcg_gen_qemu_ld8u(tmp, cpu_preg[poprnd], dc->mem_idx);
> > +        tcg_gen_setcondi_tl(TCG_COND_EQ, cpu_cc, tmp, 0);
> > +        tcg_gen_ori_tl(tmp, tmp, 0x80);
> > +        tcg_gen_qemu_st8(tmp, cpu_preg[poprnd], dc->mem_idx);
> > +        tcg_temp_free(tmp);
> 
> I'll note that this is fine for system code, but for user code ought to be
> atomic.  There are a bunch of really bad examples in the tree, and no real
> good solutions atm.

i general, i completely agree with you.  in practice, i think a (mis)feature 
of the Blackfin arch helps out here.  TESTSET doesn't work on cached memory, 
and it only works on system memory (i.e. not L1), and every Linux build runs 
with caches turned on.  so maybe flaky misbehavior is a good thing ? :)

i can drop a note in there noting the issue though

> > +    /* Everything here needs to be aligned, so check once */
> > +    gen_align_check(dc, cpu_spreg, 4, false);
> 
> You ought not need to generate explicit alignment checks.  Yes, we don't do
> that correctly for user-mode, but we do for system mode.
> 
> My hope is that user mode eventually has the option of using the system
> mode page tables too -- there are just too many things that don't work
> correctly when host and target page sizes don't match, or the host and
> target don't have the same unaligned access characteristics.

i had this code because it wasn't working in user mode, and someone in a 
previous review suggested i check out gen_helper_memalign().  i actually have 
this disabled by default because the speed impact is fairly substantial.  i 
was debating turning this into a configure time check.

> > +        } else if (grp == 4 && (reg == 0 || reg == 2)) {
> > +            /* Push A#.X */
> > +            tmp64 = tcg_temp_new_i64();
> > +            tcg_gen_shri_i64(tmp64, cpu_areg[reg >> 1], 32);
> > +            tmp = tcg_temp_new();
> > +            tcg_gen_trunc_i64_i32(tmp, tmp64);
> > +            tcg_temp_free_i64(tmp64);
> > +            tcg_gen_andi_tl(tmp, tmp, 0xff);
> 
> Do we ever allow the high 24 bits to be non-zero?  Is this andi actually
> redundant?

yes & no.  in the hardware, it'll always be 0.  there is some code which will 
might sign extend things (so that you can correctly compare the 40bit 
accumulator using 64bit regs), and that would interact badly here.  maybe it 
would be better to drop this and force the accumulator handling code to do the 
right thing with the sign rather than store the result.

> > +    if (W == 1) {
> > +        /* [--SP] = ({d}R7:imm{dr}, {p}P5:imm{pr}); */
> > +        if (d) {
> > +            for (i = dr; i < 8; i++) {
> > +                tcg_gen_subi_tl(cpu_spreg, cpu_spreg, 4);
> > +                tcg_gen_qemu_st32(cpu_dreg[i], cpu_spreg, dc->mem_idx);
> > +            }
> > +        }
> 
> What's the cpu exception effect of the second store causing a page fault?
> Normally one needs to do the address increment in a temporary and only
> update the real SP register at the end, so that the instruction can be
> restarted.

this was me being lazy when getting started with the port ;).  the GNU/sim 
code matches the hardware and that is as you suspected -- the SP reg isn't 
updated until after all the push/pops finish.  i wasn't sweating the difference 
here too much as in user mode, the exception would just kill the program 
(ignoring apps that catch like SIGSEGV).

i'll have to generate a tmp reg based on SP and do it correctly though now 
that this port is out of the infant stage.

> > +#include "linux-fixed-code.h"
> > +
> > +static uint32_t bfin_lduw_code(DisasContext *dc, target_ulong pc)
> > +{
> > +#ifdef CONFIG_USER_ONLY
> > +    /* Intercept jump to the magic kernel page */
> > +    if (((dc->env->personality & 0xff/*PER_MASK*/) == 0/*PER_LINUX*/) &&
> > +        (pc & 0xFFFFFF00) == 0x400) {
> > +        uint32_t off = pc - 0x400;
> > +        if (off < sizeof(bfin_linux_fixed_code)) {
> > +            return ((uint16_t)bfin_linux_fixed_code[off + 1] << 8) |
> > +                   bfin_linux_fixed_code[off];
> > +        }
> > +    }
> > +#endif
> 
> Surely this memory setup belongs in linux-user/.

i couldn't find good examples previously to make this work so i put it in here. 
 
i came across the arm handling of its fixed code recently though (which i think 
does it in linux-user), so i'll see about moving it there.

> > +/* Interpret a single Blackfin insn; breaks up parallel insns */
> > +static void
> > +interp_insn_bfin(DisasContext *dc)
> > +{
> > +    _interp_insn_bfin(dc, dc->pc);
> 
> I'd prefer a suffix like "1" rather than a prefix of "_".

this matches the GNU/sim code, so i'd prefer to stick to that where possible.  
underscore prefix is a common "this is internal" indicator.

> > +static inline void cpu_get_tb_cpu_state(CPUArchState *env, target_ulong
> > *pc, +                                        target_ulong *cs_base, int
> > *flags) +{
> > +    *pc = cpu_get_pc(env);
> > +    *cs_base = 0;
> > +    *flags = env->astat[ASTAT_RND_MOD];
> > +}
> 
> You'll probably be better off with a bit that notes whether the loop
> registers are active, or something, so that you don't have to always
> generate code that handles them.

whether loop registers are active is purely based on the PC and the current 
values in the loop registers/counters.  the hardware logic is basically:
        # After every single insn is executed.
        oldpc = PC
        PC += insn_length;
        if (oldpc == LB1 && LC1) {
                PC = LT1;
                --LC1;
        }
        if (oldpc == LB0 && LB0) {
                PC = LT0;
                --LC0;
        }

i didn't think it was really possible to check the cpu state at runtime since 
ideally you'd generate a bunch of TB's, cache them, and then let them run 
w/out invoking the translator again.

> > +DEF_HELPER_3(raise_exception, void, env, i32, i32)
> 
> Lots of these can use better settings for flags.  Here, the only side
> effect is to raise an exception, which leads to reading the globals.  So
> TCG_CALL_NO_WG.

the flags are newer than the port, so i'll have to go through them all for a 
refresh

> > +static void gen_goto_tb(DisasContext *dc, int tb_num, TCGv dest)
> > +{
> > +/*
> > +    TranslationBlock *tb;
> > +    tb = dc->tb;
> > +
> > +    if ((tb->pc & TARGET_PAGE_MASK) == (dest & TARGET_PAGE_MASK)) {
> > +        tcg_gen_goto_tb(tb_num);
> > +        tcg_gen_mov_tl(cpu_pc, dest);
> > +        tcg_gen_exit_tb((long)tb + tb_num);
> > +    } else */{
> > +        gen_astat_update(dc, false);
> > +        tcg_gen_mov_tl(cpu_pc, dest);
> > +        tcg_gen_exit_tb(0);
> > +    }
> 
> Why the astat update here, when you have it on almost no other exits from
> the tb?

it's really the only way i could get it to [mostly] work :/.  i tried to figure 
out how x86 was handling its delayed eflags updates (since Blackfin will do 
pretty much the same exact thing for the same reason), but i failed at that 
too.

in general, the TB logic is magic to me.  there are a number of optimizations 
that are blocked because i haven't been able to figure it out.  i'm surprised 
it really works at all.  you can see how my gen_goto_tb always does 
tct_gen_exit_tb(0) which forces a PC look up every time.
-mike

Attachment: signature.asc
Description: This is a digitally signed message part.


reply via email to

[Prev in Thread] Current Thread [Next in Thread]