Re: [Qemu-devel] qemu vs gcc4


From:	Paul Brook
Subject:	Re: [Qemu-devel] qemu vs gcc4
Date:	Wed, 1 Nov 2006 00:29:28 +0000
User-agent:	KMail/1.9.5
> Actually it sounds additive rather than multiplicative.  Does each target
> have an entirely unrelated set of ops, or is there a shared set of
> primitive ops plus some oddballs?

The shared set of primitive ops is basically qops :-)
You probably could figure out a single common set of qops, then write assembly 
and glue them together like we do with dyngen. However once you've done that 
you've implemented most of what's needed for fully dynamic qops, so it 
doesn't really seem worth it.

> But backing up and just accepting that for a moment, in theory what you
> need is some way to compile a C function to machine code, and then unwrap
> that function into a .raw file containing just the machine code.  So the
> only per-compiler thing would be this unwrapper thingy.  

Right.

> But I already know 
> that doesn't work because it doesn't explain the "unable to find spill
> register" problem. 

That's a separate gcc bug. It gets stuck when you tell it not to use half the 
registers, then ask it to do 64-bit math. This is one of the reasons 
eliminating the fixed registers is a good idea.

> > It corresponds to "T0" in dyngen. In addition to the actual CPU state,
> > dyngen uses 3 fixed registers as scratch workspace. For qop purposes
> > these are part of the guest CPU state. They're only there to aid
> > conversion of the translation code, they'll go away eventually.
>
> Presumably the m68k target is pure qop, and hasn't got this sort of thing?

Correct.
There is one use of T0 left for communicating with the TB chaining code, but 
that's it and will probably go away eventually.

> > > Or the value currently in a qreg has a type associated with it, but
> > > the next value stored in that qreg may have a different type?
> >
> > A qreg has a fixed type. The value stored in that qreg has that type. To
> > convert it to a different type you need to use an explicit conversion
> > qop.
>
> So values don't have types, the qregs the values are _in_ have types.  But
> I thought there were an unlimited number of them (well, 1024 or so), and
> they're dynamically allocated (at least some of the time).  How does it
> keep track of the type of a given qreg?  (When you convert, you copy values
> from one qreg into another?)

Yes. Conversion is just like any other qop. It reads one qreg, and writes the 
result to a different qreg which happens to be a different type.
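
To make the typing rule concrete, here is a minimal C sketch. Every name in it 
(QMode, new_qreg, op_sext_i32_to_i64, ...) is made up for illustration and is 
not the actual qop code. It only shows the idea that the type belongs to the 
qreg, is fixed at allocation, and that a conversion is an ordinary op reading 
one qreg and writing another.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names throughout; this is not the actual qemu qop code. */
typedef enum { MODE_I32, MODE_I64 } QMode;

typedef struct {
    QMode mode;      /* fixed when the qreg is allocated, never changes */
    int64_t value;   /* storage wide enough for either mode */
} QReg;

#define MAX_QREGS 1024
static QReg qregs[MAX_QREGS];
static int num_qregs;

/* Allocate a fresh qreg of the given type and return its index. */
static int new_qreg(QMode mode)
{
    assert(num_qregs < MAX_QREGS);
    qregs[num_qregs].mode = mode;
    qregs[num_qregs].value = 0;
    return num_qregs++;
}

/* Store a 32-bit value; only legal on an I32 qreg. */
static void set_i32(int r, int32_t v)
{
    assert(qregs[r].mode == MODE_I32);
    qregs[r].value = v;
}

/* Conversion op: reads an I32 qreg and writes the sign-extended value
   to a different qreg whose type is I64. */
static void op_sext_i32_to_i64(int dst, int src)
{
    assert(qregs[src].mode == MODE_I32);
    assert(qregs[dst].mode == MODE_I64);
    qregs[dst].value = (int64_t)(int32_t)qregs[src].value;
}

/* Read a 64-bit value; only legal on an I64 qreg. */
static int64_t get_i64(int r)
{
    assert(qregs[r].mode == MODE_I64);
    return qregs[r].value;
}
```

The source qreg keeps its I32 type after the conversion; only the newly 
written qreg is I64.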

> > > Possible translation: you can feed a qreg containing an I64 value to a
> > > qop taking an i32 argument, and it'll typecast the sucker down
> > > intelligently, but if you produce an I32 result and expect to use that
> > > qreg's value as an I64 argument later, you have to call a
> > > sign-extending qop on it first?
> >
> > Exactly.
> > If you mix I32,F32 and/or F64 in this way Bad Things will happen.
>
> Presumably just the same kinds of Bad Things as "float f; *(int *)&f;"?

Or qemu will get confused and crash.

> > > seeing end with _im which I presume means "immediate".  The alternative
> > > is _cc, but what does that mean?  (Presumably not "closed captioned".)
> >
> > _cc are variants that set the condition codes. I may have got T0 and T1
> > backwards in the first 3 lines.
>
> Ah!
>
> Is this written down anywhere?  I've read Fabrice's paper and the design
> documentation, and I'm not remembering this.  It's quite possible I missed
> it when my brain filled up, though.

Dunno.

> > > Um, is my earlier characterization of "unwrapping stuff" at all close?
> >
> > Not entirely. I'm also replacing fixed locations (T2) with dynamically
> > allocated qregs.
>
> The dynamic allocation buys you what?  (Less spilling?)

More-or-less. It makes it easier to optimize. The code generator can pick what 
to put in registers, or even not put them there at all, instead of having to 
do things exactly how you told it.

It also means you don't need to reserve that register, avoiding the gcc unable 
to find spill register bug you mentioned above.

> > Most x86 instructions set the condition code flags. However most of the
> > time these flags are ignored. eg. if you have two consecutive add
> > instructions the first will set the flags, and the second will
> > immediately overwrite them.
> >
> > qemu contains a back-propagation pass that will remove the code to set
> > the flags after the first instruction. Currently this is implemented by
> > changing an addl_cc op into a plain addl op.
>
> I actually understood that.  Yay!
>
> > The flag-setting code would most likely require several qops to
> > implement, so it would be much harder to prove it is not needed and
> > remove it. So there is a mechanism for adding extra target qops, doing
> > the flag elimination pass, then expanding those to generic qops.
>
> Um, wouldn't the flag setting code be fairly straightforward as a qop that
> comes right _before_ the other op, as in "set the flags for doing this with
> these registers", that does nothing but set the flags (I.E. it wouldn't
> modify the contents of any of the registers, so it could be immediately
> followed by the appropriate add or shift or so on), and then the flag
> setting pass could just turn all the ones that weren't needed into
> QOP_NULL?

Theoretically possible, but not so easy in practice. Especially when you get 
things like partial flag clobbers, and lazy flag evaluation. Doing it as a 
target specific hack is much simpler and quicker.
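
For reference, the simple case of the back-propagation pass looks roughly like 
the sketch below. The opcode names and the assumption that flags are live at 
the end of the block are mine, chosen for illustration. It deliberately 
ignores partial flag clobbers and lazy flag evaluation, which are exactly the 
things that make the generic version awkward.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical op set for illustration; not qemu's real opcode list. */
typedef enum {
    OP_ADDL,      /* add, flags untouched */
    OP_ADDL_CC,   /* add, also sets the condition codes */
    OP_JCC        /* conditional jump, reads the condition codes */
} Opcode;

/* Walk the block backwards, tracking whether the flags are still
   needed by some later op. Assumes flags are live at block exit. */
static void eliminate_dead_flags(Opcode *ops, size_t n)
{
    bool flags_live = true;
    for (size_t i = n; i-- > 0; ) {
        switch (ops[i]) {
        case OP_ADDL_CC:
            if (!flags_live)
                ops[i] = OP_ADDL;   /* nobody reads these flags */
            flags_live = false;     /* a full flag write hides earlier ones */
            break;
        case OP_JCC:
            flags_live = true;      /* reader: earlier flags are needed */
            break;
        case OP_ADDL:
            break;                  /* neither reads nor writes flags */
        }
    }
}
```

With two flag-setting adds followed by a conditional jump, only the first add 
is downgraded; the second still feeds the jump.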

> Or is that what's happening now?  (Do QOPs ever modify their input
> registers, or only the output one?)

The generic qops never modify inputs, and never read outputs. Inputs and 
outputs can be the same qreg.
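
A tiny sketch of why that is safe, using a made-up three-address add (AddOp 
and exec_add are illustrative names, not real qemu code): as long as an op 
reads all of its inputs before writing its output, the output may name the 
same register as an input.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical three-address op; names are illustrative only. */
typedef struct { int dst, src1, src2; } AddOp;

/* Reads both inputs before writing the output, so dst may name the
   same register as src1 or src2 without changing the result. */
static void exec_add(uint32_t *regs, AddOp op)
{
    uint32_t a = regs[op.src1];
    uint32_t b = regs[op.src2];
    regs[op.dst] = a + b;
}
```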

> > > Ah, hang on.  There's target_reginfo in translate-all.c, that's using
> > > some of the other values.  So what the heck does translate-all.c do? 
> > > (Shared code called by all the platform-dependent translate functions?)
> >
> > There are three fairly independent stages:
> > 1) target-*/translate.c converts guest code into qops.
> > 2) translate-all.c messes about with those qops a bit (allocates host
> > registers, etc).
> > 3) translate-op.c,translate-qop.c and target-*/ turns those qops into
> > host code.
>
> Is pass 2 where the flag elimination pass goes (and presumably any other
> optimizations that might get added)?  No, that can't be the case or the
> m68k code wouldn't need its own implementation of the flag elimination
> pass...

Flag elimination is at the end of step 1.

> > > > For converting targets you can probably ignore most of the
> > > > translate-all and host-*/ changes. These implement generating code
> > > > from the qops.
> > >
> > > Ok, this implies that qops are a new thing.  Which looking at the code
> > > sort of supports.  Which means I don't understand what's going on at
> > > all.
> >
> > qops and dyngen ops are both small "functions" that are represented in a
> > similar way. The difference is that dyngen ops are target specific fixed
> > functions, whereas qops are generic parameterized functions.
>
> So the 11x11 exponential complexity of qemu producing its own assembly
> output might not be as much of a problem after switching to qops?

Right. The exponential complexity is if you write the assembly by hand instead 
of using gcc to generate it.

> Possibly some of the common qops can have an asm block for 'em, and the
> rest can go through the contortions target-*/op.c is currently doing with
> (glue(glue(blah))) and so on.

Currently we know how to generate code directly for all qops. Anything more 
complicated must be either put in a helper function or split into multiple 
qops.

> > While they are really separate things, the details have been chosen so it
> > should be possible to adapt the existing translate.c code rather than
> > having to rewrite it from scratch. Decoding x86 instruction semantics is
> > complicated :-)
>
> Yay iterative transformation with regression testing.  (And nothing says
> regression testing like booting a Linux distro under the sucker.)

Exactly.

> > Many of the simpler dyngen ops can be replaced with a single qop. Others
> > can be replaced with a sequence of a few qops. Some of the more
> > complicated ones may need to be moved into helper functions.
>
> At some point, I hope to understand helper functions.  But I'm not there
> yet.
>
> > > I need to re-read this later.  My brain's full and I'm deeply confused.
> >
> > I started off by saying qops were effectively instructions for an
> > imaginary machine. translate-all.c rearranges them so they match up very
> > closely with the instructions available on the host. Once this has been
> > done turning them into binary code is relatively simple.
>
> I sort of thought this is what it was already doing, but apparently not...

We're getting confused with tenses. I mean that once translate-all.c has 
rearranged the qops, we *do* generate host instructions from them without too 
much effort.

> > If native host FP is not available qemu will include appropriate bits so
> > that after macro expansion and inlining you end up with:
> >
> >   tmp = gen_new_qreg(QMODE_I32);
> >   gen_op_helper(HELPER_addf32, tmp, QREG_FOO, QREG_BAR);
> >
> > and the addf32 helper does the floating point addition using the
> > "softfloat" library. The qemu softfloat library implementation may
> > actually use hardware floating point rather than doing everything
> > manually.
>
> No reason (except speed) the code output into a translation block can't do
> function calls.  I think.

That's exactly what a helper function is. Calling functions is complicated, so 
I've restricted the functions that can be called to explicitly declared 
helper functions.
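
As a rough sketch of the "explicitly declared" part: the helper IDs, the 
table, and exec_helper below are hypothetical names of mine, and qemu's real 
declarations differ. The point is only that generated code refers to a helper 
by ID, and anything not in the table cannot be called. Floats are passed as 
raw 32-bit patterns, mirroring the QMODE_I32 temporary in the snippet quoted 
above, and the plain C addition stands in for a softfloat call.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper IDs and table; qemu's real declarations differ. */
typedef enum { HELPER_ADDF32, HELPER_COUNT } HelperId;

typedef uint32_t (*HelperFn)(uint32_t, uint32_t);

/* Floats travel as raw 32-bit patterns held in integer registers. */
static uint32_t helper_addf32(uint32_t a, uint32_t b)
{
    float fa, fb, fr;
    uint32_t r;
    memcpy(&fa, &a, sizeof fa);
    memcpy(&fb, &b, sizeof fb);
    fr = fa + fb;               /* stand-in for a softfloat call */
    memcpy(&r, &fr, sizeof r);
    return r;
}

/* Only functions listed here can be called from generated code. */
static const HelperFn helper_table[HELPER_COUNT] = {
    [HELPER_ADDF32] = helper_addf32,
};

typedef struct { HelperId id; int dst, src1, src2; } HelperOp;

/* Execute one helper-call op: look the function up by ID and call it. */
static void exec_helper(uint32_t *regs, HelperOp op)
{
    assert(op.id < HELPER_COUNT);
    regs[op.dst] = helper_table[op.id](regs[op.src1], regs[op.src2]);
}
```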

Paul