
Re: [Qemu-devel] qemu vs gcc4


From: Rob Landley
Subject: Re: [Qemu-devel] qemu vs gcc4
Date: Tue, 31 Oct 2006 20:51:30 -0500
User-agent: KMail/1.9.1

On Tuesday 31 October 2006 7:29 pm, Paul Brook wrote:
> > Actually it sounds additive rather than multiplicative.  Does each target
> > have an entirely unrelated set of ops, or is there a shared set of
> > primitive ops plus some oddballs?
> 
> The shared set of primitive ops is basically qops :-)
> You probably could figure out a single common set of qops, then write
> assembly and glue them together like we do with dyngen. However once you've
> done that you've implemented most of what's needed for fully dynamic qops,
> so it doesn't really seem worth it.

I missed a curve.  What's "fully dynamic qops"?  (There's no translation 
cache?)

> > But I already know 
> > that doesn't work because it doesn't explain the "unable to find spill
> > register" problem. 
> 
> That's a separate gcc bug. It gets stuck when you tell it not to use half the 
> registers, then ask it to do 64-bit math. This is one of the reasons 
> eliminating the fixed registers is a good idea.

Sigh.  The problems motivating me to learn the code are highly esoteric 
breakage, yet I'm still not quite up to the task of understanding what's 
going on when all this works _right_.  Grumble... 

> > > It corresponds to "T0" in dyngen. In addition to the actual CPU state,
> > > dyngen uses 3 fixed registers as scratch workspace. For qop purposes
> > > these are part of the guest CPU state. They're only there to aid
> > > conversion of the translation code, they'll go away eventually.
> >
> > Presumably the m68k target is pure qop, and hasn't got this sort of thing?
> 
> Correct.
> There is one use of T0 left for communicating with the TB chaining code, but 
> that's it and will probably go away eventually.

Any idea where I can get a toolchain that can output a "hello world" program 
for m68k nommu?  (Or perhaps you have a statically linked "hello world" 
program for the platform lying around?)

Building toolchains is one of my other hobbies but it's a royal pain because 
in order to get "hello world" to compile and link you have to supply kernel 
headers, build binutils and gcc with various configuration options and path 
overrides and such, build uClibc with the result and get them all talking to 
each other.  I.E. you've got to do hours of work before you get to the first 
real "did it work" point, and then backtrack to figure out why the answer is 
usually "no".  (Prebuilt binary toolchains are useful just to narrow down the 
number of possible things that could be broken when you first try out a new 
platform.)

> > > > Possible translation: you can feed a qreg containing an I64 value to a
> > > > qop taking an I32 argument, and it'll typecast the sucker down
> > > > intelligently, but if you produce an I32 result and expect to use that
> > > > qreg's value as an I64 argument later, you have to call a
> > > > sign-extending qop on it first?
> > >
> > > Exactly.
> > > If you mix I32,F32 and/or F64 in this way Bad Things will happen.
> >
> > Presumably just the same kinds of Bad Things as "float f; *(int *)&f;"?
> 
> Or qemu will get confused and crash.

I've had that happen without qops, although not recently.  (I have this nasty 
habit of trying Ubuntu's PPC and x86-64 distros under qemu with each new 
release.  They usually fail in amusing new ways.)
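
For anyone following along, the two ideas in the exchange above (widening an I32 result before using it as an I64, and the "float f; *(int *)&f;" style of bit reinterpretation) can be sketched in plain C. This is not qemu code: the function names are made up for illustration, and the sketch assumes a 32-bit IEEE 754 float and two's complement integers.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: reinterpret the bits of a float as a 32-bit integer.
 * memcpy is the well-defined way to do what "*(int *)&f" attempts.
 * Assumes sizeof(float) == 4. */
uint32_t float_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}

/* Hypothetical helper: widen a 32-bit value to 64 bits by copying the sign
 * bit upward.  The uint32_t -> int32_t conversion wraps on two's complement
 * hosts (implementation-defined before C23, but universal in practice). */
int64_t sign_extend_32(uint32_t v)
{
    return (int64_t)(int32_t)v;
}

/* Hypothetical helper: widen a 32-bit value to 64 bits with zero fill. */
uint64_t zero_extend_32(uint32_t v)
{
    return (uint64_t)v;
}
```

The point of the contrast: the same 32 bits give different 64-bit values depending on which widening you pick, and a float's bits read as an integer have nothing to do with its numeric value.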

> > > > seeing end with _im which I presume means "immediate".  The
> > > > alternative is _cc, but what does that mean?  (Presumably not
> > > > "closed captioned".)
> > >
> > > _cc are variants that set the condition codes. I may have got T0 and T1
> > > backwards in the first 3 lines.
> >
> > Ah!
> >
> > Is this written down anywhere?  I've read Fabrice's paper and the design
> > documentation, and I'm not remembering this.  It's quite possible I missed
> > it when my brain filled up, though.
> 
> Dunno.

So if at any point I actually understand this stuff, I need to write 
documentation?  (I can do part 2, part 1 the jury's still out on...)

> It also means you don't need to reserve that register, avoiding the gcc
> unable to find spill register bug you mentioned above.

I'm all for it.

> > Um, wouldn't the flag setting code be fairly straightforward as a qop that
> > comes right _before_ the other op, as in "set the flags for doing this with
> > these registers", that does nothing but set the flags (I.E. it wouldn't
> > modify the contents of any of the registers, so it could be immediately
> > followed by the appropriate add or shift or so on), and then the flag
> > setting pass could just turn all the ones that weren't needed into
> > QOP_NULL?
> 
> Theoretically possible, but not so easy in practice. Especially when you get 
> things like partial flag clobbers, and lazy flag evaluation. Doing it as a 
> target specific hack is much simpler and quicker.

I think I know what partial flag clobbers are (although if you're working your 
way back, in theory you could handle it with a mask of exposed bits), but 
what's lazy flag evaluation?  (I thought that was the point of eliminating 
the unused flag setting.  Are you saying the hardware also does this and we 
have to emulate that?)
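
To make the "work your way back with a mask of exposed bits" idea concrete, here is a minimal sketch in C. It is only an illustration of a backward liveness pass over flag bits, not what qemu or the m68k target actually does; `struct op`, its fields, the flag names and `eliminate_dead_flags` are all hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical condition-code bits. */
enum { FLAG_C = 1, FLAG_Z = 2, FLAG_N = 4, FLAG_V = 8, FLAG_ALL = 15 };

/* Hypothetical op record; not qemu's real data structure. */
struct op {
    const char *name;
    uint8_t flags_read;     /* flag bits this op consumes */
    uint8_t flags_written;  /* flag bits this op overwrites */
    int flags_only;         /* nonzero if the op does nothing but set flags */
    int dead;               /* set by the pass: op can become a no-op */
};

/* Walk the block backward, tracking which flag bits a later op still reads.
 * live_out is the set of bits assumed live when the block ends.
 * A flags-only op is dead when none of the bits it writes are live.
 * Returns the number of ops marked dead. */
size_t eliminate_dead_flags(struct op *ops, size_t n, uint8_t live_out)
{
    uint8_t live = live_out;
    size_t killed = 0;

    for (size_t i = n; i-- > 0; ) {
        struct op *o = &ops[i];

        if (o->flags_only && (o->flags_written & live) == 0) {
            o->dead = 1;
            killed++;
            continue;  /* a dead op neither defines nor uses any flag */
        }
        o->dead = 0;
        /* Bits this op writes are satisfied here; bits it reads become live. */
        live = (uint8_t)((live & ~o->flags_written) | o->flags_read);
    }
    return killed;
}
```

A partial clobber falls out of the mask arithmetic: an op that writes only FLAG_C leaves FLAG_Z live for whatever earlier op set it. Passing FLAG_ALL as live_out is the conservative choice when nothing is known about the next block.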

> > Or is that what's happening now?  (Do QOPs ever modify their input
> > registers, or only the output one?)
> 
> The generic qops never modify inputs, and never read outputs. Inputs and 
> outputs can be the same qreg.

Hm.

> > > There are three fairly independent stages:
> > > 1) target-*/translate.c converts guest code into qops.
> > > 2) translate-all.c messes about with those qops a bit (allocates host
> > > registers, etc).
> > > 3) translate-op.c, translate-qop.c and target-*/ turn those qops into
> > > host code.
> >
> > Is pass 2 where the flag elimination pass goes (and presumably any other
> > optimizations that might get added)?  No, that can't be the case or the
> > m68k code wouldn't need its own implementation of the flag elimination
> > pass...
> 
> Flag elimination is at the end of step 1.

Because it's platform specific?

> > > qops and dyngen ops are both small "functions" that are represented in a
> > > similar way. The difference is that dyngen ops are target specific fixed
> > > functions, whereas qops are generic parameterized functions.
> >
> > So the 11x11 exponential complexity of qemu producing its own assembly
> > output might not be as much of a problem after switching to qops?
> 
> Right. The exponential complexity is if you write the assembly by hand
> instead of using gcc to generate it.

The exponential complexity is if you have to write different code for each 
combination of host and target.  If every target disassembles to the same set 
of target QOPs, then you could have a hand-written assembly version of each 
QOP for each host platform, and still have N rather than N^2 of them.
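
The counting argument, spelled out as a trivial C sketch. The function names are made up, and the only figure taken from this thread is the 11x11; the sketch counts one frontend per target and one backend per host.

```c
/* Hypothetical: one hand-written translator per (host, target) pair. */
unsigned long per_pair_pieces(unsigned long hosts, unsigned long targets)
{
    return hosts * targets;
}

/* Hypothetical: one frontend per target (guest code -> shared QOPs) plus
 * one backend per host (shared QOPs -> host code). */
unsigned long shared_qop_pieces(unsigned long hosts, unsigned long targets)
{
    return hosts + targets;
}
```

With 11 hosts and 11 targets that's 121 pieces versus 22, and adding a twelfth host costs one more backend rather than eleven more translators.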

And I still wanna use tcc to generate it, someday. :)

> > Possibly some of the common qops can have an asm block for 'em, and the
> > rest can go through the contortions target-*/op.c is currently doing with
> > (glue(glue(blah))) and so on.
> 
> Currently we know how to generate code directly for all qops. Anything more 
> complicated must be either put in a helper function or split into multiple 
> qops.

Split into multiple qops I can understand.

> > > I started off by saying qops were effectively instructions for an
> > > imaginary machine. translate-all.c rearranges them so they match up very
> > > closely with the instructions available on the host. Once this has been
> > > done turning them into binary code is relatively simple.
> >
> > I sort of thought this is what it was already doing, but apparently not...
> 
> We're getting confused with tenses. I mean that once translate-all.c has 
> rearranged the qops, we *do* generate host instructions from them without too 
> much effort.

By "already doing" I meant I thought the 0.8.2 code was doing this, before your 
new tree switching everything over to qops.  (Trying to read dyngen.c reminds 
me of reading cgi code that outputs html with embedded javascript.)

Rob
-- 
"Perfection is reached, not when there is no longer anything to add, but
when there is no longer anything to take away." - Antoine de Saint-Exupery
