qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH 0/6] AREG0 patches v4


From: Aurelien Jarno
Subject: Re: [Qemu-devel] [PATCH 0/6] AREG0 patches v4
Date: Tue, 10 Jan 2012 00:04:13 +0100
User-agent: Mutt/1.5.20 (2009-06-14)

On Sun, Jan 08, 2012 at 12:27:39PM +0000, Blue Swirl wrote:
> On Sun, Jan 8, 2012 at 00:52, Aurelien Jarno <address@hidden> wrote:
> > On Sat, Jan 07, 2012 at 10:24:09PM +0000, Blue Swirl wrote:
> >> In this version, I made basic AREG0 free load/store implementations
> >> for all targets. Only x86-64 is tested, others have probably problems,
> >> especially 64 bit guest (Sparc64 in this case) on 32 bit hosts.
> >>
> >> I think this should be committed as a starting point if there are no
> >> major objections.
> >>
> >
> > If it is known to be broken on 32-bit hosts, I am not sure it's really
> > a good idea to commit these patches. It's just going to prevent all
> > people developing on x86 to continue testing or developing QEMU.
> 
> No, the brokenness is limited to 32 bit hosts with 64 bit AREG0 free
> targets, which currently means Sparc64 on i386, Sparc64 on ARM etc.
> Sparc32 should run unless there are other problems. Other targets
> which are not AREG0 free (like x86) should be unchanged on any host.
> 
> I'm not familiar with various host ABIs, fixing all of the targets is too 
> much.

Yes, but I guess your goal is to have more qemu targets than only
sparc, so it's going to become important at some point. I think it 
should not be committed until we have working x86 and x86-64 host
support.

BTW, right now I have just got a quick overview of the patch series, I
have mostly tested them. I am planning to review them more in details in
the next days.


> > Also I have to say I still don't get the goal of these patch series. I
> > have just tried it on an x86-64 guest with a 32-bit sparc guest. It
> > works well, but the generated TB are slightly bigger. I have done a
> > quick and dirty performance test by compiling some code inside the
> > guests, and I get a slow down of about 0.6%, though for this kind of
> > performance changes, it should probably done with other methods.
> 
> Thank you for testing. I also made some tests earlier which were near
> that figure but there performance increased. But my test setup adds a
> lot of noise (CPU throttling etc.) so I didn't trust the figures much.
> 
> The slow down could come from changed compiler flags. IIRC looking at
> disassembled functions, stack protector kicks in.

Given all the translation blocks are bigger (there is one more mov
instruction per memory load/store and also for some helper), I don't
think it really work it at compiler flags or things like that. The code
to execute is longer, which also has side effects, like not playing very
well wrt instructions cache, or even qemu TB cache.

That said 0.8% is not that big, we can probably accept that, if we have
a good reason for that. And here come the next point...


> > Is there another goal behind this patch series beside performance?
> 
> Yes, we eliminate the global register variable, which has given us a
> lot of trouble. Those are not supported by for example LLVM or sparse.
> The semantics of calling code which has not been compiled with
> register variables from code which has been compiled so is not
> defined, but it happens to work most of the time by saving the
> variable by hand. On Sparc hosts, global register handling is a mess.

Even if I personally don't really have this use case, I understand it's
something that might be important for others. 

My main concern there is how to reach this goal for all hosts and all
targets in a reasonable time frame. I remember the difficult moments 
with both dyngen and TCG, with both softfloat-native and softfloat or 
even now with qdev and non-qdev devices.

That's why I think we should start with at least x86 and x86-64 on the
host side. On the target side, we should have at least two targets
converted, and if possible with different word size and different 
endianness. That way maintainers of the TCG host code can tests all the
possible cases before committing it (experience has shown that it's 
difficult to have all cases right at the first time). MIPS can be a good
candidate, as it can do both 32 and 64 bits and both big and little 
endian.

Ideally at least all hosts should be converted before the next release,
so we don't ship QEMU with support for half of the hosts only.

Do you have any comments or plans about that? I guess it's actually the
critical part in the changes you would like to see.


> After the change has been made, all QEMU code is just plain C. With
> TCI it should even be portable to any host.

Isn't that already the case with TCI? As far as I know TCI doesn't use
host registers, so it should already be fine (beside the doc says it's
not really tested in cross-endianness mode).


> When AREG0 is only used internally to TCG generated code, we could
> even put the CPUState structure to stack and use stack pointer to
> access them, freeing a global variable there too.

That might be a good idea on architectures with very few registers like
i386, not sure it'll have a big impact on others.

-- 
Aurelien Jarno                          GPG: 1024D/F1BCDB73
address@hidden                 http://www.aurel32.net



reply via email to

[Prev in Thread] Current Thread [Next in Thread]