qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH RFC 00/11] AREG0 elimination


From: Blue Swirl
Subject: Re: [Qemu-devel] [PATCH RFC 00/11] AREG0 elimination
Date: Sun, 15 May 2011 17:10:54 +0300

On Sun, May 15, 2011 at 5:06 PM, Aurelien Jarno <address@hidden> wrote:
> On Sun, May 15, 2011 at 05:02:37PM +0300, Blue Swirl wrote:
>> On Sun, May 15, 2011 at 4:43 PM, Aurelien Jarno <address@hidden> wrote:
>> > On Sun, May 15, 2011 at 04:16:20PM +0300, Blue Swirl wrote:
>> >> On Sun, May 15, 2011 at 3:49 PM, Aurelien Jarno <address@hidden> wrote:
>> >> > On Sun, May 15, 2011 at 03:30:17PM +0300, Blue Swirl wrote:
>> >> >> On Sun, May 15, 2011 at 2:36 PM, Aurelien Jarno <address@hidden> wrote:
>> >> >> > On Sun, May 15, 2011 at 02:12:27PM +0300, Blue Swirl wrote:
>> >> >> >> On Sun, May 15, 2011 at 1:19 PM, Aurelien Jarno <address@hidden> 
>> >> >> >> wrote:
>> >> >> >> > I still don't get where are this list of possible changes? Did I 
>> >> >> >> > miss
>> >> >> >> > something in another thread?
>> >> >> >>
>> >> >> >> I'm referring to the patches I sent.
>> >> >> >
>> >> >> > Ok, patches 1, 2 and 4 to 7 looks ok at a first glance, though I 
>> >> >> > think
>> >> >> > patches 6 and 7 should be done for all hosts or none of them.
>> >> >>
>> >> >> The changes can be done in steps, but of course removing temp_buf from
>> >> >> CPUState would need all targets to be converted first.
>> >> >>
>> >> >> >> > On the TCG generated code, the env structure is used almost for 
>> >> >> >> > every
>> >> >> >> > op, so it really makes sense to keep it in a register instead of 
>> >> >> >> > having to
>> >> >> >> > reload the address of env regularly from memory. Given it only 
>> >> >> >> > affects
>> >> >> >> > TCG generated code, I don't see the point of portability here.
>> >> >> >>
>> >> >> >> For example, maybe the bugs in Sparc glibc could be avoided by using
>> >> >> >> one of %i set of registers (not accessible from helpers) for AREG0
>> >> >> >> within generated code instead of %g registers which seem to be
>> >> >> >> fragile.
>> >> >> >
>> >> >> > First of all, but it's a different subject, I am not sure there are
>> >> >> > sparc glibc bugs, I'd rather says QEMU mis-uses some register. For
>> >> >> > example the following code is probably wrong:
>> >> >> >
>> >> >> > /* Note: must be synced with dyngen-exec.h */
>> >> >> > #ifdef CONFIG_SOLARIS
>> >> >> > #define TCG_AREG0 TCG_REG_G2
>> >> >> > #elif defined(__sparc_v9__)
>> >> >> > #define TCG_AREG0 TCG_REG_G5
>> >> >> > #else
>> >> >> > #define TCG_AREG0 TCG_REG_G6
>> >> >> > #endif
>> >> >> >
>> >> >> > __sparc_v9__ can set on the 32-bit ABI, when the compiler targets 
>> >> >> > V8+,
>> >> >> > so the condition is probably wrong there. Secondly the SPARC ABI [1] 
>> >> >> > on
>> >> >> > page 23 that %g5 to %g7 are reserved for system. I don't think QEMU 
>> >> >> > has
>> >> >> > the right to use this registers.
>> >> >>
>> >> >> Yes, but the situation is not so nice. Please see this post for status
>> >> >> as of 2010:
>> >> >> http://article.gmane.org/gmane.comp.emulators.qemu/63610
>> >> >>
>> >> >> This is from Debian glibc 2.11.2-10:
>> >> >> $ file /lib/libc-2.11.2.so
>> >> >> /lib/libc-2.11.2.so: ELF 32-bit MSB shared object, SPARC32PLUS,
>> >> >> version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux
>> >> >> 2.6.18, stripped
>> >> >> $ objdump -d /lib/libc.so.6 |grep %g1|wc -l
>> >> >> 69648
>> >> >> $ objdump -d /lib/libc.so.6 |grep %g2|wc -l
>> >> >> 37299
>> >> >> $ objdump -d /lib/libc.so.6 |grep %g3|wc -l
>> >> >> 20635
>> >> >> $ objdump -d /lib/libc.so.6 |grep %g4|wc -l
>> >> >> 11603
>> >> >> $ objdump -d /lib/libc.so.6 |grep %g5|wc -l
>> >> >> 448
>> >> >> $ objdump -d /lib/libc.so.6 |grep %g6|wc -l
>> >> >> 150
>> >> >> $ objdump -d /lib/libc.so.6 |grep %g7|wc -l
>> >> >> 3052
>> >> >>
>> >> >> Glibc is compiled for Sparc32plus, so it should only use %g6 and %g7,
>> >> >
>> >> > From the calling convention point of view, sparc32 and sparc32plus are
>> >> > the same ABI, so %g5 is also reserved for system use.
>> >> >
>> >> >> or %g1 and %g5 for scratch purposes. However, it is the application
>> >> >> registers %g2 to %g4 that are used heaviest. Looking inside the
>> >> >> objdump it's easy to see that the uses are not for example saving or
>> >> >> restoring, but actually using them without saving the previous value
>> >> >> first:
>> >> >
>> >> > Well, we have to define system and application. System is defined as
>> >> > library in Chapter 6, and I don't see the libc there, and is probably
>> >> > considered as part of the application.
>> >>
>> >> No, for example unistd.h is described and even X11. GCC also says that
>> >> libraries should be compiled without using the registers:
>> >>
>> >> http://gcc.gnu.org/onlinedocs/gcc-4.6.0/gcc/SPARC-Options.html#SPARC-Options
>> >>
>> >> >> 000211e0 <__divdi3>:
>> >> >>    211e0:       9d e3 bf a0     save  %sp, -96, %sp
>> >> >>    211e4:       90 10 00 18     mov  %i0, %o0
>> >> >>    211e8:       92 10 00 19     mov  %i1, %o1
>> >> >>    211ec:       94 10 00 1a     mov  %i2, %o2
>> >> >>    211f0:       96 10 00 1b     mov  %i3, %o3
>> >> >>    211f4:       80 a6 20 00     cmp  %i0, 0
>> >> >>    211f8:       06 40 00 10     bl,pn   %icc, 21238 <__divdi3+0x58>
>> >> >>    211fc:       a0 10 20 00     clr  %l0
>> >> >>    21200:       80 a2 a0 00     cmp  %o2, 0
>> >> >>    21204:       26 40 00 13     bl,a,pn   %icc, 21250 <__divdi3+0x70>
>> >> >>    21208:       a0 38 00 10     xnor  %g0, %l0, %l0
>> >> >>    2120c:       7f ff fe ed     call  20dc0 <__ashldi3+0x40>
>> >> >>    21210:       98 10 20 00     clr  %o4
>> >> >>    21214:       84 10 00 08     mov  %o0, %g2
>> >> >>
>> >> >> ...whoops...
>> >> >>
>> >> >>    21218:       80 a4 20 00     cmp  %l0, 0
>> >> >>    2121c:       02 40 00 04     be,pn   %icc, 2122c <__divdi3+0x4c>
>> >> >>    21220:       86 10 00 09     mov  %o1, %g3
>> >> >>
>> >> >> ...whoops...
>> >> >>
>> >> >>    21224:       86 a0 00 09     subcc  %g0, %o1, %g3
>> >> >>    21228:       84 60 00 02     subc  %g0, %g2, %g2
>> >> >>    2122c:       b2 10 00 03     mov  %g3, %i1
>> >> >>    21230:       81 cf e0 08     rett  %i7 + 8
>> >> >>    21234:       90 10 00 02     mov  %g2, %o0
>> >> >>    21238:       92 a0 00 19     subcc  %g0, %i1, %o1
>> >> >>    2123c:       90 60 00 18     subc  %g0, %i0, %o0
>> >> >>    21240:       80 a2 a0 00     cmp  %o2, 0
>> >> >>    21244:       16 4f ff f2     bge  %icc, 2120c <__divdi3+0x2c>
>> >> >>    21248:       a0 10 3f ff     mov  -1, %l0
>> >> >>    2124c:       a0 38 00 10     xnor  %g0, %l0, %l0
>> >> >>    21250:       96 a0 00 0b     subcc  %g0, %o3, %o3
>> >> >>    21254:       10 6f ff ee     b  %xcc, 2120c <__divdi3+0x2c>
>> >> >>    21258:       94 60 00 0a     subc  %g0, %o2, %o2
>> >> >>    2125c:       01 00 00 00     nop
>> >> >>
>> >> >> This is libc from OpenBSD/Sparc64 4.9:
>> >> >> $ objdump -d /usr/lib/libc.so.58.0 |grep %g1|wc -l
>> >> >>    40562
>> >> >> $ objdump -d /usr/lib/libc.so.58.0 |grep %g2|wc -l
>> >> >>    20384
>> >> >> $ objdump -d /usr/lib/libc.so.58.0 |grep %g3|wc -l
>> >> >>    10240
>> >> >> $ objdump -d /usr/lib/libc.so.58.0 |grep %g4|wc -l
>> >> >>     6606
>> >> >> $ objdump -d /usr/lib/libc.so.58.0 |grep %g5|wc -l
>> >> >>     3811
>> >> >> $ objdump -d /usr/lib/libc.so.58.0 |grep %g6|wc -l
>> >> >>        4
>> >> >> $ objdump -d /usr/lib/libc.so.58.0 |grep %g7|wc -l
>> >> >>       20
>> >> >>
>> >> >> Not so great there either.
>> >> >>
>> >> >> > Anyway, I don't see why keeping TCG_AREG0 inside the TCG generated 
>> >> >> > code
>> >> >> > would prevent you to use a register from the %i set for it.
>> >> >>
>> >> >> The helpers currently use global env register, but %i registers can't
>> >> >> be accessed from the next level of function call nesting hierarchy so
>> >> >> they can't be used for global env.
>> >> >>
>> >> >
>> >> > That's the current situation yes. Using %i registers for TCG_AREG0 does
>> >> > mean you can't use a global env register in the helpers, but it doesn't
>> >> > mean that internal TCG code can't use them for TCG_AREG0.
>> >>
>> >> Exactly.
>> >>
>> >> > What I am telling you since the beginning is that:
>> >> > - I have no objection that we stop using a fixed register in GCC
>> >> >  generated code (that is completely removing HELPER_CFLAGS). However I
>> >> >  don't really see the point of doing that, though the Sparc issue might
>> >> >  be an argument.
>> >> > - I do have objection to remove TCG_AREG0 from inside the TCG generated
>> >> >  code, this register is used for almost every TCG op, and I don't see
>> >> >  any real argument for not keeping it.
>> >>
>> >> I'm pretty much open at this point for all alternatives.
>> >>
>> >
>> > So what about getting rid of TCG_AREG0 for GCC generated code only, at
>> > least as a first step?
>> >
>> > So what about the following changes:
>> > - Change TCG_AREG0 of all targets to a callee saved register (if
>> >  possible, e.g. sparc)
>> > - Change the prologue of all TCG targets to take env as an argument,
>> >  and save it into TCG_AREG0.
>>
>> This can be the first step.
>>
>> > - Change all helpers to explicitly take an env pointer instead of using
>> >  the fixed register. Note that it also includes softmmu helpers, but
>> >  the TCG load/store instructions should be kept unchanged.
>>
>> I think this step will lose performance slightly if TCG is not changed.
>
> Agreed. What do you have in mind by "if TCG is not changed"?

If the register is still fixed in generated code, it would be easily
available for the helpers as well in for of global env. Not taking
advantage of this should be a minor loss, which is offset with one
register more for GCC at the next step.

>> > - Remove HELPER_CFLAGS from makefiles when all helpers have been
>> >  changed.
>>
>> This should restore most of the performance loss from previous step,
>> maybe even improve.
>>
>> > - TCG_AREG0 can then be changed to another register if needed.
>>
>> I'd combine this with callee saved register change. Anyway, at this
>> point there should be a lot of flexibility with the register choice.
>>
>> > And later we can do more steps to get a complete removal of TCG_AREG0,
>> > including in TCG code, though I still think it is a really bad idea.
>>
>> Maybe. At this point most of the hard work has been done, so it's
>> possible to make experiments.
>
> That's exactly my point. We more or less agree on previous step, not on
> this one, so we will need to experiment for that.

OK. Also, most of the cleanup benefits will have been achieved by this point.



reply via email to

[Prev in Thread] Current Thread [Next in Thread]