qemu-devel



From: Dennis Luehring
Subject: Re: [Qemu-devel] Debian 7.8.0 SPARC64 on qemu - anything i can do to speedup the emulation?
Date: Fri, 21 Aug 2015 08:05:33 +0200
User-agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:38.0) Gecko/20100101 Thunderbird/38.2.0

On 21.08.2015 at 07:49, Richard Henderson wrote:
On 08/20/2015 09:32 PM, Dennis Luehring wrote:
> gcc prime.c -o prime.out -lm
>
> prime.out runtime
>
> tcg-indirect: ~9.3 sec (best result)
> qemu.org-git: ~11 sec
> without-optimization: ~9.9 sec (worst result)

I presume this is integer prime factoring?


Aurelien Jarno extracted this code from sysbench (just for my qemu sparc64 tests):

#include <math.h>
unsigned long long max_prime = 2000;
void prime_test()
{
  unsigned long long c;
  unsigned long long l,t;
  unsigned long long n=0;
  /* So far we're using very simple prime number tests in 64bit */
  for(c=3; c < max_prime; c++)
  {
    t = sqrt(c);
    for(l = 2; l <= t; l++)
      if (c % l == 0)
        break;
    if (l > t )
      n++;
  }
}
int main()
{
  int i;
  for (i = 0 ; i < 10000 ; i++)
  {
    prime_test();
  }
  return 0;
}
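A minimal sketch, not Aurelien's code: the same sysbench-style trial division, rewritten so the prime count is returned instead of being left in the never-read `n`. With `n` unused, a compiler invoked with -O2 may treat much of `prime_test()` as dead code, which would make guest-side optimization comparisons misleading. The name `count_primes_below` is hypothetical, and the integer bound `l * l <= c` replaces `sqrt()` (a deliberate change, not what sysbench does) so the loop only exercises integer division.

```c
/* Hypothetical helper: counts primes c with 3 <= c < max_prime, like
 * prime_test() above (2 is not counted because the loop starts at c = 3). */
unsigned long long count_primes_below(unsigned long long max_prime)
{
    unsigned long long n = 0;

    for (unsigned long long c = 3; c < max_prime; c++) {
        unsigned long long l;

        /* For integers, l * l <= c is equivalent to l <= floor(sqrt(c)),
         * so no floating-point sqrt() call is needed. */
        for (l = 2; l * l <= c; l++)
            if (c % l == 0)
                break;

        /* We only get past the bound when no divisor was found. */
        if (l * l > c)
            n++;
    }
    return n;
}
```

For `max_prime = 2000` this returns 302 (the 303 primes below 2000, minus 2); printing that value from `main()` keeps the work observable to the compiler.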




> g++ src/pugixml.cpp -g -Wall -Wextra -Werror -pedantic -std=c++0x -c -MMD -MP
>
> tcg-indirect: ~2:46.5
> qemu.org-git: ~2:51.2 (worst result)
> without-optimization: ~2:14.1 (best result)
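To compare the `m:ss.s` timings above on one scale, a small helper can convert them to seconds and compute a ratio against a baseline. This is only arithmetic on the numbers already posted; both function names are hypothetical.

```c
#include <stdio.h>

/* Converts a "minutes:seconds" string such as "2:46.5" into seconds.
 * Returns a negative value if the string does not have that shape. */
double mmss_to_seconds(const char *s)
{
    int minutes;
    double seconds;

    if (sscanf(s, "%d:%lf", &minutes, &seconds) != 2)
        return -1.0;
    return minutes * 60.0 + seconds;
}

/* Ratio > 1.0 means the candidate finished faster than the baseline. */
double speedup(double baseline_s, double candidate_s)
{
    return baseline_s / candidate_s;
}
```

For the pugixml build, 2:51.2 (171.2 s) against 2:14.1 (134.1 s) gives about 1.28, i.e. qemu.org-git took roughly 1.28 times as long as the build with TCG optimizations disabled. For the prime test, 11 s against 9.3 s gives about 1.18 in favour of tcg-indirect.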

No compiler optimization?  I wouldn't expect there to be much for tcg to
optimize there -- dropping values to memory all the time doesn't leave much.


"without-optimization" means the qemu.org-git release build with USE_TCG_OPTIMIZATIONS undefined in tcg/tcg.c.
Or which compiler do you mean?
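Richard's remark about values being dropped to memory can be illustrated with a toy constant-folding pass, the kind of transformation tcg/optimize.c is responsible for when USE_TCG_OPTIMIZATIONS is defined. This is not QEMU code: the opcodes and structures below are hypothetical and far simpler than real TCG ops. Temps written by `TOY_MOVI` hold known constants, so an add of two of them folds into a single move; temps written by `TOY_LD` (a load from guest memory, roughly what guest code built without -O does for every variable) are unknown, so nothing folds.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy IR, loosely modelled on TCG ops (not QEMU's API). */
enum toy_opc { TOY_MOVI, TOY_LD, TOY_ADD };

struct toy_op {
    enum toy_opc opc;
    int dst;        /* destination temp, must be < TOY_MAX_TEMPS */
    int src1, src2; /* source temps (TOY_ADD only) */
    long imm;       /* immediate value (TOY_MOVI only) */
};

#define TOY_MAX_TEMPS 16

/* Rewrites TOY_ADD ops whose inputs are both known constants into
 * TOY_MOVI, propagating the result. Returns the number of ops folded. */
size_t toy_fold_constants(struct toy_op *ops, size_t n)
{
    bool known[TOY_MAX_TEMPS] = { false };
    long value[TOY_MAX_TEMPS] = { 0 };
    size_t folded = 0;

    for (size_t i = 0; i < n; i++) {
        struct toy_op *op = &ops[i];

        switch (op->opc) {
        case TOY_MOVI:
            known[op->dst] = true;
            value[op->dst] = op->imm;
            break;
        case TOY_LD:
            /* Loaded from guest memory: unknown at translation time. */
            known[op->dst] = false;
            break;
        case TOY_ADD:
            if (known[op->src1] && known[op->src2]) {
                long sum = value[op->src1] + value[op->src2];
                op->opc = TOY_MOVI;
                op->imm = sum;
                known[op->dst] = true;
                value[op->dst] = sum;
                folded++;
            } else {
                known[op->dst] = false;
            }
            break;
        }
    }
    return folded;
}
```

A sequence `movi t0, 2; movi t1, 3; add t2, t0, t1` folds to `movi t2, 5`, while the same add fed by two loads is left untouched, which is why there is little left for such a pass to do on memory-bound, unoptimized guest code.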



>
> stream results (STREAM version $Revision: 5.10 $)
>
> tcg-indirect: (worst result)
>
> Your clock granularity/precision appears to be 41 microseconds.
> Each test below will take on the order of 632527 microseconds.
>    (= 15427 clock ticks)
> Function    Best Rate MB/s  Avg time     Min time     Max time
> Copy:             320.8     0.511297     0.498785     0.590214
> Scale:            187.0     0.858693     0.855465     0.863527
> Add:              218.2     1.104654     1.099698     1.110341
> Triad:            169.5     1.433273     1.416321     1.502248
>
> qemu.org-git: (best result)
>
> Your clock granularity/precision appears to be 42 microseconds.
> Each test below will take on the order of 330428 microseconds.
>     (= 7867 clock ticks)
> Function    Best Rate MB/s  Avg time     Min time     Max time
> Copy:             771.5     0.214717     0.207377     0.244214
> Scale:            288.1     0.573320     0.555401     0.660161
> Add:              423.5     0.633523     0.566661     1.092067
> Triad:            242.9     1.053032     0.987970     1.499563
>
> without-optimization:
>
> Your clock granularity/precision appears to be 41 microseconds.
> Each test below will take on the order of 745254 microseconds.
>    (= 18176 clock ticks)
> Function    Best Rate MB/s  Avg time     Min time     Max time
> Copy:             316.6     0.524065     0.505313     0.580103
> Scale:            200.5     0.813356     0.798024     0.840986
> Add:              243.9     1.010247     0.984025     1.119149
> Triad:            182.9     1.345601     1.312236     1.427459
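For reference, STREAM's four kernels and the way it turns the minimum time into the "Best Rate MB/s" column can be sketched as below. Copy and Scale touch two arrays per element, Add and Triad touch three, and STREAM counts 10^6 bytes as one MB. Working backwards from the posted numbers (e.g. 771.5 MB/s x 0.207377 s, about 160 x 10^6 bytes for Copy), the runs appear to use 10,000,000-element double arrays; that array size is an inference, not stated in the thread. The function names are hypothetical, not STREAM's own.

```c
#include <stddef.h>

/* The four STREAM kernels; array roles follow STREAM 5.x. */
void stream_copy(double *c, const double *a, size_t n)
{
    for (size_t j = 0; j < n; j++)
        c[j] = a[j];
}

void stream_scale(double *b, const double *c, double scalar, size_t n)
{
    for (size_t j = 0; j < n; j++)
        b[j] = scalar * c[j];
}

void stream_add(double *c, const double *a, const double *b, size_t n)
{
    for (size_t j = 0; j < n; j++)
        c[j] = a[j] + b[j];
}

void stream_triad(double *a, const double *b, const double *c,
                  double scalar, size_t n)
{
    for (size_t j = 0; j < n; j++)
        a[j] = b[j] + scalar * c[j];
}

/* Best rate in MB/s (1 MB = 1e6 bytes): bytes moved / fastest run.
 * arrays_touched is 2 for Copy and Scale, 3 for Add and Triad. */
double stream_best_rate_mbs(int arrays_touched, size_t n, double min_time_s)
{
    double bytes = (double)arrays_touched * sizeof(double) * (double)n;
    return 1.0e-6 * bytes / min_time_s;
}
```

Applied to the Copy rows, 316.6 MB/s against 771.5 MB/s is a ratio of about 0.41, which is the "less than half" discussed next.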

These results are weird.  Unoptimized less than half the speed of mainline?
Improving optimization (with no extra work, mind) brings the results back down?


Yep, they are. It seems the involved developers' assumptions about where speed can be improved (or where the slowness comes from) are not correct.
How are SPARC64 benchmarks usually done?



r~



