Re: [Tinycc-devel] Huge swings in cache performance

From: Michael B. Smith
Subject: Re: [Tinycc-devel] Huge swings in cache performance
Date: Thu, 5 Jan 2017 23:09:08 +0000

How many times does foo overflow requiring a cache flush?


From: Tinycc-devel [mailto:tinycc-devel-bounces+address@hidden] On Behalf Of David Mertens
Sent: Thursday, January 5, 2017 12:59 AM
To: address@hidden
Subject: Re: [Tinycc-devel] Huge swings in cache performance


Update: I *can* get this slowdown with tcc. The main trigger is to have a global variable that gets modified by the function.

This program generates a single function containing a series of skipped operations (the number of operations is a command-line option), ending with a modification of a global variable. It compiles the function using tcc, then calls it a specified number of times (the repeat count is also a command-line option). It can either generate the code in memory, or generate a .so file and load that using dlopen. (When it generates in memory, it prints the size of the generated code.)
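
For anyone who wants to build a similar test, here is a minimal sketch of the source-generation step only. It is not the actual program: the filler statement, the names never and counter, and the helper make_test_source are all hypothetical, and "skipped operation" is interpreted here as a statement guarded by a condition that is false at run time.

```c
#include <stdlib.h>
#include <string.h>

/* Build C source for a function foo() holding n_ops skipped filler
 * statements followed by one write to a global variable.
 * Returns a malloc'd string (caller frees), or NULL on allocation failure.
 * The filler statement and all identifiers are assumptions for illustration. */
char *make_test_source(int n_ops)
{
    const char *head = "int never = 0;\nint counter = 0;\nvoid foo(void)\n{\n";
    const char *op   = "    if (never) counter += 2;\n"; /* skipped at run time */
    const char *tail = "    counter++;\n}\n";            /* the global write */

    if (n_ops < 0)
        n_ops = 0;

    size_t len = strlen(head) + (size_t)n_ops * strlen(op) + strlen(tail) + 1;
    char *src = malloc(len);
    if (src == NULL)
        return NULL;

    strcpy(src, head);
    for (int i = 0; i < n_ops; i++)
        strcat(src, op);
    strcat(src, tail);
    return src;
}
```

The returned string could then be handed to libtcc (for example via tcc_compile_string) for the in-memory case, or written to a file and compiled to a .so for the dlopen case.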

Here are the interesting results on my machine, all for 10,000,000 iterations, using compilation-in-memory:

N   Code Size (Bytes)   Time (s)
0                 128       2.52
1                 144       2.54
2                 176       2.57
3                 208       0.035
4                 224       0.058
5                 256       2.57
6                 272       0.060
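
Timings like the ones above can be collected with a simple call loop. The sketch below is an assumption about the shape of such a harness, not the code that produced these numbers: time_calls, bump, and calls_seen are hypothetical names, bump stands in for the generated function, and clock() measures CPU time rather than wall-clock time.

```c
#include <time.h>

/* Stand-in for the generated function: it only modifies a global. */
volatile long calls_seen = 0;
void bump(void) { calls_seen++; }

/* Call fn() repeat times and return the elapsed CPU time in seconds. */
double time_calls(void (*fn)(void), long repeat)
{
    clock_t start = clock();
    for (long i = 0; i < repeat; i++)
        fn();
    clock_t end = clock();
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

In a real run, fn would be the pointer obtained from the in-memory compile or from dlsym, and repeat would be 10,000,000 to match the tables.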


Switching over to a shared object file, I get these results (code size is size of the .so file):

N   Code Size (Bytes)   Time (s)
0                2960       0.057
1                2984       0.040
2                3016       0.058
3                3040       0.039
4                3064       0.040
5                3088       0.060
6                3112       0.063


As you can see, the jit-compiled code shows odd slowdowns of roughly 40x to 70x (about 2.5 s versus 0.035 to 0.060 s) depending on... something. The shared object file, on the other hand, performs consistently well.
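
The size of the swings can be read straight off the in-memory table. A small sketch of that arithmetic follows; the 1-second cut between "slow" and "fast" rows is an arbitrary choice for illustration, and slowdown_range is a hypothetical helper.

```c
/* Timings in seconds copied from the in-memory table, N = 0..6. */
static const double jit_time[] = {2.52, 2.54, 2.57, 0.035, 0.058, 2.57, 0.060};
enum { N_ROWS = sizeof jit_time / sizeof jit_time[0] };

/* Arbitrary threshold separating slow rows from fast rows. */
#define SLOW_THRESHOLD 1.0

/* Smallest and largest slow/fast ratio between any slow row and any fast row. */
void slowdown_range(double *min_ratio, double *max_ratio)
{
    double min_slow = 1e30, max_slow = 0.0;
    double min_fast = 1e30, max_fast = 0.0;

    for (int i = 0; i < N_ROWS; i++) {
        double t = jit_time[i];
        if (t > SLOW_THRESHOLD) {
            if (t < min_slow) min_slow = t;
            if (t > max_slow) max_slow = t;
        } else {
            if (t < min_fast) min_fast = t;
            if (t > max_fast) max_fast = t;
        }
    }
    *min_ratio = min_slow / max_fast;  /* best slow case vs. worst fast case */
    *max_ratio = max_slow / min_fast;  /* worst slow case vs. best fast case */
}
```

Applying the same calculation to the shared-object table would need no threshold, since all of its rows fall between 0.039 and 0.063 s.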

Two questions:

1) Can anybody reproduce these effects on their Linux machines, especially different architectures? (I can try an ARM tomorrow.)

2) Is there something special about how tcc builds a shared object file that is not happening with the jit-compiled code?




 "Debugging is twice as hard as writing the code in the first place.
  Therefore, if you write the code as cleverly as possible, you are,
  by definition, not smart enough to debug it." -- Brian Kernighan
