[Tinycc-devel] inline assembly and optimization passes

From: Sylvain BERTRAND
Subject: [Tinycc-devel] inline assembly and optimization passes
Date: Fri, 20 Sep 2013 03:08:10 +0200
User-agent: Mutt/1.5.21 (2010-09-15)



I wonder whether the internals of tinycc can easily support basic
optimization passes. The idea is not to compete with gcc and its
hundreds of passes, but my guess is that very few optimization
passes would be required to give a significant performance boost
to the generated code. We would just need to select the "right"
ones...

As a C writer, I have been thinking about a "variable aliasing"
pass over a compilation unit, because a lot of code uses several C
variables to reference the same actual variable in order to make
the code more readable. In my ignorant mind, that means a lot of
wasted registers and stack space, and a significant performance
loss at a global scale. Am I wrong?
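
To make the aliasing idea concrete, here is a minimal sketch in C.
The function names are hypothetical, and the pass described above
seems closest to what compiler literature calls copy propagation;
that mapping is an assumption. The first function uses extra
variables purely for readability, and the second shows roughly
what the code would look like once those copies are propagated
away. A compiler with no such pass may give each alias its own
stack slot or register.

```c
#include <stddef.h>

/* Readable version: `p` and `n` are only new names for the
 * parameters (hypothetical example, not taken from tinycc). */
size_t sum_with_aliases(const unsigned char *buf, size_t buf_len)
{
    const unsigned char *p = buf;   /* alias of buf */
    size_t n = buf_len;             /* alias of buf_len */
    size_t total = 0;

    for (size_t i = 0; i < n; i++)
        total += p[i];
    return total;
}

/* Same computation after the aliases are replaced by the
 * variables they copy: two fewer locals to keep alive. */
size_t sum_propagated(const unsigned char *buf, size_t buf_len)
{
    size_t total = 0;

    for (size_t i = 0; i < buf_len; i++)
        total += buf[i];
    return total;
}
```

Both functions return the same result for any input; only the
number of live variables differs.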

I also thought about a "constant folding" pass, but I'm not
convinced it would give a significant boost, given my perception
of general C code.
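
For reference, constant folding can be sketched in a few lines.
The expression tree below is a hypothetical toy representation,
not tinycc's internal one, and the sketch ignores signed overflow.
It folds subtrees whose operands are all constants, bottom-up, and
leaves anything involving a variable untouched.

```c
#include <stddef.h>

/* Hypothetical expression node, for illustration only. */
enum kind { NUM, VAR, ADD, MUL };

struct expr {
    enum kind kind;
    long value;               /* meaningful when kind == NUM */
    struct expr *lhs, *rhs;   /* meaningful when kind is ADD or MUL */
};

/* Fold constant subtrees in place.
 * Returns 1 if `e` is a constant after folding, 0 otherwise. */
int fold(struct expr *e)
{
    if (e->kind == NUM)
        return 1;
    if (e->kind == VAR)
        return 0;

    /* Fold both operands first, even if one turns out non-constant. */
    int l = fold(e->lhs);
    int r = fold(e->rhs);
    if (!(l && r))
        return 0;

    /* Both operands are constants: compute the result now.
     * Overflow is not handled in this sketch. */
    e->value = (e->kind == ADD) ? e->lhs->value + e->rhs->value
                                : e->lhs->value * e->rhs->value;
    e->kind = NUM;
    e->lhs = e->rhs = NULL;
    return 1;
}
```

With this sketch, a tree for 2 + 3 * 4 collapses to the single
constant 14, while x + 2 stays an addition.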


Another thing is inline assembly. I happened to write some code
without a libc for x86-64, namely wired directly to linux
syscalls. The need for assembly quickly caught up with me. I
could use an assembler (gas, yasm...), but then I would pay the
price of the C calling convention (prologue/epilogue/parameter
passing) and of the call/ret. The whole point of inline assembly
is to avoid "dancing" too much with the registers/stack because
of the function call. On modern architectures, is the performance
loss of such a "dance" worth the complexity of inline assembly?

I know it would mean, at least for x86-64, to write an
assembler/use an external assembler in/for the architecture
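
As an illustration of the libc-free syscall case, here is a
minimal sketch of a raw write syscall using GCC-style extended
inline assembly. It assumes x86-64 Linux and a compiler that
accepts this asm syntax; the helper names raw_syscall3 and
raw_write are hypothetical. The compiler places the arguments
directly in the registers the kernel expects, so no separate
assembly function has to be called.

```c
#if defined(__x86_64__) && defined(__linux__)

/* Issue a 3-argument syscall. On x86-64 Linux the syscall number
 * goes in rax and the arguments in rdi, rsi, rdx; the `syscall`
 * instruction clobbers rcx and r11. */
static long raw_syscall3(long nr, long a1, long a2, long a3)
{
    long ret;

    __asm__ volatile ("syscall"
                      : "=a"(ret)
                      : "a"(nr), "D"(a1), "S"(a2), "d"(a3)
                      : "rcx", "r11", "memory");
    return ret;   /* negative errno value on failure */
}

/* write(2) without libc; 1 is the write syscall number on x86-64. */
long raw_write(int fd, const void *buf, unsigned long len)
{
    return raw_syscall3(1, fd, (long)buf, (long)len);
}

#endif
```

For example, raw_write(1, "hi\n", 3) would write to standard
output, and a call on an invalid descriptor returns a negative
errno value instead of setting errno.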

On that subject, to work around the complexity of an inline
assembly infrastructure: it seems to me that inline assembly is
useful in only a few cases. We could imagine extending the C
language, per architecture, with builtin functions/new
operators/keywords, for instance for atomic operations,
syscalls... Am I missing a case we would really want, one that
makes this workaround obviously impractical?
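
For the atomic-operations case, GCC and Clang already expose
builtins of this kind, which gives an idea of what such an
extension looks like from the C side. The sketch below assumes a
compiler providing the __atomic builtins; the counter functions
are hypothetical names used only for illustration.

```c
/* Shared counter updated without any inline assembly. */
static long counter;

/* Atomically add 1 and return the previous value. */
long counter_increment(void)
{
    return __atomic_fetch_add(&counter, 1, __ATOMIC_SEQ_CST);
}

/* Atomically read the current value. */
long counter_read(void)
{
    return __atomic_load_n(&counter, __ATOMIC_SEQ_CST);
}
```

The compiler picks the instruction sequence for the target
architecture, so the C source stays free of assembly.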


Of course, all of that is from my perspective, meaning with zero
experience in compiler programming. The idea is to achieve a
reasonable global performance level compared to gcc on "classic"
system loads. To narrow the gap, code "hotspots" would be written
in assembly (cf ffmpeg).


