Re: [Tinycc-devel] [PATCH 0/4] stdatomic: code generators

From: Michael Matz
Subject: Re: [Tinycc-devel] [PATCH 0/4] stdatomic: code generators
Date: Tue, 16 Feb 2021 00:57:23 +0100 (CET)
User-agent: Alpine 2.21 (LSU 202 2017-01-01)


On Sun, 14 Feb 2021, Dmitry Selyutin wrote:

The first patch introduces a set of routines which any platform which
wants to support atomics must implement. I don't quite like that
there's a lot of code duplication, but I haven't come up with a good
idea on how to avoid it (I've been thinking of some trick with weak
functions, though). I'm also not sure about the ST_STATIC specifier; any tips
regarding its usage are highly appreciated. I added it as I saw it's
used in the code around; perhaps this is not required, so I can make
the routines weak by default?

The second patch adjusts the tokenizer and generator accordingly, and
also fixes some minor issues. From now on, the count of tokens matches
the count of atomic routines, and the generator calls platform-specific
code instead of calling the usual functions. I'd like to keep this
approach in order to make the code a bit more flexible. This is not for
speed but, rather, for being able to tune per-platform code in the
future. I'm totally open to discussion.

The third patch extends the x86_64 code generator to generate code from
binary buffers, not byte-by-byte as with the g() routine. This
functionality will be used in the last patch, if it gets accepted.

The last patch is the implementation for x86_64. This patch is likely
a controversial one. I tried to make the code somewhat generic to
different argument sizes, at the same time making it look like a
function call. It's also caused by the fact that I checked the code
generated by gcc for cases when usual stdatomic routines are wrapped
into simple routines. I'm pretty sure a lot there can be improved;
perhaps many of you will find the approach to be unorthodox to some
degree. This is just the idea; I'm totally open for discussion.

So, I think you want to iterate a bit on this to find some tiny ways :)

Some ideas:

* For the unimplemented targets: e.g. introduce a define that a target
  sets, define erroring fallbacks (or empty macros or suchlike) if the
  macro isn't set (see e.g. CONFIG_TCC_ASM in tccgen.c).
* Commonize the routines: there is no reason why you need four routines
  for four basic arithmetic operations, if gen_op() supports all
  arithmetic operations.  I.e. make it an argument to a single routine.
* The atomic routines themselves: like others, I suggest doing normal calls to
  library routines.  TCC is _not_ about fastest code.
* For the routines you do want to inline, the use of opcode bytes: nah,
  that can be done in a nicer way.  Think about the very core that you
  need: you will find it's the locked cmpxchg loop and the xchg insn
  itself.  Both have the property that they are very similar to stores
  (including the fact that they have a size),
  they just happen to leave something interesting in the register operand.
  I.e. it would be natural to just extend the 'store' routine to be able
  to emit (lock cmp)xchg instead of the store opcode.

  That will give you the possibility to accept arbitrary registers instead
  of having to hard-code ax/si/di, obviating the need for the
  prologue/epilogue routines.

  For that you also probably want to use the existing helpers orex and
  gen_modrm(64) from x86_64-gen.c.  After some fiddling you will probably
  find that _not_ hardcoding specific registers is actually going to be

* In a similar vein: your atomic load/store routines: these are simply
  load/store themselves again.
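
To make the "single routine" and "cmpxchg loop" ideas above concrete, here
is a minimal sketch in portable C11, not in terms of tcc's code generator.
It shows one fetch-and-op routine that takes the operation as an argument
and is built on a single compare-exchange loop. The names atomic_fetch_op,
apply_op and enum atomic_op are hypothetical and do not exist in tcc;
inside tcc the selector would presumably be whatever operator value
gen_op() already accepts.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical operation selector, for illustration only. */
enum atomic_op { OP_ADD, OP_SUB, OP_AND, OP_OR, OP_XOR };

/* Plain (non-atomic) arithmetic, selected by an argument. */
static uint32_t apply_op(enum atomic_op op, uint32_t a, uint32_t b)
{
    switch (op) {
    case OP_ADD: return a + b;
    case OP_SUB: return a - b;
    case OP_AND: return a & b;
    case OP_OR:  return a | b;
    case OP_XOR: return a ^ b;
    }
    return a;
}

/* One routine for every fetch-and-op: returns the old value. */
static uint32_t atomic_fetch_op(_Atomic uint32_t *p, enum atomic_op op,
                                uint32_t operand)
{
    uint32_t old = atomic_load(p);

    /* If the exchange fails, 'old' is refreshed with the current value,
       so the next iteration recomputes the new value from it. */
    while (!atomic_compare_exchange_weak(p, &old,
                                         apply_op(op, old, operand)))
        ;
    return old;
}
```

With this shape, atomic_fetch_op(&v, OP_ADD, 5) and
atomic_fetch_op(&v, OP_AND, 6) share all of the atomic machinery; only the
plain arithmetic differs.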

I think it would be good to have a testcase showing the atomic support
at work as well.
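
As a rough idea of what such a testcase might check, here is a
single-threaded sketch that assumes the C11 generic functions from
<stdatomic.h> are the interface under test; the patch series may expose
only a subset of them. The function name check_atomics is hypothetical. It
returns the number of failed checks, so a test driver could call it from
main() and print the result.

```c
#include <stdatomic.h>

/* Returns the number of failed checks; 0 means everything passed. */
int check_atomics(void)
{
    int failures = 0;
    atomic_int counter = 1;
    int expected;

    /* Plain store and load. */
    atomic_store(&counter, 5);
    failures += atomic_load(&counter) != 5;

    /* Fetch-and-op routines return the value from before the operation. */
    failures += atomic_fetch_add(&counter, 3) != 5;  /* counter is now 8 */
    failures += atomic_fetch_sub(&counter, 2) != 8;  /* counter is now 6 */

    /* Exchange returns the old value and stores the new one. */
    failures += atomic_exchange(&counter, 40) != 6;  /* counter is now 40 */

    /* A compare-exchange with a wrong guess must fail and write the
       current value back into 'expected'. */
    expected = 41;
    failures += atomic_compare_exchange_strong(&counter, &expected, 0) != 0;
    failures += expected != 40;

    /* With the right guess it must succeed and store the new value. */
    expected = 40;
    failures += atomic_compare_exchange_strong(&counter, &expected, 42) != 1;
    failures += atomic_load(&counter) != 42;

    return failures;
}
```

This only exercises the single-threaded semantics; it does not show that
the generated code is actually atomic under contention.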

I hope that wasn't too severe :-)

