Re: [Tinycc-devel] [PATCH 0/4] stdatomic: code generators

From: Michael Matz
Subject: Re: [Tinycc-devel] [PATCH 0/4] stdatomic: code generators
Date: Tue, 16 Feb 2021 01:01:04 +0100 (CET)
User-agent: Alpine 2.21 (LSU 202 2017-01-01)

Hello again,

On Tue, 16 Feb 2021, Michael Matz wrote:

On Sun, 14 Feb 2021, Dmitry Selyutin wrote:

 The first patch introduces a set of routines which any platform which
 wants to support atomics must implement. I don't quite like that
 there's a lot of code duplication, but I haven't come up with a good
 idea on how to avoid it (I've been thinking of some trick with weak
 functions, though). I'm also not sure of ST_STATIC specifier; any tips
 regarding its usage are highly appreciated. I added it as I saw it's
 used in the code around; perhaps this is not required, so I can make
 the routines weak by default?

 The second patch adjusts tokenizer and generator appropriately, and
 also fixes some minor issues. From now on, the count of tokens matches
 count of atomic routines, and calls platform-specific code instead of
 calling usual functions. I'd like to keep this approach in order to
 make the code a bit more flexible. This is not for speed but, rather,
 for being able to tune per-platform code in the future. I'm totally
 open for the discussion.

 The third patch extends x86_64 code generator to generate code from
 the binary buffers, not byte-by-byte, as with g() routine. This
 functionality will be used in the ultimate patch, if it gets accepted.

 The last patch is the implementation for x86_64. This patch is likely
 a controversial one. I tried to make the code somewhat generic to
 different argument sizes, at the same time making it look like a
 function call. It's also caused by the fact that I checked the code
 generated by gcc for cases when usual stdatomic routines are wrapped
 into simple routines. I'm pretty sure a lot there can be improved;
 perhaps many of you will find the approach to be unorthodox to some
 degree. This is just the idea; I'm totally open for discussion.

So, I think you want to iterate a bit on this to find some tiny ways :)

Some ideas:

* For the unimplemented targets: e.g. introduce a define that a target
  sets, define erroring fallbacks (or empty macros or suchlike) if the
  macro isn't set (see e.g. CONFIG_TCC_ASM in tccgen.c).
* commonize the routines: there is no reason why you need four routines
  for four basic arithmetic operations, if gen_op() supports all
  arithmetic operations.  I.e. make it an argument to a single routine.
* The atomic routines themselves: like others I suggest doing normal calls
  to library routines.  TCC is _not_ about fastest code.
* For the routines you do want to inline, the use of opcode bytes: nah,
  that can be done in a nicer way.  Think about the very core that you
  need: you will find it's the locked cmpxchg loop and the xchg insn
  itself.  Both have the property that they are very similar to stores
  (including the fact that they have a size),
  they just happen to leave something interesting in the register operand.
  I.e. it would be natural to just extend the 'store' routine to be able
  to emit (lock cmp)xchg instead of the store opcode.

  That will give you the possibility to accept arbitrary registers instead
  of having to hard-code ax/si/di, obviating the need for the
  prologue/epilogue routines.

  For that you also probably want to use the existing helpers orex and
  gen_modrm(64) from x86_64-gen.c .  After some fiddling you will probably
  find that _not_ hardcoding specific registers is actually going to be

* In a similar vein: your atomic load/store routines are simply
  load/store themselves again.
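
The first two ideas above (a target-set define with an erroring fallback, and one routine that takes the operation as an argument) could be sketched roughly as follows. This is not TCC code: `DEMO_TARGET_HAS_ATOMICS`, `demo_atomic_op` and `demo_atomic_rmw` are hypothetical names, and the sketch models the dispatch at run time with the GCC/Clang `__atomic_*` builtins rather than inside a code generator.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical feature macro: a target that implements atomics would set
   it, in the same spirit as CONFIG_TCC_ASM. Not a real TCC define. */
#define DEMO_TARGET_HAS_ATOMICS 1

/* One enumerator per operation replaces one routine per operation. */
enum demo_atomic_op { DEMO_ADD, DEMO_SUB, DEMO_AND, DEMO_OR };

#ifdef DEMO_TARGET_HAS_ATOMICS
/* Single entry point; the operation is just an argument.
   Returns the value stored at *ptr before the update. */
static int demo_atomic_rmw(enum demo_atomic_op op, int *ptr, int val)
{
    switch (op) {
    case DEMO_ADD: return __atomic_fetch_add(ptr, val, __ATOMIC_SEQ_CST);
    case DEMO_SUB: return __atomic_fetch_sub(ptr, val, __ATOMIC_SEQ_CST);
    case DEMO_AND: return __atomic_fetch_and(ptr, val, __ATOMIC_SEQ_CST);
    case DEMO_OR:  return __atomic_fetch_or(ptr, val, __ATOMIC_SEQ_CST);
    }
    abort(); /* unknown operation */
}
#else
/* Erroring fallback for targets that do not set the macro. */
static int demo_atomic_rmw(enum demo_atomic_op op, int *ptr, int val)
{
    (void)op; (void)ptr; (void)val;
    fprintf(stderr, "atomics are not supported on this target\n");
    exit(1);
}
#endif
```

Callers only ever see `demo_atomic_rmw`, so adding an operation means adding an enumerator and a `case`, not a new routine per target.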

Drats, I forgot the last remark I wanted to make: don't forget that TCC
supports inline asm.  So another alternative would be to define all the
routines/builtins you want as macros expanding to appropriate inline asm.
That would avoid hardcoding registers, _and_ would be more maintainable
than raw opcode bytes.
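
A minimal sketch of the inline-asm alternative, assuming GCC-style extended asm and statement expressions on x86_64. The macro and function names (`DEMO_XCHG32`, `DEMO_CMPXCHG32`, `demo_fetch_add32`) are hypothetical and handle only `int`. On other targets the sketch falls back to the `__atomic_*` builtins so it still compiles. Note that cmpxchg architecturally requires the accumulator for the compared value, so only that operand is pinned; the other register is left to the compiler.

```c
#if defined(__x86_64__)
/* Atomically swap *ptr with val and yield the old value.
   xchg with a memory operand is implicitly locked. */
#define DEMO_XCHG32(ptr, val) __extension__ ({                     \
    int demo_v_ = (val);                                           \
    __asm__ __volatile__("xchgl %0, %1"                            \
                         : "+r"(demo_v_), "+m"(*(ptr))             \
                         : : "memory");                            \
    demo_v_; })

/* If *ptr == expected, store desired; yield the value seen in *ptr. */
#define DEMO_CMPXCHG32(ptr, expected, desired) __extension__ ({    \
    int demo_prev_ = (expected);                                   \
    int demo_new_ = (desired);                                     \
    __asm__ __volatile__("lock; cmpxchgl %2, %1"                   \
                         : "+a"(demo_prev_), "+m"(*(ptr))          \
                         : "r"(demo_new_)                          \
                         : "memory", "cc");                        \
    demo_prev_; })
#else
/* Portable fallback so the sketch builds on non-x86_64 hosts. */
#define DEMO_XCHG32(ptr, val) \
    __atomic_exchange_n((ptr), (val), __ATOMIC_SEQ_CST)
#define DEMO_CMPXCHG32(ptr, expected, desired) __extension__ ({    \
    int demo_prev_ = (expected);                                   \
    __atomic_compare_exchange_n((ptr), &demo_prev_, (desired), 0,  \
                                __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST); \
    demo_prev_; })
#endif

/* Fetch-and-add built from the cmpxchg loop: retry until no other
   writer changed *ptr between the read and the exchange. */
static int demo_fetch_add32(int *ptr, int delta)
{
    int old = *ptr;
    for (;;) {
        int seen = DEMO_CMPXCHG32(ptr, old, old + delta);
        if (seen == old)
            return old;
        old = seen;
    }
}
```

Other read-modify-write operations can reuse the same loop by changing only the expression that computes the new value.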

