

Re: [Tinycc-devel] [PATCH 0/4] stdatomic: code generators

From: Dmitry Selyutin
Subject: Re: [Tinycc-devel] [PATCH 0/4] stdatomic: code generators
Date: Thu, 11 Mar 2021 23:08:07 +0300


Sorry for the delayed reply. Thank you for your comments and ideas! I'll
publish a new version that returns to the approach I used initially,
which you all consider the better one: resorting to function calls.

On Tue, Feb 16, 2021 at 3:01 AM Michael Matz <matz.tcc@frakked.de> wrote:
> Hello again,
> On Tue, 16 Feb 2021, Michael Matz wrote:
> > On Sun, 14 Feb 2021, Dmitry Selyutin wrote:
> >
> >>  The first patch introduces a set of routines which any platform which
> >>  wants to support atomics must implement. I don't quite like that
> >>  there's a lot of code duplication, but I haven't come up with a good
> >>  idea on how to avoid it (I've been thinking of some trick with weak
> >>  functions, though). I'm also not sure of ST_STATIC specifier; any tips
> >>  regarding its usage are highly appreciated. I added it as I saw it's
> >>  used in the code around; perhaps this is not required, so I can make
> >>  the routines weak by default?
> >>
> >>  The second patch adjusts tokenizer and generator appropriately, and
> >>  also fixes some minor issues. From now on, the count of tokens matches
> >>  count of atomic routines, and calls platform-specific code instead of
> >>  calling usual functions. I'd like to keep this approach in order to
> >>  make the code a bit more flexible. This is not for speed but, rather,
> >>  for being able to tune per-platform code in the future. I'm totally
> >>  open for the discussion.
> >>
> >>  The third patch extends x86_64 code generator to generate code from
> >>  the binary buffers, not byte-by-byte, as with g() routine. This
> >>  functionality will be used in the ultimate patch, if it gets accepted.
> >>
> >>  The last patch is the implementation for x86_64. This patch is likely
> >>  a controversial one. I tried to make the code somewhat generic to
> >>  different argument sizes, at the same time making it look like a
> >>  function call. It's also caused by the fact that I checked the code
> >>  generated by gcc for cases when usual stdatomic routines are wrapped
> >>  into simple routines. I'm pretty sure a lot there can be improved;
> >>  perhaps many of you will find the approach to be unorthodox to some
> >>  degree. This is just the idea; I'm totally open for discussion.
> >
> > So, I think you want to iterate a bit on this to find some tiny ways :)
> >
> > Some ideas:
> >
> > * For the unimplemented targets: e.g. introduce a define that a target
> >   sets, define erroring fallbacks (or empty macros or suchlike) if the
> >   macro isn't set (see e.g. CONFIG_TCC_ASM in tccgen.c).
> > * commonize the routines: there is no reason why you need four routines
> >   for four basic arithmetic operations, if gen_op() supports all
> >   arithmetic operations.  I.e. make it an argument to a single routine.
> > * The atomic routines themselves: like others I suggest doing normal calls to
> >   library routines.  TCC is _not_ about fastest code.
> > * For the routines you do want to inline, the use of opcode bytes: nah,
> >   that can be done in a nicer way.  Think about the very core that you
> >   need: you will find it's the locked cmpxchg loop and the xchg insn
> >   itself.  Both have the property that they are very similar to stores
> >   (including the fact that they have a size),
> >   they just happen to leave something interesting in the register operand.
> >   I.e. it would be natural to just extend the 'store' routine to be able
> >   to emit (lock cmp)xchg instead of the store opcode.
> >
> >   That will give you the possibility to accept arbitrary registers instead
> >   of having to hard-code ax/si/di, obviating the need for the
> >   prologue/epilogue routines.
> >
> >   For that you also probably want to use the existing helpers orex and
> >   gen_modrm(64) from x86_64-gen.c .  After some fiddling you will probably
> >   find that _not_ hardcoding specific registers is actually going to be
> >   easier.
> >
> > * In a similar vein: your atomic load/store routines: these are simply
> >   load/store themselves again.
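
The points above about using a single routine for all arithmetic operations and building on the compare-exchange loop can be sketched roughly as follows. This is only an illustration under assumptions: the names atomic_op, apply_op and atomic_fetch_op_u32 are hypothetical and do not exist in TCC, and the sketch relies on a C11 <stdatomic.h> provided by the host compiler rather than on anything in the patch series.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical operation selector; not a TCC identifier. */
enum atomic_op { OP_ADD, OP_SUB, OP_AND, OP_OR, OP_XOR };

/* Plain (non-atomic) arithmetic, selected by an argument. */
static uint32_t apply_op(enum atomic_op op, uint32_t a, uint32_t b)
{
    switch (op) {
    case OP_ADD: return a + b;
    case OP_SUB: return a - b;
    case OP_AND: return a & b;
    case OP_OR:  return a | b;
    case OP_XOR: return a ^ b;
    }
    return a;
}

/*
 * One library-style routine for every read-modify-write operation.
 * Returns the value the object held before the update.
 */
uint32_t atomic_fetch_op_u32(_Atomic uint32_t *obj, uint32_t arg,
                             enum atomic_op op)
{
    uint32_t old = atomic_load(obj);

    /* On failure the compare-exchange refreshes 'old' with the
       current value, so the loop retries with up-to-date input. */
    while (!atomic_compare_exchange_weak(obj, &old, apply_op(op, old, arg)))
        ;
    return old;
}
```

With this shape, the compiler side would only need to emit an ordinary call and pass the operation as an extra argument, at the cost of some speed.
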
> Drats, I forgot the last remark I wanted to make: don't forget that TCC
> supports inline asm.  So another alternative would be to define all the
> routines/builtins you want as macros expanding to appropriate inline asm.
> That would enable not hardcoding registers, _and_ would be maintainable
> instead of raw opcode bytes.
> Ciao,
> Michael.
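
The inline asm alternative could look roughly like the sketch below. It is an illustration under assumptions, not code from the patches: xchg_u32 and cmpxchg_u32 are hypothetical names, the asm uses GCC-style extended syntax for x86, and the non-x86 branch assumes the GCC/Clang __atomic builtins purely so the example builds elsewhere. The "r" constraint leaves the register choice to the compiler; only cmpxchg's accumulator operand is fixed, because the instruction itself requires it.

```c
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)

/* Swap *ptr with val and return the previous contents.
   xchg with a memory operand is implicitly locked. */
static inline uint32_t xchg_u32(volatile uint32_t *ptr, uint32_t val)
{
    __asm__ __volatile__("xchgl %0, %1"
                         : "+r"(val), "+m"(*ptr)
                         :
                         : "memory");
    return val;
}

/* If *ptr == *expected, store desired and return 1; otherwise copy
   the current value into *expected and return 0. */
static inline int cmpxchg_u32(volatile uint32_t *ptr, uint32_t *expected,
                              uint32_t desired)
{
    uint32_t prev = *expected;
    int ok;

    __asm__ __volatile__("lock; cmpxchgl %2, %1"
                         : "+a"(prev), "+m"(*ptr)
                         : "r"(desired)
                         : "memory", "cc");
    ok = (prev == *expected);
    *expected = prev;
    return ok;
}

#else

/* Fallback for other targets; assumes GCC/Clang __atomic builtins. */
static inline uint32_t xchg_u32(volatile uint32_t *ptr, uint32_t val)
{
    return __atomic_exchange_n(ptr, val, __ATOMIC_SEQ_CST);
}

static inline int cmpxchg_u32(volatile uint32_t *ptr, uint32_t *expected,
                              uint32_t desired)
{
    return __atomic_compare_exchange_n(ptr, expected, desired, 0,
                                       __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
}

#endif
```

The same two primitives could be wrapped in macros per operand size, which is closer to what the quoted message describes.
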

Best regards,
Dmitry Selyutin
