lightning

Re: Atomic operations


From: Paulo César Pereira de Andrade
Subject: Re: Atomic operations
Date: Tue, 9 Aug 2022 07:40:01 -0300

On Mon, Aug 8, 2022 at 16:16, Marc Nieper-Wißkirchen
<marc.nieper+gnu@gmail.com> wrote:
>
> Hi Paulo,
>
> thanks for your detailed reply.
>
> On Mon, Aug 8, 2022 at 19:43, Paulo César Pereira de Andrade
> <paulo.cesar.pereira.de.andrade@gmail.com> wrote:
>>
>> On Sat, Aug 6, 2022 at 12:46, Marc Nieper-Wißkirchen
>> <marc.nieper+gnu@gmail.com> wrote:
>> >
>> > Ping.
>>
>>   Hi Marc,
>>
>> > Paulo, I am not sure whether you have already answered this question.
>>
>>   I believe that if I replied, I missed the CC to the mailing list. But I
>> remember the original email.
>>
>> > It would also be interesting to know whether the load and store operations 
>> > are always atomic (on all supported architectures).
>>
>>   They are, as long as ldr_* or str_* is used. ldi_* and sti_* usually need
>> a temporary register for the pointer, except in a few special cases where
>> the pointer can be encoded in a single instruction.
>
>
> What I mean by "atomic" here is that two threads accessing the memory being
> loaded from or stored to never observe a "half-written" value.

  I understand. But even when the access is a single instruction, it would
not be of much use with multiple CPUs, as there would not be any ordering
guarantees.

> So it would be okay if two instructions are generated - one for loading a 
> pointer and one for actually loading/storing at the pointer as long as both 
> are individually atomic.

  Maybe there is some use for ldr+str, in two instructions, but it would be
full of races.

>>   I believe only float/double loads and stores are not atomic in
>> lib/jit_arm-swf.c, where it implements fake registers on the stack.
>
>
> Are you talking about functions like _swf_ldr_d and _swf_ldi_i in 
> lib/jit_arm-swf.c?  It seems that double loads (or stores) are not atomic on 
> some CPUs because two separate 32-bit load/store operations are emitted.

  You are right. Either way, we should only implement wordsize operations.
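
  To make the "half-written" case concrete, here is a minimal sketch in plain
C (not lightning code). It models a double kept as two 32-bit words and
written with two separate word stores, as discussed above for the soft-float
path. The names swf_slot, store_word, load_slot and torn_value_observed are
hypothetical and only serve the illustration; the reader is simulated
deterministically between the two stores instead of using real threads.

```c
#include <stdint.h>
#include <string.h>

/* Assumption of this sketch: a double occupies exactly two 32-bit words. */
_Static_assert(sizeof(double) == 2 * sizeof(uint32_t),
               "sketch assumes a 64-bit double");

/* Hypothetical model of a stack slot holding a double as two words. */
typedef struct { uint32_t word[2]; } swf_slot;

/* One 32-bit store: writes only half number `half` (0 or 1) of `v`. */
static void store_word(swf_slot *slot, double v, int half)
{
    uint32_t w[2];
    memcpy(w, &v, sizeof w);
    slot->word[half] = w[half];
}

/* Reads both words back as one double. */
static double load_slot(const swf_slot *slot)
{
    double v;
    memcpy(&v, slot->word, sizeof v);
    return v;
}

/* Returns 1 if a reader that runs between the two word stores sees a
   bit pattern that is neither the old nor the new value. */
static int torn_value_observed(double old_v, double new_v)
{
    swf_slot slot;
    double seen;

    store_word(&slot, old_v, 0);        /* slot fully holds old_v */
    store_word(&slot, old_v, 1);

    store_word(&slot, new_v, 0);        /* writer: first word store */
    seen = load_slot(&slot);            /* reader runs here */
    store_word(&slot, new_v, 1);        /* writer: second word store */

    return memcmp(&seen, &old_v, sizeof seen) != 0
        && memcmp(&seen, &new_v, sizeof seen) != 0;
}
```

  With two values whose 32-bit halves both differ, for example 1.1 and 2.3,
the simulated reader sees a mix of both. A single word-sized store has no
such intermediate state, which is one reason to start with wordsize
operations only.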

>>
>> > -- Marc
>> >
>> > On Mon, Aug 30, 2021 at 12:45, Marc Nieper-Wißkirchen
>> > <marc.nieper+gnu@gmail.com> wrote:
>> >>
>> >> Could we get instructions for atomic operations into GNU lightning?  At 
>> >> the moment, the only possibility to synchronize GNU lightning code in 
>> >> multi-threaded environments is to call external C code (which can be 
>> >> slow).
>>
>>   I believe this could be done by only supporting gcc as the compiler,
>> and at first, implementing slow versions that call the gcc builtins.
>> Unfortunately, it would add a problem in that it would invalidate non
>> callee save gpr registers for the default implementation.
>>
>> >> I'm thinking of a set of instructions that could be used to implement 
>> >> something in the scope of C's <stdatomic.h>.  In order to implement this, 
>> >> one can take the assembly that GCC generates on the various architectures 
>> >> (see also 
>> >> https://gcc.gnu.org/onlinedocs/gcc/_005f_005fatomic-Builtins.html).
>>
>>   Implementing specialized versions based on generated assembly by
>> gcc should be mostly trivial.
>>
>>   Do you have some proposal or idea of what to implement, and
>> lightning code names?
>>
>>   I believe at first we should implement it only for wordsize values, and
>> a very small subset, at least enough to implement some kind of "fast" mutex.
>
>
> Here is a minimal API, albeit written for Scheme: 
> https://srfi.schemers.org/srfi-230/srfi-230.html.  What is an atomic (fixnum) 
> box there should be a word-sized memory location in GNU lightning.  Atomic 
> pairs (two words) are important for some algorithms.  If they are not easily 
> implementable on a particular architecture, GNU lightning should report this 
> so that the user can call C library routines (from stdatomic) or GCC builtins 
> themselves.
>
> As for GNU lightning instructions, we would probably at least need the 
> following instructions (for word-sized integers):
>
> - loads and stores with relaxed memory order (if I have understood correctly, 
> we can use the usual GNU lightning load/store instructions)
> - loads with acquire memory order
> - stores with release memory order
> - swap (load and store) with relaxed memory order
> - swap (load and store) with acquire-release memory order
> - compare-and-swap with relaxed memory order
> - compare-and-swap with acquire-release memory order

  If lightning were to provide such primitives, I believe it should
only "make a contract" of supporting strong compare-and-swap,
not on shared memory (a different process might die with the
lock held), to allow some kind of mutex implementation, which
could be expensive if there are too many waiters spinning.
  It is still not trivial to get it on all supported ports, at least with
the same semantics, because if it needs to be implemented as an external
function call, it would need to save/restore all JIT_R* and JIT_F* registers
in the worst case. Most times it could just inline what gcc generates.
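
  As a reference for the semantics, here is a minimal sketch of the kind of
"fast" mutex a strong compare-and-swap allows, written with C11
<stdatomic.h> and not with lightning instructions. The type spin_mutex and
the spin_mutex_* functions are hypothetical names for this illustration. It
uses acquire order on lock and release order on unlock, and it spins, so it
has the cost with many waiters mentioned above.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical word-sized lock: 0 = unlocked, 1 = locked. */
typedef struct { atomic_uintptr_t state; } spin_mutex;

static void spin_mutex_init(spin_mutex *m)
{
    atomic_init(&m->state, (uintptr_t)0);
}

/* One strong compare-and-swap attempt; returns 1 if the lock was taken. */
static int spin_mutex_trylock(spin_mutex *m)
{
    uintptr_t expected = 0;
    return atomic_compare_exchange_strong_explicit(
        &m->state, &expected, (uintptr_t)1,
        memory_order_acquire,   /* order on success */
        memory_order_relaxed);  /* order on failure */
}

/* Spins until the compare-and-swap succeeds. */
static void spin_mutex_lock(spin_mutex *m)
{
    while (!spin_mutex_trylock(m)) {
        /* busy wait; expensive with many waiters */
    }
}

/* Release store so writes made under the lock are visible to the
   next thread that acquires it. */
static void spin_mutex_unlock(spin_mutex *m)
{
    atomic_store_explicit(&m->state, (uintptr_t)0, memory_order_release);
}
```

  Compiling something like this with gcc for each port and looking at the
generated assembly would show what an inlined version has to emit.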

> And the same, if supported, for double-word-sized memory operands.
>
> And then the following arithmetic operations (in relaxed and acquire-release 
> semantics):
>
> - fetch-add
> - fetch-sub
> - fetch-or
> - fetch-xor
> - fetch-and
>
> And then an instruction to emit a memory order (release, acquire, 
> acquire-release, sequential consistency) as atomic_thread_fence in 
> stdatomic.h.
> To simplify the interface, it may make sense to offer all operations (but the 
> thread fence instruction) only with relaxed semantics so that the programmer 
> has to emit thread fence instructions explicitly.
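
  For comparison, this is roughly how the "relaxed operations plus explicit
fences" style looks in C11, as a sketch only. The names payload, ready,
events, publish, try_consume and count_event are hypothetical. All atomic
accesses are relaxed, and the ordering comes only from atomic_thread_fence.

```c
#include <stdatomic.h>

static long payload;          /* ordinary data handed between threads */
static atomic_int ready;      /* flag, accessed with relaxed operations */
static atomic_long events;    /* counter updated with relaxed fetch-add */

/* Writer: the release fence orders the payload write before the
   relaxed store that sets the flag. */
static void publish(long value)
{
    payload = value;
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&ready, 1, memory_order_relaxed);
}

/* Reader: the acquire fence after the relaxed load orders the payload
   read after the flag was seen. Returns 1 if a value was consumed. */
static int try_consume(long *out)
{
    if (!atomic_load_explicit(&ready, memory_order_relaxed))
        return 0;
    atomic_thread_fence(memory_order_acquire);
    *out = payload;
    return 1;
}

/* Relaxed fetch-add: atomic, but gives no ordering by itself.
   Returns the value before the increment. */
static long count_event(void)
{
    return atomic_fetch_add_explicit(&events, 1, memory_order_relaxed);
}
```

  The interface stays small this way, since only the fence carries a memory
order, but the programmer has to place the fences correctly.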

  The simplest way to implement it is to have some PIC code
implementing it, and use two jit_jmpr to/from the code, but lightning
would still treat the jit_jmpr as function calls, that is, invalidate non
callee save registers.

  As long as only jmpr is used, and registers are not modified, it should be
enough to call jit_live() once "returning" for any non callee save register
used in the construct, or that must be alive for other use.

Thanks!
Paulo


