On 11/4/24 9:48 AM, Richard Henderson wrote:
> On 10/30/24 15:25, Paolo Savini wrote:
>> On 10/30/24 11:40, Richard Henderson wrote:
>>> __builtin_memcpy DOES NOT equal VMOVDQA
>> I am aware of this. I took __builtin_memcpy as a generic enough way to
>> emulate loads and stores: it should allow several hosts to generate the
>> widest load/store instructions they can. On x86 I see it generate
>> vmovdqu/movdqu instructions, which are not always guaranteed to be
>> atomic; x86, though, guarantees them to be atomic if the memory
>> address is aligned to 16 bytes.
> No, AMD guarantees MOVDQU is atomic if aligned, Intel does not.
> See the comment in util/cpuinfo-i386.c, and the two CPUINFO_ATOMIC_VMOVDQ[AU]
> bits.
> See also host/include/*/host/atomic128-ldst.h, HAVE_ATOMIC128_RO, and
> atomic16_read_ro.
> Not that I think you should use that here; it's complicated, and I think you're better
> off relying on the code in accel/tcg/ when more than byte atomicity is required.
Not sure if that's what you meant, but I didn't find any clear example of
multi-byte atomicity using qatomic_read() and friends that would be closer
to what memcpy() is doing here. I found one example in bdrv_graph_co_rdlock()
that seems to use a memory barrier via smp_mb() and qatomic_read() inside a
loop, but I don't understand that code well enough to say.