[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Qemu-devel] [RFC/PATCH] i386: Atomically update PTEs with mttcg
From: |
Richard Henderson |
Subject: |
Re: [Qemu-devel] [RFC/PATCH] i386: Atomically update PTEs with mttcg |
Date: |
Thu, 29 Nov 2018 11:04:38 -0800 |
User-agent: |
Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.3.0 |
On 11/28/18 3:15 PM, Benjamin Herrenschmidt wrote:
> +static inline uint64_t update_entry(CPUState *cs, target_ulong addr,
> + uint64_t orig_entry, uint32_t bits)
> +{
> + uint64_t new_entry = orig_entry | bits;
> +
> + /* Write the updated bottom 32-bits */
> + if (qemu_tcg_mttcg_enabled()) {
> + uint32_t old_le = cpu_to_le32(orig_entry);
> + uint32_t new_le = cpu_to_le32(new_entry);
> + MemTxResult result;
> + uint32_t old_ret;
> +
> + old_ret = address_space_cmpxchgl_notdirty(cs->as, addr,
> + old_le, new_le,
> + MEMTXATTRS_UNSPECIFIED,
> + &result);
> + if (result == MEMTX_OK) {
> + if (old_ret != old_le) {
> + new_entry = 0;
> + }
> + return new_entry;
> + }
> +
> + /* Do we need to support this case where PTEs aren't in RAM ?
> + *
> + * For now fallback to non-atomic case
> + */
> + }
> +
> + x86_stl_phys_notdirty(cs, addr, new_entry);
> +
> + return new_entry;
> +}
While I know the existing code just updates the low 32-bits, I'd rather update
the whole entry, and make use of the old value returned from the cmpxchg.
E.g.
static bool update_entry(CPUState *cs, target_ulong addr,
uint64_t *entry, uint32_t bits)
{
uint64_t old_entry = *entry;
uint64_t new_entry = old_entry | bits;
if (qemu_tcg_mttcg_enabled()) {
uint64_t cmp_le = cpu_to_le64(old_entry);
uint64_t new_le = cpu_to_le64(new_entry);
uint64_t old_le;
MemTxResult result;
old_le = address_space_cmpxchgl_notdirty(cs->as, addr,
cmp_le, new_le,
MEMTXATTRS_UNSPECIFIED,
&result);
if (result == MEMTX_OK) {
if (old_le == cmp_le) {
return true;
} else {
*entry = le_to_cpu64(old_le);
return false;
}
}
}
x86_stq_phys_notdirty(cs, addr, new_entry);
return true;
}
....
pdpe_addr = (pml4e & PG_ADDRESS_MASK) +
(((gphys >> 30) & 0x1ff) << 3);
pdpe = x86_ldq_phys(cs, pdpe_addr);
do {
if (!(pdpe & PG_PRESENT_MASK)) {
goto do_fault;
}
if (pdpe & rsvd_mask) {
goto do_fault_rsvd;
}
if (pdpe & PG_ACCESSED_MASK) {
break;
}
} while (!update_entry(cs, pdpe_addr, &pdpe, PG_ACCESSED_MASK));
ptep &= pdpe ^ PG_NX_MASK;
....
Although I think it would be really great if we could figure out something that
allows us to promote this whole load/cmpxchg loop into a primitive that avoids
multiple translations of the address.
No, I don't know what that primitive would look like. :-)
r~