qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [RFC/PATCH] i386: Atomically update PTEs with mttcg


From: Richard Henderson
Subject: Re: [Qemu-devel] [RFC/PATCH] i386: Atomically update PTEs with mttcg
Date: Thu, 29 Nov 2018 11:04:38 -0800
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.3.0

On 11/28/18 3:15 PM, Benjamin Herrenschmidt wrote:
> +static inline uint64_t update_entry(CPUState *cs, target_ulong addr,
> +                                    uint64_t orig_entry, uint32_t bits)
> +{
> +    uint64_t new_entry = orig_entry | bits;
> +
> +    /* Write the updated bottom 32-bits */
> +    if (qemu_tcg_mttcg_enabled()) {
> +        uint32_t old_le = cpu_to_le32(orig_entry);
> +        uint32_t new_le = cpu_to_le32(new_entry);
> +        MemTxResult result;
> +        uint32_t old_ret;
> +
> +        old_ret = address_space_cmpxchgl_notdirty(cs->as, addr,
> +                                                  old_le, new_le,
> +                                                  MEMTXATTRS_UNSPECIFIED,
> +                                                  &result);
> +        if (result == MEMTX_OK) {
> +            if (old_ret != old_le) {
> +                new_entry = 0;
> +            }
> +            return new_entry;
> +        }
> +
> +        /* Do we need to support this case where PTEs aren't in RAM ?
> +         *
> +         * For now fallback to non-atomic case
> +         */
> +    }
> +
> +    x86_stl_phys_notdirty(cs, addr, new_entry);
> +
> +    return new_entry;
> +}

While I know the existing code just updates the low 32-bits, I'd rather update
the whole entry, and make use of the old value returned from the cmpxchg.

E.g.

static bool update_entry(CPUState *cs, target_ulong addr,
                         uint64_t *entry, uint32_t bits)
{
    uint64_t old_entry = *entry;
    uint64_t new_entry = old_entry | bits;

    if (qemu_tcg_mttcg_enabled()) {
        uint64_t cmp_le = cpu_to_le64(old_entry);
        uint64_t new_le = cpu_to_le64(new_entry);
        uint64_t old_le;
        MemTxResult result;

        old_le = address_space_cmpxchgl_notdirty(cs->as, addr,
                                                 cmp_le, new_le,
                                                 MEMTXATTRS_UNSPECIFIED,
                                                 &result);
        if (result == MEMTX_OK) {
            if (old_le == cmp_le) {
                return true;
            } else {
                *entry = le_to_cpu64(old_le);
                return false;
            }
        }
    }
    x86_stq_phys_notdirty(cs, addr, new_entry);
    return true;
}

....

    pdpe_addr = (pml4e & PG_ADDRESS_MASK) +
                (((gphys >> 30) & 0x1ff) << 3);
    pdpe = x86_ldq_phys(cs, pdpe_addr);
    do {
        if (!(pdpe & PG_PRESENT_MASK)) {
            goto do_fault;
        }
        if (pdpe & rsvd_mask) {
            goto do_fault_rsvd;
        }
        if (pdpe & PG_ACCESSED_MASK) {
            break;
        }
    } while (!update_entry(cs, pdpe_addr, &pdpe, PG_ACCESSED_MASK));
    ptep &= pdpe ^ PG_NX_MASK;

....

Although I think it would be really great if we could figure out something that
allows us to promote this whole load/cmpxchg loop into a primitive that avoids
multiple translations of the address.

No, I don't know what that primitive would look like.  :-)


r~



reply via email to

[Prev in Thread] Current Thread [Next in Thread]