Re: [Qemu-devel] [PATCH v1 2/4] s390x/tcg: Introduce probe_read_access()
From: Richard Henderson
Subject: Re: [Qemu-devel] [PATCH v1 2/4] s390x/tcg: Introduce probe_read_access()
Date: Wed, 21 Aug 2019 12:19:47 -0700
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.8.0
On 8/21/19 10:37 AM, David Hildenbrand wrote:
> Hah, guess what, I implemented a similar variant of "fetch all
> of the host addresses" *but* it is not as easy as you might
> think (sorry for the bad news).
I think it is, because I didn't think it *that* easy. :-)
> There are certain cases where we can't get access to the raw host
> page. Namely, cpu watchpoints, LAP, NODIRTY. In summary: this won't
> work as you describe. (my first approach did exactly this)
NODIRTY and LAP are automatically handled via probe_write
faulting instead of returning the address. I think there
may be a bug in probe_write at present in that it does not
check for watchpoints.
Watchpoints could be handled the same way, if we were to
export check_watchpoint from exec.c. Indeed, I see no way
to handle watchpoints correctly if we don't. I think that's
an outstanding bug with probe_write.
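To illustrate -- an untested sketch, with a helper name of my
own invention; the real fix would route a match through the
(to-be-exported) check_watchpoint() instead of returning a
host address:

/*
 * Untested sketch: the overlap check that probe_write needs.
 * On a match, call into the watchpoint machinery rather than
 * hand back a host pointer.
 */
static bool watchpoints_overlap(CPUState *cpu, vaddr addr,
                                int len, int flags)
{
    CPUWatchpoint *wp;

    QTAILQ_FOREACH(wp, &cpu->watchpoints, entry) {
        /* BP_MEM_READ/BP_MEM_WRITE match, and the ranges intersect.  */
        if ((wp->flags & flags)
            && addr < wp->vaddr + wp->len
            && wp->vaddr < addr + len) {
            return true;
        }
    }
    return false;
}

probe_write would call this with BP_MEM_WRITE before returning
the host address.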
Any other objections? I certainly think that restricting the
size of such operations to one page is a large simplification
over the S390Access array thing that you create in this patch.
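For comparison, a rough sketch of the one-page shape I mean
(untested; probe_access() here is a hypothetical variant of
probe_write that returns a host pointer valid for size bytes
within a single page, or raises the guest exception):

/*
 * Untested sketch: with both operands probed up front, and the
 * length capped so each operand stays within one page, MVC's
 * defined left-to-right byte copy cannot fault mid-copy.
 */
static void do_mvc_in_page(CPUS390XState *env, uint64_t dest,
                           uint64_t src, int size, int mmu_idx,
                           uintptr_t ra)
{
    /* Hypothetical probe: faults instead of returning NULL.  */
    uint8_t *d = probe_access(env, dest, size, MMU_DATA_STORE,
                              mmu_idx, ra);
    const uint8_t *s = probe_access(env, src, size, MMU_DATA_LOAD,
                                    mmu_idx, ra);
    int i;

    for (i = 0; i < size; i++) {
        d[i] = s[i];
    }
}

The caller splits the operation at the page boundary, so at most
two calls per operand and no S390Access array.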
r~
>
> The following patch requires another re-factoring
> (tcg_s390_cpu_mmu_translate), but you should get the idea.
>
>
>
> From 0cacd2aea3dbc25e93492cca04f6c866b86d7f8a Mon Sep 17 00:00:00 2001
> From: David Hildenbrand <address@hidden>
> Date: Tue, 20 Aug 2019 09:37:11 +0200
> Subject: [PATCH v1] s390x/tcg: Fault-safe MVC (MOVE) implementation
>
> MVC can cross page boundaries. In case we fault on the second page, we
> already partially copied data. If we have overlaps, we would
> trigger a fault after having partially moved data, eventually having
> our original data already overwritten. When continuing after the fault,
> we would try to move already modified data, not the original data -
> very bad.
>
> glibc started to use MVC for forward memmove() and is able to trigger
> exactly this corruption (via rpmbuild and rpm). Fedora 31 (rawhide)
> currently fails to install as we trigger rpm database corruptions due to
> this bug.
>
> We need a way to translate a virtual address range to individual pages that
> we can access later on without triggering faults. Probing all virtual
> addresses once before the read/write is not sufficient - the guest could
> have modified the page tables (e.g., write-protect, map out) in between,
> so we could fault on any new tlb_fill() - we have to skip any new MMU
> translations.
>
> Unfortunately, there are TLB entries we cannot get a host address for
> (esp., watchpoints, LAP, NOTDIRTY) - in these cases we cannot avoid a
> new MMU translation with the ordinary ld/st helpers. Let's fall back
> to guest physical addresses in these cases, which we access via
> cpu_physical_memory_(read|write).
>
> This change reduced the boot time for s390x guests (to prompt) from ~1:29
> min to ~1:19 min in my tests. For example, LAP-protected pages are now only
> translated once when writing to them using MVC, and we don't always fall back
> to byte-based copies.
>
> We will want to use the same mechanism for other accesses as well (e.g.,
> mvcl), prepare for that right away.
>
> Signed-off-by: David Hildenbrand <address@hidden>
> ---
> target/s390x/mem_helper.c | 213 +++++++++++++++++++++++++++++++++++---
> 1 file changed, 200 insertions(+), 13 deletions(-)
>
> diff --git a/target/s390x/mem_helper.c b/target/s390x/mem_helper.c
> index 91ba2e03d9..1ca293e00d 100644
> --- a/target/s390x/mem_helper.c
> +++ b/target/s390x/mem_helper.c
> @@ -24,8 +24,10 @@
> #include "exec/helper-proto.h"
> #include "exec/exec-all.h"
> #include "exec/cpu_ldst.h"
> +#include "exec/cpu-common.h"
> #include "qemu/int128.h"
> #include "qemu/atomic128.h"
> +#include "tcg_s390x.h"
>
> #if !defined(CONFIG_USER_ONLY)
> #include "hw/s390x/storage-keys.h"
> @@ -104,6 +106,181 @@ static inline void cpu_stsize_data_ra(CPUS390XState *env, uint64_t addr,
> }
> }
>
> +/*
> + * An access covers one page, except for the start/end of the translated
> + * virtual address range.
> + */
> +typedef struct S390Access {
> + union {
> + char *haddr;
> + hwaddr paddr;
> + };
> + uint16_t size;
> + bool isHaddr;
> +} S390Access;
> +
> +/*
> + * Prepare access to a virtual address range, guaranteeing we won't trigger
> + * faults during the actual access. Sometimes we can't get access to the
> + * host address (e.g., LAP, cpu watchpoints/PER, clean pages, ...). Then, we
> + * translate to guest physical addresses instead. We'll have to perform
> + * slower, indirect, access to these physical addresses then.
> + */
> +static void access_prepare_idx(CPUS390XState *env, S390Access access[],
> + int nb_access, vaddr vaddr, target_ulong size,
> + MMUAccessType access_type, int mmu_idx,
> + uintptr_t ra)
> +{
> + int i = 0;
> + int cur_size;
> +
> + /*
> + * After we obtained the host address of a TLB entry, that entry might
> + * be invalidated again - e.g., via tlb_set_dirty() or via another
> + * tlb_fill(). We assume here that it is fine to temporarily store the
> + * host address to access it later - we didn't agree to any tlb flush and
> + * there seems to be no mechanism protecting the return value of
> + * tlb_vaddr_to_host().
> + */
> + while (size) {
> + g_assert(i < nb_access);
> + cur_size = adj_len_to_page(size, vaddr);
> +
> + access[i].isHaddr = true;
> + access[i].haddr = tlb_vaddr_to_host(env, vaddr, access_type, mmu_idx);
> + if (!access[i].haddr) {
> + access[i].isHaddr = false;
> + tcg_s390_cpu_mmu_translate(env, vaddr, cur_size, access_type,
> + mmu_idx, false, &access[i].paddr,
> + NULL, ra);
> + }
> + access[i].size = cur_size;
> +
> + vaddr += cur_size;
> + size -= cur_size;
> + i++;
> + }
> +
> + /* let's zero-out the remaining entries, so we have a size of 0 */
> + if (i < nb_access) {
> + memset(&access[i], 0, sizeof(S390Access) * (nb_access - i));
> + }
> +}
> +
> +static void access_prepare(CPUS390XState *env, S390Access access[],
> + int nb_access, target_ulong vaddr, target_ulong size,
> + MMUAccessType access_type, uintptr_t ra)
> +{
> + int mmu_idx = cpu_mmu_index(env, false);
> +
> + access_prepare_idx(env, access, nb_access, vaddr, size, access_type,
> + mmu_idx, ra);
> +}
> +
> +static void access_set(CPUS390XState *env, S390Access write[], int nb_write,
> + uint8_t byte, target_ulong size)
> +{
> + target_ulong cur_size;
> + void *buf = NULL;
> + int w = 0;
> +
> + while (size) {
> + g_assert(w < nb_write);
> + if (!write[w].size) {
> + w++;
> + continue;
> + }
> +
> + cur_size = MIN(size, write[w].size);
> + if (write[w].isHaddr) {
> + memset(write[w].haddr, byte, cur_size);
> + write[w].haddr += cur_size;
> + } else {
> +#ifndef CONFIG_USER_ONLY
> + if (!buf) {
> + buf = g_malloc(TARGET_PAGE_SIZE);
> + memset(buf, byte, cur_size);
> + }
> + cpu_physical_memory_write(write[w].paddr, buf, cur_size);
> + write[w].paddr += cur_size;
> +#else
> + g_assert_not_reached();
> +#endif
> + }
> + write[w].size -= cur_size;
> + size -= cur_size;
> + }
> + g_free(buf);
> +}
> +
> +/*
> + * Copy memory in chunks up to chunk_size. If the ranges don't overlap or
> + * if it's a forward move, this function behaves like memmove().
> + *
> + * To achieve a backwards byte-by-byte copy (e.g., MVC), the chunk_size
> + * must not be bigger than the address difference (in the worst case, 1 byte).
> + */
> +static void access_copy(CPUS390XState *env, S390Access write[], int nb_write,
> + S390Access read[], int nb_read, target_ulong size,
> + target_ulong chunk_size)
> +{
> + target_ulong cur_size;
> + void *buf = NULL;
> + int r = 0;
> + int w = 0;
> +
> + g_assert(chunk_size > 0);
> + chunk_size = MIN(chunk_size, TARGET_PAGE_SIZE);
> +
> + while (size) {
> + g_assert(w < nb_write);
> + g_assert(r < nb_read);
> + if (!write[w].size) {
> + w++;
> + continue;
> + }
> + if (!read[r].size) {
> + r++;
> + continue;
> + }
> + cur_size = MIN(MIN(MIN(size, write[w].size), read[r].size), chunk_size);
> +
> + if (write[w].isHaddr && read[r].isHaddr) {
> + memmove(write[w].haddr, read[r].haddr,
> + cur_size);
> + write[w].haddr += cur_size;
> + read[r].haddr += cur_size;
> +#ifndef CONFIG_USER_ONLY
> + } else if (!write[w].isHaddr && read[r].isHaddr) {
> + cpu_physical_memory_write(write[w].paddr,
> + read[r].haddr, cur_size);
> + write[w].paddr += cur_size;
> + read[r].haddr += cur_size;
> + } else if (write[w].isHaddr && !read[r].isHaddr) {
> + cpu_physical_memory_read(read[r].paddr,
> + write[w].haddr, cur_size);
> + write[w].haddr += cur_size;
> + read[r].paddr += cur_size;
> + } else {
> + if (!buf) {
> + buf = g_malloc(chunk_size);
> + }
> + cpu_physical_memory_read(read[r].paddr, buf, cur_size);
> + cpu_physical_memory_write(write[w].paddr, buf, cur_size);
> + write[w].paddr += cur_size;
> + read[r].paddr += cur_size;
> +#else
> + } else {
> + g_assert_not_reached();
> +#endif
> + }
> + write[w].size -= cur_size;
> + read[r].size -= cur_size;
> + size -= cur_size;
> + }
> + g_free(buf);
> +}
> +
> static void fast_memset(CPUS390XState *env, uint64_t dest, uint8_t byte,
> uint32_t l, uintptr_t ra)
> {
> @@ -302,24 +479,34 @@ uint32_t HELPER(oc)(CPUS390XState *env, uint32_t l, uint64_t dest,
> static uint32_t do_helper_mvc(CPUS390XState *env, uint32_t l, uint64_t dest,
> uint64_t src, uintptr_t ra)
> {
> - uint32_t i;
> + /* 256 bytes cannot cross more than two pages */
> + S390Access read[2];
> + S390Access write[2];
>
> HELPER_LOG("%s l %d dest %" PRIx64 " src %" PRIx64 "\n",
> __func__, l, dest, src);
> + l++;
>
> - /* mvc and memmove do not behave the same when areas overlap! */
> - /* mvc with source pointing to the byte after the destination is the
> - same as memset with the first source byte */
> + g_assert(l <= 256);
> + access_prepare(env, write, ARRAY_SIZE(write), dest, l, MMU_DATA_STORE, ra);
> +
> + /*
> + * The result of MVC is as if moving single bytes from left to right
> + * (in contrast to memmove()). It can be used like memset().
> + */
> if (dest == src + 1) {
> - fast_memset(env, dest, cpu_ldub_data_ra(env, src, ra), l + 1, ra);
> - } else if (dest < src || src + l < dest) {
> - fast_memmove(env, dest, src, l + 1, ra);
> - } else {
> - /* slow version with byte accesses which always work */
> - for (i = 0; i <= l; i++) {
> - uint8_t x = cpu_ldub_data_ra(env, src + i, ra);
> - cpu_stb_data_ra(env, dest + i, x, ra);
> - }
> + access_set(env, write, ARRAY_SIZE(write),
> + cpu_ldub_data_ra(env, src, ra), l);
> + return env->cc_op;
> + }
> +
> + access_prepare(env, read, ARRAY_SIZE(read), src, l, MMU_DATA_LOAD, ra);
> + if (dest < src || src + l <= dest) {
> + access_copy(env, write, ARRAY_SIZE(write), read, ARRAY_SIZE(read), l,
> + TARGET_PAGE_SIZE);
> + } else if (src < dest) {
> + access_copy(env, write, ARRAY_SIZE(write), read, ARRAY_SIZE(read), l,
> + dest - src);
> }
>
> return env->cc_op;
>