From: GitHub
Subject: [Qemu-commits] [qemu/qemu] 101924: tcg/i386: Use byte form of xgetbv instruction
Date: Fri, 22 Jun 2018 01:57:52 -0700
Branch: refs/heads/master
Home: https://github.com/qemu/qemu
Commit: 1019242af11400252f6735ca71a35f81ac23a66d
https://github.com/qemu/qemu/commit/1019242af11400252f6735ca71a35f81ac23a66d
Author: John Arbuckle <address@hidden>
Date: 2018-06-15 (Fri, 15 Jun 2018)
Changed paths:
M tcg/i386/tcg-target.inc.c
Log Message:
-----------
tcg/i386: Use byte form of xgetbv instruction
The assembler in most versions of Mac OS X is pretty old and does not
support the xgetbv instruction. To work around this problem, the raw
encoding of the instruction is used instead.
Signed-off-by: John Arbuckle <address@hidden>
Message-Id: <address@hidden>
Signed-off-by: Richard Henderson <address@hidden>
Commit: 61b8cef1d42567d3029e0c7180cbd0f16cc4be2d
https://github.com/qemu/qemu/commit/61b8cef1d42567d3029e0c7180cbd0f16cc4be2d
Author: Emilio G. Cota <address@hidden>
Date: 2018-06-15 (Fri, 15 Jun 2018)
Changed paths:
M accel/tcg/cpu-exec.c
M accel/tcg/translate-all.c
M include/qemu/qht.h
M tests/qht-bench.c
M tests/test-qht.c
M util/qht.c
Log Message:
-----------
qht: require a default comparison function
qht_lookup now uses the default cmp function. qht_lookup_custom is defined
to retain the old behaviour, that is, a cmp function is explicitly provided.
qht_insert will gain use of the default cmp in the next patch.
Note that we move qht_lookup_custom's @func to be the last argument,
which makes the new qht_lookup as simple as possible.
Instead of this (i.e. keeping @func 2nd):
0000000000010750 <qht_lookup>:
10750: 89 d1 mov %edx,%ecx
10752: 48 89 f2 mov %rsi,%rdx
10755: 48 8b 77 08 mov 0x8(%rdi),%rsi
10759: e9 22 ff ff ff jmpq 10680 <qht_lookup_custom>
1075e: 66 90 xchg %ax,%ax
We get:
0000000000010740 <qht_lookup>:
10740: 48 8b 4f 08 mov 0x8(%rdi),%rcx
10744: e9 37 ff ff ff jmpq 10680 <qht_lookup_custom>
10749: 0f 1f 80 00 00 00 00 nopl 0x0(%rax)
Reviewed-by: Richard Henderson <address@hidden>
Reviewed-by: Alex Bennée <address@hidden>
Signed-off-by: Emilio G. Cota <address@hidden>
Signed-off-by: Richard Henderson <address@hidden>
Commit: 32359d529f30bea8124ed671b2e6a22f22540488
https://github.com/qemu/qemu/commit/32359d529f30bea8124ed671b2e6a22f22540488
Author: Emilio G. Cota <address@hidden>
Date: 2018-06-15 (Fri, 15 Jun 2018)
Changed paths:
M accel/tcg/translate-all.c
M include/qemu/qht.h
M tests/qht-bench.c
M tests/test-qht.c
M util/qht.c
Log Message:
-----------
qht: return existing entry when qht_insert fails
The meaning of "existing" is now changed to "matches in hash and
ht->cmp result". This is saner than just checking the pointer value.
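A toy single-bucket sketch of the new contract (illustrative names, not the real qht API): on failure, insert hands back the entry that matched by hash + cmp rather than by pointer identity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct entry { uint32_t hash; void *p; };

static struct entry slot;        /* one bucket suffices for the sketch */

static bool cmp_int(const void *a, const void *b)
{
    return *(const int *)a == *(const int *)b;
}

/* Returns true on success; on failure, *existing is set to the entry
 * that matched in hash and cmp result. */
static bool toy_insert(uint32_t hash, void *p,
                       bool (*cmp)(const void *, const void *),
                       void **existing)
{
    if (slot.p && slot.hash == hash && cmp(slot.p, p)) {
        if (existing) {
            *existing = slot.p;  /* report the matching entry */
        }
        return false;            /* insertion failed */
    }
    slot.hash = hash;
    slot.p = p;
    return true;
}
```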
Suggested-by: Richard Henderson <address@hidden>
Reviewed-by: Richard Henderson <address@hidden>
Reviewed-by: Alex Bennée <address@hidden>
Signed-off-by: Emilio G. Cota <address@hidden>
Signed-off-by: Richard Henderson <address@hidden>
Commit: be2cdc5e352eb28b4ff631f053a261d91e6af78e
https://github.com/qemu/qemu/commit/be2cdc5e352eb28b4ff631f053a261d91e6af78e
Author: Emilio G. Cota <address@hidden>
Date: 2018-06-15 (Fri, 15 Jun 2018)
Changed paths:
M accel/tcg/cpu-exec.c
M accel/tcg/translate-all.c
M include/exec/exec-all.h
M include/exec/tb-context.h
M tcg/tcg.c
M tcg/tcg.h
Log Message:
-----------
tcg: track TBs with per-region BST's
This paves the way for enabling scalable parallel generation of TCG code.
Instead of tracking TBs with a single binary search tree (BST), use a
BST for each TCG region, protecting it with a lock. This is as scalable
as it gets, since each TCG thread operates on a separate region.
The core of this change is the introduction of struct tcg_region_tree,
which contains a pointer to a GTree and an associated lock to serialize
accesses to it. We then allocate an array of tcg_region_tree's, adding
the appropriate padding to avoid false sharing based on
qemu_dcache_linesize.
Given a tc_ptr, we first find the corresponding region_tree. This
is done by special-casing the first and last regions, since they
might be of size != region.size; otherwise we just divide the offset
by region.stride. I was worried about this division (several dozen
cycles of latency), but profiling shows that this is not a fast path.
Note that region.stride is not required to be a power of two; it
is only required to be a multiple of the host's page size.
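The lookup described above can be sketched as follows (field and function names are hypothetical, not those in tcg/tcg.c):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct region_info {
    uintptr_t start;    /* aligned start of the first region */
    size_t stride;      /* distance between consecutive region starts */
    size_t n;           /* number of regions */
};

static size_t tc_ptr_to_region(const struct region_info *r, uintptr_t tc_ptr)
{
    uintptr_t offset;

    /* the first region may begin before the aligned base */
    if (tc_ptr < r->start) {
        return 0;
    }
    offset = tc_ptr - r->start;
    /* the last region may extend past (n - 1) * stride */
    if (offset >= (r->n - 1) * r->stride) {
        return r->n - 1;
    }
    /* middle regions: one division by the stride */
    return offset / r->stride;
}
```

Note the stride need not be a power of two, so this really is an integer division rather than a shift; per the profiling note above, that cost is acceptable off the fast path.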
Note that with this design we can also provide consistent snapshots
about all region trees at once; for instance, tcg_tb_foreach
acquires/releases all region_tree locks before/after iterating over them.
For this reason we now drop tb_lock in dump_exec_info().
As an alternative I considered implementing a concurrent BST, but this
can be tricky to get right, offers no consistent snapshots of the BST,
and performance and scalability-wise I don't think it could ever beat
having separate GTrees, given that our workload is insert-mostly (all
concurrent BST designs I've seen focus, understandably, on making
lookups fast, which comes at the expense of convoluted, non-wait-free
insertions/removals).
Reviewed-by: Richard Henderson <address@hidden>
Reviewed-by: Alex Bennée <address@hidden>
Signed-off-by: Emilio G. Cota <address@hidden>
Signed-off-by: Richard Henderson <address@hidden>
Commit: 128ed2278c4e6ad063f101c5dda7999b43f2d8a3
https://github.com/qemu/qemu/commit/128ed2278c4e6ad063f101c5dda7999b43f2d8a3
Author: Emilio G. Cota <address@hidden>
Date: 2018-06-15 (Fri, 15 Jun 2018)
Changed paths:
M accel/tcg/translate-all.c
M include/exec/tb-context.h
M tcg/tcg.c
M tcg/tcg.h
Log Message:
-----------
tcg: move tb_ctx.tb_phys_invalidate_count to tcg_ctx
Thereby making it per-TCGContext. Once we remove tb_lock, this will
avoid an atomic increment every time a TB is invalidated.
Reviewed-by: Richard Henderson <address@hidden>
Reviewed-by: Alex Bennée <address@hidden>
Signed-off-by: Emilio G. Cota <address@hidden>
Signed-off-by: Richard Henderson <address@hidden>
Commit: 1e05197f24c49d52f339de9053bb1d17082f1be3
https://github.com/qemu/qemu/commit/1e05197f24c49d52f339de9053bb1d17082f1be3
Author: Emilio G. Cota <address@hidden>
Date: 2018-06-15 (Fri, 15 Jun 2018)
Changed paths:
M accel/tcg/translate-all.c
M include/exec/exec-all.h
Log Message:
-----------
translate-all: iterate over TBs in a page with PAGE_FOR_EACH_TB
This commit does several things, but to avoid churn I merged them all
into the same commit. To wit:
- Use uintptr_t instead of TranslationBlock * for the list of TBs in a page.
Just like we did in (c37e6d7e "tcg: Use uintptr_t type for
jmp_list_{next|first} fields of TB"), the rationale is the same: these
are tagged pointers, not pointers. So use a more appropriate type.
- Only check the least significant bit of the tagged pointers. Masking
with 3/~3 is unnecessary and confusing.
- Introduce the TB_FOR_EACH_TAGGED macro, and use it to define
PAGE_FOR_EACH_TB, which improves readability. Note that
TB_FOR_EACH_TAGGED will gain another user in a subsequent patch.
- Update tb_page_remove to use PAGE_FOR_EACH_TB. In case there
is a bug and we attempt to remove a TB that is not in the list, instead
of segfaulting (since the list is NULL-terminated) we will reach
g_assert_not_reached().
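An illustrative reconstruction of the tagged-pointer iteration (simplified from the real macro in accel/tcg/translate-all.c; only the low bit is a tag, which is why masking with 3/~3 was overkill):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct tb {
    uintptr_t page_next[2];      /* tagged links; bit 0 selects the field */
};

/* Walk a NULL-terminated list whose links carry the next field index
 * in their least significant bit. */
#define TB_FOR_EACH_TAGGED(head, tb, n, field)                          \
    for ((n) = (head) & 1,                                              \
         (tb) = (struct tb *)((head) & ~(uintptr_t)1);                  \
         (tb) != NULL;                                                  \
         (tb) = (struct tb *)(tb)->field[n],                            \
         (n) = (uintptr_t)(tb) & 1,                                     \
         (tb) = (struct tb *)((uintptr_t)(tb) & ~(uintptr_t)1))
```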
Reviewed-by: Richard Henderson <address@hidden>
Signed-off-by: Emilio G. Cota <address@hidden>
Signed-off-by: Richard Henderson <address@hidden>
Commit: 78722ed0b826644ae240e3c0bbb6bdde02dfe7e1
https://github.com/qemu/qemu/commit/78722ed0b826644ae240e3c0bbb6bdde02dfe7e1
Author: Emilio G. Cota <address@hidden>
Date: 2018-06-15 (Fri, 15 Jun 2018)
Changed paths:
M accel/tcg/translate-all.c
M docs/devel/multi-thread-tcg.txt
Log Message:
-----------
translate-all: make l1_map lockless
Groundwork for supporting parallel TCG generation.
We never remove entries from the radix tree, so we can use cmpxchg
to implement lockless insertions.
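The insert-only publication pattern can be sketched with C11 atomics (QEMU uses its own atomic_cmpxchg/atomic_rcu_read helpers; this is a minimal stand-in): because entries are never removed, a reader sees either NULL or a fully initialized level.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

static void *level_get_or_alloc(_Atomic(void *) *slot, size_t bytes)
{
    void *p = atomic_load_explicit(slot, memory_order_acquire);

    if (p == NULL) {
        void *newp = calloc(1, bytes);
        void *expected = NULL;

        /* Publish our allocation unless another thread won the race. */
        if (!atomic_compare_exchange_strong(slot, &expected, newp)) {
            free(newp);          /* lost the race; use the winner's level */
            p = expected;
        } else {
            p = newp;
        }
    }
    return p;
}
```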
Reviewed-by: Richard Henderson <address@hidden>
Reviewed-by: Alex Bennée <address@hidden>
Signed-off-by: Emilio G. Cota <address@hidden>
Signed-off-by: Richard Henderson <address@hidden>
Commit: 94da9aec2a50f0c82e6c60939275c0337f03d5fe
https://github.com/qemu/qemu/commit/94da9aec2a50f0c82e6c60939275c0337f03d5fe
Author: Emilio G. Cota <address@hidden>
Date: 2018-06-15 (Fri, 15 Jun 2018)
Changed paths:
M accel/tcg/translate-all.c
Log Message:
-----------
translate-all: remove hole in PageDesc
Groundwork for supporting parallel TCG generation.
Move the hole to the end of the struct, so that a u32
field can be added there without bloating the struct.
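The padding concern can be illustrated with a hypothetical layout (not the actual PageDesc): with the hole moved to the tail, a later u32 field lands in the existing padding instead of growing the struct.

```c
#include <assert.h>
#include <stdint.h>

/* 64-bit host: 4 bytes of tail padding after flags */
struct hole_at_end {
    void *first;
    void *code_bitmap;
    uint32_t flags;
};

/* the future uint32_t fits into that tail padding for free */
struct hole_at_end_grown {
    void *first;
    void *code_bitmap;
    uint32_t flags;
    uint32_t lock_count;     /* hypothetical new field */
};
```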
Reviewed-by: Richard Henderson <address@hidden>
Reviewed-by: Alex Bennée <address@hidden>
Signed-off-by: Emilio G. Cota <address@hidden>
Signed-off-by: Richard Henderson <address@hidden>
Commit: ae5486e273a4e368515a963a6d0076e20453eb72
https://github.com/qemu/qemu/commit/ae5486e273a4e368515a963a6d0076e20453eb72
Author: Emilio G. Cota <address@hidden>
Date: 2018-06-15 (Fri, 15 Jun 2018)
Changed paths:
M accel/tcg/translate-all.c
Log Message:
-----------
translate-all: work page-by-page in tb_invalidate_phys_range_1
So that we pass a same-page range to tb_invalidate_phys_page_range,
instead of always passing an end address that could be on a different
page.
As discussed with Peter Maydell on the list [1], tb_invalidate_phys_page_range
doesn't actually do much with 'end', which explains why we have never
hit a bug despite going against what the comment on top of
tb_invalidate_phys_page_range requires:
> * Invalidate all TBs which intersect with the target physical address range
> * [start;end[. NOTE: start and end must refer to the *same* physical page.
The appended honours the comment, which avoids confusion.
While at it, rework the loop into a for loop, which is less error prone
(e.g. "continue" won't result in an infinite loop).
[1] https://lists.gnu.org/archive/html/qemu-devel/2017-07/msg09165.html
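The page-by-page split can be sketched like this (toy page size and callback; the callback stands in for tb_invalidate_phys_page_range):

```c
#include <assert.h>
#include <stdint.h>

#define TOY_PAGE_SIZE 0x1000
#define TOY_PAGE_MASK (~(uintptr_t)(TOY_PAGE_SIZE - 1))

static int n_calls;
static uintptr_t last_end;

/* demonstration callback: checks the same-page invariant */
static void record(uintptr_t s, uintptr_t e)
{
    assert((s & TOY_PAGE_MASK) == ((e - 1) & TOY_PAGE_MASK));
    n_calls++;
    last_end = e;
}

/* Split [start, end) into same-page chunks, as a for loop so that a
 * stray "continue" cannot spin forever. */
static void invalidate_range(uintptr_t start, uintptr_t end,
                             void (*one_page)(uintptr_t s, uintptr_t e))
{
    uintptr_t next;

    for (; start < end; start = next) {
        next = (start & TOY_PAGE_MASK) + TOY_PAGE_SIZE;  /* next page start */
        if (next > end) {
            next = end;
        }
        one_page(start, next);
    }
}
```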
Reviewed-by: Richard Henderson <address@hidden>
Reviewed-by: Alex Bennée <address@hidden>
Signed-off-by: Emilio G. Cota <address@hidden>
Signed-off-by: Richard Henderson <address@hidden>
Commit: 45c73de594414904b0d6a7ade70fb4514d35f79c
https://github.com/qemu/qemu/commit/45c73de594414904b0d6a7ade70fb4514d35f79c
Author: Emilio G. Cota <address@hidden>
Date: 2018-06-15 (Fri, 15 Jun 2018)
Changed paths:
M accel/tcg/translate-all.c
Log Message:
-----------
translate-all: move tb_invalidate_phys_page_range up in the file
This greatly simplifies next commit's diff.
Reviewed-by: Richard Henderson <address@hidden>
Reviewed-by: Alex Bennée <address@hidden>
Signed-off-by: Emilio G. Cota <address@hidden>
Signed-off-by: Richard Henderson <address@hidden>
Commit: 0b5c91f74f3c83a36f37740969df8c775c997e69
https://github.com/qemu/qemu/commit/0b5c91f74f3c83a36f37740969df8c775c997e69
Author: Emilio G. Cota <address@hidden>
Date: 2018-06-15 (Fri, 15 Jun 2018)
Changed paths:
M accel/tcg/translate-all.c
M accel/tcg/translate-all.h
M include/exec/exec-all.h
Log Message:
-----------
translate-all: use per-page locking in !user-mode
Groundwork for supporting parallel TCG generation.
Instead of using a global lock (tb_lock) to protect changes
to pages, use fine-grained, per-page locks in !user-mode.
User-mode stays with mmap_lock.
Sometimes changes need to happen atomically on more than one
page (e.g. when a TB that spans across two pages is
added/invalidated, or when a range of pages is invalidated).
We therefore introduce struct page_collection, which helps
us keep track of a set of pages that have been locked in
the appropriate locking order (i.e. by ascending page index).
This commit first introduces the structs and the function helpers,
to then convert the calling code to use per-page locking. Note
that tb_lock is not removed yet.
While at it, rename tb_alloc_page to tb_page_add, which pairs with
tb_page_remove.
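The deadlock-avoidance rule behind page_collection can be sketched for the two-page case (hypothetical types, not the QEMU structs): locks are always taken in ascending page-index order.

```c
#include <assert.h>
#include <stddef.h>

struct toy_page { size_t index; int locked; };

static void lock_page(struct toy_page *p)
{
    p->locked = 1;               /* stand-in for taking the page lock */
}

/* Lock two pages in ascending index order so that concurrent callers
 * can never hold the pair in opposite orders. */
static void lock_page_pair(struct toy_page *a, struct toy_page *b)
{
    if (a->index > b->index) {
        struct toy_page *tmp = a; a = b; b = tmp;
    }
    lock_page(a);
    if (b != a) {
        lock_page(b);
    }
}
```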
Reviewed-by: Richard Henderson <address@hidden>
Signed-off-by: Emilio G. Cota <address@hidden>
Signed-off-by: Richard Henderson <address@hidden>
Commit: 6d9abf85d538731ccff25fc29d7fa938115b1a80
https://github.com/qemu/qemu/commit/6d9abf85d538731ccff25fc29d7fa938115b1a80
Author: Emilio G. Cota <address@hidden>
Date: 2018-06-15 (Fri, 15 Jun 2018)
Changed paths:
M accel/tcg/translate-all.c
Log Message:
-----------
translate-all: add page_locked assertions
This is only compiled under CONFIG_DEBUG_TCG to avoid
bloating the binary.
In user-mode, assert_page_locked is equivalent to assert_mmap_lock.
Note: There are some tb_lock assertions left that will be
removed by later patches.
Reviewed-by: Richard Henderson <address@hidden>
Suggested-by: Alex Bennée <address@hidden>
Signed-off-by: Emilio G. Cota <address@hidden>
Signed-off-by: Richard Henderson <address@hidden>
Commit: faa9372c07d062eb01f9da72e3f6c0f32efffca7
https://github.com/qemu/qemu/commit/faa9372c07d062eb01f9da72e3f6c0f32efffca7
Author: Emilio G. Cota <address@hidden>
Date: 2018-06-15 (Fri, 15 Jun 2018)
Changed paths:
M accel/tcg/cpu-exec.c
M accel/tcg/translate-all.c
M include/exec/exec-all.h
Log Message:
-----------
translate-all: introduce assert_no_pages_locked
The appended adds assertions to make sure we do not longjmp with page
locks held. Note that user-mode has nothing to check, since page_locks
are !user-mode only.
Reviewed-by: Richard Henderson <address@hidden>
Signed-off-by: Emilio G. Cota <address@hidden>
Signed-off-by: Richard Henderson <address@hidden>
Commit: 95590e24af11236ef334f6bc3e2b71404a790ddb
https://github.com/qemu/qemu/commit/95590e24af11236ef334f6bc3e2b71404a790ddb
Author: Emilio G. Cota <address@hidden>
Date: 2018-06-15 (Fri, 15 Jun 2018)
Changed paths:
M accel/tcg/cpu-exec.c
M accel/tcg/translate-all.c
M docs/devel/multi-thread-tcg.txt
Log Message:
-----------
translate-all: discard TB when tb_link_page returns an existing matching TB
Use the recently-gained QHT feature of returning the matching TB if it
already exists. This allows us to get rid of the lookup we perform
right after acquiring tb_lock.
Suggested-by: Richard Henderson <address@hidden>
Reviewed-by: Richard Henderson <address@hidden>
Signed-off-by: Emilio G. Cota <address@hidden>
Signed-off-by: Richard Henderson <address@hidden>
Commit: 194125e3ebd553acb02aaf3797a4f0387493fe94
https://github.com/qemu/qemu/commit/194125e3ebd553acb02aaf3797a4f0387493fe94
Author: Emilio G. Cota <address@hidden>
Date: 2018-06-15 (Fri, 15 Jun 2018)
Changed paths:
M accel/tcg/cpu-exec.c
M accel/tcg/translate-all.c
M docs/devel/multi-thread-tcg.txt
M include/exec/exec-all.h
Log Message:
-----------
translate-all: protect TB jumps with a per-destination-TB lock
This applies to both user-mode and !user-mode emulation.
Instead of relying on a global lock, protect the list of incoming
jumps with tb->jmp_lock. This lock also protects tb->cflags,
so update all tb->cflags readers outside tb->jmp_lock to use
atomic reads via tb_cflags().
In order to find the destination TB (and therefore its jmp_lock)
from the origin TB, we introduce tb->jmp_dest[].
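The accessor pattern for cflags can be sketched with C11 atomics (QEMU uses its own atomic_read() helper; the struct here is illustrative only): writes happen under jmp_lock, so lock-free readers go through an atomic load.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

struct toy_tb {
    _Atomic uint32_t cflags;
    /* jmp_lock, jmp_dest[], jmp_list_head, ... elided */
};

/* all readers outside jmp_lock must use this instead of tb->cflags */
static uint32_t toy_tb_cflags(struct toy_tb *tb)
{
    return atomic_load_explicit(&tb->cflags, memory_order_relaxed);
}
```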
I considered not using a linked list of jumps, which simplifies
code and makes the struct smaller. However, it unnecessarily increases
memory usage, which results in a performance decrease. See for
instance these numbers booting+shutting down debian-arm:
Time (s) Rel. err (%) Abs. err (s) Rel. slowdown (%)
------------------------------------------------------------------------------
before 20.88 0.74 0.154512 0.
after 20.81 0.38 0.079078 -0.33524904
GTree 21.02 0.28 0.058856 0.67049808
GHashTable + xxhash 21.63 1.08 0.233604 3.5919540
Using a hash table or a binary tree to keep track of the jumps
doesn't really pay off, not only due to the increased memory usage,
but also because most TBs have only 0 or 1 jumps to them. The maximum
number of jumps when booting debian-arm that I measured is 35, but
as we can see in the histogram below a TB with that many incoming jumps
is extremely rare; the average TB has 0.80 incoming jumps.
n_jumps: 379208; avg jumps/tb: 0.801099
dist: [0.0,1.0)|▄█▁▁▁▁▁▁▁▁▁▁▁ ▁▁▁▁▁▁ ▁▁▁ ▁▁▁ ▁|[34.0,35.0]
Reviewed-by: Richard Henderson <address@hidden>
Signed-off-by: Emilio G. Cota <address@hidden>
Signed-off-by: Richard Henderson <address@hidden>
Commit: b7542f7fe8f879b7b1e74f5fbd36b5746dbb6712
https://github.com/qemu/qemu/commit/b7542f7fe8f879b7b1e74f5fbd36b5746dbb6712
Author: Emilio G. Cota <address@hidden>
Date: 2018-06-15 (Fri, 15 Jun 2018)
Changed paths:
M accel/tcg/cputlb.c
Log Message:
-----------
cputlb: remove tb_lock from tlb_flush functions
The acquisition of tb_lock was added when the async tlb_flush
was introduced in e3b9ca810 ("cputlb: introduce tlb_flush_* async work.")
tb_lock was there to allow us to do memset() on the tb_jmp_cache's.
However, since f3ced3c5928 ("tcg: consistently access cpu->tb_jmp_cache
atomically") all accesses to tb_jmp_cache are atomic, so tb_lock
is not needed here. Get rid of it.
Reviewed-by: Alex Bennée <address@hidden>
Reviewed-by: Richard Henderson <address@hidden>
Signed-off-by: Emilio G. Cota <address@hidden>
Signed-off-by: Richard Henderson <address@hidden>
Commit: 705ad1ff0ce264475cb4c9a3aa31ba94a04869fe
https://github.com/qemu/qemu/commit/705ad1ff0ce264475cb4c9a3aa31ba94a04869fe
Author: Emilio G. Cota <address@hidden>
Date: 2018-06-15 (Fri, 15 Jun 2018)
Changed paths:
M accel/tcg/translate-all.c
Log Message:
-----------
translate-all: remove tb_lock mention from cpu_restore_state_from_tb
tb_lock was needed when the function did retranslation. However,
since fca8a500d519 ("tcg: Save insn data and use it in
cpu_restore_state_from_tb") we don't do retranslation.
Get rid of the comment.
Reviewed-by: Richard Henderson <address@hidden>
Signed-off-by: Emilio G. Cota <address@hidden>
Signed-off-by: Richard Henderson <address@hidden>
Commit: 0ac20318ce16f4de288969b2007ef5a654176058
https://github.com/qemu/qemu/commit/0ac20318ce16f4de288969b2007ef5a654176058
Author: Emilio G. Cota <address@hidden>
Date: 2018-06-15 (Fri, 15 Jun 2018)
Changed paths:
M accel/tcg/cpu-exec.c
M accel/tcg/translate-all.c
M accel/tcg/translate-all.h
M docs/devel/multi-thread-tcg.txt
M exec.c
M include/exec/cpu-common.h
M include/exec/exec-all.h
M include/exec/memory-internal.h
M include/exec/tb-context.h
M linux-user/main.c
M tcg/tcg.h
Log Message:
-----------
tcg: remove tb_lock
Use mmap_lock in user-mode to protect TCG state and the page descriptors.
In !user-mode, each vCPU has its own TCG state, so no locks needed.
Per-page locks are used to protect the page descriptors.
Per-TB locks are used in both modes to protect TB jumps.
Some notes:
- tb_lock is removed from notdirty_mem_write by passing a
locked page_collection to tb_invalidate_phys_page_fast.
- tcg_tb_lookup/remove/insert/etc have their own internal lock(s),
so there is no need to further serialize access to them.
- do_tb_flush is run in a safe async context, meaning no other
vCPU threads are running. Therefore acquiring mmap_lock there
is just to please tools such as thread sanitizer.
- Not visible in the diff, but tb_invalidate_phys_page already
has an assert_memory_lock.
- cpu_io_recompile is !user-only, so no mmap_lock there.
- Added mmap_unlock()'s before all siglongjmp's that could
be called in user-mode while mmap_lock is held.
+ Added an assert for !have_mmap_lock() after returning from
the longjmp in cpu_exec, just like we do in cpu_exec_step_atomic.
Performance numbers before/after:
Host: AMD Opteron(tm) Processor 6376
ubuntu 17.04 ppc64 bootup+shutdown time
700 +-+--+----+------+------------+-----------+------------*--+-+
| + + + + + *B |
| before ***B*** ** * |
|tb lock removal ###D### *** |
600 +-+ *** +-+
| ** # |
| *B* #D |
| *** * ## |
500 +-+ *** ### +-+
| * *** ### |
| *B* # ## |
| ** * #D# |
400 +-+ ** ## +-+
| ** ### |
| ** ## |
| ** # ## |
300 +-+ * B* #D# +-+
| B *** ### |
| * ** #### |
| * *** ### |
200 +-+ B *B #D# +-+
| #B* * ## # |
| #* ## |
| + D##D# + + + + |
100 +-+--+----+------+------------+-----------+------------+--+-+
1 8 16 Guest CPUs 48 64
png: https://imgur.com/HwmBHXe
debian jessie aarch64 bootup+shutdown time
90 +-+--+-----+-----+------------+------------+------------+--+-+
| + + + + + + |
| before ***B*** B |
80 +tb lock removal ###D### **D +-+
| **### |
| **## |
70 +-+ ** # +-+
| ** ## |
| ** # |
60 +-+ *B ## +-+
| ** ## |
| *** #D |
50 +-+ *** ## +-+
| * ** ### |
| **B* ### |
40 +-+ **** # ## +-+
| **** #D# |
| ***B** ### |
30 +-+ B***B** #### +-+
| B * * # ### |
| B ###D# |
20 +-+ D ##D## +-+
| D# |
| + + + + + + |
10 +-+--+-----+-----+------------+------------+------------+--+-+
1 8 16 Guest CPUs 48 64
png: https://imgur.com/iGpGFtv
The gains are high for 4-8 CPUs. Beyond that point, however, unrelated
lock contention significantly hurts scalability.
Reviewed-by: Richard Henderson <address@hidden>
Reviewed-by: Alex Bennée <address@hidden>
Signed-off-by: Emilio G. Cota <address@hidden>
Signed-off-by: Richard Henderson <address@hidden>
Commit: 9f754620651d3432114f4bb89c7f12cbea814b3e
https://github.com/qemu/qemu/commit/9f754620651d3432114f4bb89c7f12cbea814b3e
Author: Richard Henderson <address@hidden>
Date: 2018-06-15 (Fri, 15 Jun 2018)
Changed paths:
M tcg/aarch64/tcg-target.inc.c
M tcg/arm/tcg-target.inc.c
M tcg/i386/tcg-target.inc.c
M tcg/mips/tcg-target.inc.c
M tcg/ppc/tcg-target.inc.c
M tcg/s390/tcg-target.inc.c
M tcg/sparc/tcg-target.inc.c
M tcg/tcg.c
M tcg/tcg.h
M tcg/tci/tcg-target.inc.c
Log Message:
-----------
tcg: Reduce max TB opcode count
Also, assert that we don't overflow either of two different offsets into
the TB. Both unwind and goto_tb record a uint16_t for later use.
This fixes an arm-softmmu test case utilizing NEON in which there is
a TB generated that runs to 7800 opcodes, and compiles to 96k on an
x86_64 host. This overflows the 16-bit offset in which we record the
goto_tb reset offset. Because of that overflow, we install a jump
destination that goes to neverland. Boom.
With this reduced op count, the same TB compiles to about 48k for
aarch64, ppc64le, and x86_64 hosts, and neither assertion fires.
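The failure mode above is simple truncation; a minimal illustration of the guard (a hypothetical helper, not the actual assertion in tcg/tcg.c):

```c
#include <assert.h>
#include <stdint.h>

/* A code offset survives being stored in a uint16_t only if it
 * round-trips; anything above 0xffff would silently wrap. */
static int offset_fits_u16(uintptr_t code_offset)
{
    return code_offset == (uint16_t)code_offset;
}
```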
Cc: address@hidden
Reported-by: "Jason A. Donenfeld" <address@hidden>
Reviewed-by: Philippe Mathieu-Daudé <address@hidden>
Signed-off-by: Richard Henderson <address@hidden>
Commit: 33836a731562e3d07b3a83f26e81c6b1482d216c
https://github.com/qemu/qemu/commit/33836a731562e3d07b3a83f26e81c6b1482d216c
Author: Peter Maydell <address@hidden>
Date: 2018-06-21 (Thu, 21 Jun 2018)
Changed paths:
M accel/tcg/cpu-exec.c
M accel/tcg/cputlb.c
M accel/tcg/translate-all.c
M accel/tcg/translate-all.h
M docs/devel/multi-thread-tcg.txt
M exec.c
M include/exec/cpu-common.h
M include/exec/exec-all.h
M include/exec/memory-internal.h
M include/exec/tb-context.h
M include/qemu/qht.h
M linux-user/main.c
M tcg/aarch64/tcg-target.inc.c
M tcg/arm/tcg-target.inc.c
M tcg/i386/tcg-target.inc.c
M tcg/mips/tcg-target.inc.c
M tcg/ppc/tcg-target.inc.c
M tcg/s390/tcg-target.inc.c
M tcg/sparc/tcg-target.inc.c
M tcg/tcg.c
M tcg/tcg.h
M tcg/tci/tcg-target.inc.c
M tests/qht-bench.c
M tests/test-qht.c
M util/qht.c
Log Message:
-----------
Merge remote-tracking branch 'remotes/rth/tags/pull-tcg-20180615' into staging
TCG patch queue:
Workaround macos assembler lossage.
Eliminate tb_lock.
Fix TB code generation overflow.
# gpg: Signature made Fri 15 Jun 2018 20:40:56 BST
# gpg: using RSA key 64DF38E8AF7E215F
# gpg: Good signature from "Richard Henderson <address@hidden>"
# Primary key fingerprint: 7A48 1E78 868B 4DB6 A85A 05C0 64DF 38E8 AF7E 215F
* remotes/rth/tags/pull-tcg-20180615:
tcg: Reduce max TB opcode count
tcg: remove tb_lock
translate-all: remove tb_lock mention from cpu_restore_state_from_tb
cputlb: remove tb_lock from tlb_flush functions
translate-all: protect TB jumps with a per-destination-TB lock
translate-all: discard TB when tb_link_page returns an existing matching TB
translate-all: introduce assert_no_pages_locked
translate-all: add page_locked assertions
translate-all: use per-page locking in !user-mode
translate-all: move tb_invalidate_phys_page_range up in the file
translate-all: work page-by-page in tb_invalidate_phys_range_1
translate-all: remove hole in PageDesc
translate-all: make l1_map lockless
translate-all: iterate over TBs in a page with PAGE_FOR_EACH_TB
tcg: move tb_ctx.tb_phys_invalidate_count to tcg_ctx
tcg: track TBs with per-region BST's
qht: return existing entry when qht_insert fails
qht: require a default comparison function
tcg/i386: Use byte form of xgetbv instruction
Signed-off-by: Peter Maydell <address@hidden>
Compare: https://github.com/qemu/qemu/compare/46012db66699...33836a731562
**NOTE:** This service has been marked for deprecation:
https://developer.github.com/changes/2018-04-25-github-services-deprecation/
Functionality will be removed from GitHub.com on January 31st, 2019.