Re: [Qemu-devel] [PATCH v3 0/6] trace: [tcg] Optimize per-vCPU tracing s
From: Richard Henderson
Subject: Re: [Qemu-devel] [PATCH v3 0/6] trace: [tcg] Optimize per-vCPU tracing states with separate TB caches
Date: Fri, 23 Dec 2016 12:09:24 -0800
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.5.1
On 12/23/2016 10:51 AM, Lluís Vilanova wrote:
>> On 12/22/2016 10:35 AM, Lluís Vilanova wrote:
>>> To handle both issues, this series replicates the shared physical TB cache,
>>> creating a separate physical TB cache for every combination of event states
>>> (those with the 'vcpu' and 'tcg' properties). Then, all vCPUs tracing the
>>> same events will use the same physical TB cache.
>
>> Why do we need to "split the physical TB cache" as opposed to simply including
>> the trace state into the TB hash function?
>
> Mmmm, that's an interesting alternative I did not consider. Are you aiming at
> minimizing the changes, or do you also think it would be more efficient?
I suspect that it will be more efficient.
> The dynamic tracing state would then be an arbitrarily long bitmap (defined by
> the number of events with the 'vcpu' property), so I'm not sure how to fit it
> into the hashing function with minimal collisions (the bitmap is now limited to
> an unsigned long to use it as an index to the TB cache "matrix").
You could consider that index a unique identifier for the tracing state, and
then only compare and hash that integer.
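As a rough illustration of that idea (a minimal sketch, not the attached patch): the lookup key gains one integer, and both the hash and the comparison take it into account. The struct, its field names, and the mixer below are assumptions made for the example; they are not QEMU's actual definitions or its real hash function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical lookup key; field names are illustrative, not QEMU's. */
struct tb_key {
    uint64_t phys_pc;
    uint64_t pc;
    uint32_t flags;
    uint32_t trace_id;  /* integer identifying the vCPU's tracing state */
};

/* Generic 64-bit mixer (splitmix64 finalizer), a stand-in for the real hash. */
static uint64_t mix64(uint64_t x)
{
    x ^= x >> 30;
    x *= 0xbf58476d1ce4e5b9ULL;
    x ^= x >> 27;
    x *= 0x94d049bb133111ebULL;
    x ^= x >> 31;
    return x;
}

/* Hash every key field; trace_id shares one 64-bit word with flags. */
static uint32_t tb_key_hash(const struct tb_key *k)
{
    uint64_t h;

    assert(k);
    h = mix64(k->phys_pc);
    h = mix64(h ^ k->pc);
    h = mix64(h ^ (((uint64_t)k->flags << 32) | k->trace_id));
    return (uint32_t)(h ^ (h >> 32));
}

/* Two keys match only if the tracing state matches as well. */
static bool tb_key_equal(const struct tb_key *a, const struct tb_key *b)
{
    return a->phys_pc == b->phys_pc &&
           a->pc == b->pc &&
           a->flags == b->flags &&
           a->trace_id == b->trace_id;
}
```

With a key like this, vCPUs that trace the same events produce the same trace_id and therefore find the same TBs, while a vCPU with a different tracing state misses on the comparison and gets its own translation, all within a single table.
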
> The other drawback I see is that then it would also take longer to compute the
> hashing function, instead of the simpler array indexing. As a benefit, workloads
> with a high frequency of TB-flushing operations might be a bit faster (there
> would be a single QHT).
I don't see adding one more integer to the hashing function to be significant
at all. Certainly not the 15% that you describe in your cover letter.
> If someone can provide me the code for the modified hash lookup function to
> account for the trace dstate bitmap contents, I will integrate it and measure if
> there is any important change in performance.
Something like the following should do it. There are two /* cpu->??? */
markers that would need to be filled in.
If you can reduce the tracing identifier to 8 bits, that would be excellent.
I've been wanting to make some other changes to TB hashing, and that would fit
in well with a second "flags" value.
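To make the 8-bit suggestion concrete, here is a small sketch of how such an identifier could share a 32-bit word with other bits. The layout (trace id in the low 8 bits) and all of the names are assumptions for illustration only; nothing here is an existing QEMU interface.

```c
#include <assert.h>
#include <stdint.h>

#define TRACE_ID_BITS 8
#define TRACE_ID_MASK ((1u << TRACE_ID_BITS) - 1)

/*
 * Hypothetical layout of a second "flags" word:
 *   bits 0..7   tracing-state identifier
 *   bits 8..31  available for other per-TB flags
 */
static inline uint32_t flags2_pack(uint32_t other_flags, uint32_t trace_id)
{
    assert(trace_id <= TRACE_ID_MASK);
    assert(other_flags <= (UINT32_MAX >> TRACE_ID_BITS));
    return (other_flags << TRACE_ID_BITS) | trace_id;
}

/* Extract the tracing-state identifier from the packed word. */
static inline uint32_t flags2_trace_id(uint32_t flags2)
{
    return flags2 & TRACE_ID_MASK;
}

/* Extract the remaining flag bits from the packed word. */
static inline uint32_t flags2_other(uint32_t flags2)
{
    return flags2 >> TRACE_ID_BITS;
}
```

Packed this way, the tracing state costs no additional word in the hash or in the comparison; it is covered by whatever already handles the flags value.
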
r~
Attachment: z (Text document)