From:	Kirill Batuzov
Subject:	Re: [Qemu-devel] [PATCH v2.1 00/20] Emulate guest vector operations with host vector operations
Date:	Tue, 21 Feb 2017 15:19:59 +0300 (MSK)
User-agent:	Alpine 2.11 (DEB 23 2013-08-11)

On Thu, 2 Feb 2017, Kirill Batuzov wrote:

> The goal of this patch series is to set up the infrastructure needed to
> emulate guest vector operations using host vector operations. Preliminary
> experiments show that simply translating loads and stores improves the
> performance of the x264 video codec by 10%. The performance of a
> gcc-vectorized for loop increased 2x.
> 
> To be able to emulate guest vector operations using host vector operations,
> several things need to be done.
> 
> 1. Corresponding vector types should be added to TCG. This series adds
> TCG_v128 and TCG_v64. I've made TCG_v64 a different type than TCG_i64
> because it usually needs to be allocated to different registers and
> supports different operations.
> 
> 2. Load/store operations for these new types need to be implemented.
> 
> 3. For a seamless transition from the current model to the new one, we need
> to handle cases where memory occupied by a global variable can be accessed
> via a pointer to the CPUArchState structure. A very simple conservative
> alias analysis has been added to do this. The analysis tracks memory loads
> and stores that overlap with fields of CPUArchState and provides this
> information to the register allocator. The allocator then spills and
> reloads affected globals when needed.
> 
> 4. Allow overlapping globals. For scalar registers this is a rare case, and
> overlapping registers can be handled as a single one (ah, al, ax, eax,
> rax). In ARM every Q-register consists of two D-registers, each consisting
> of two S-registers. Handling 4 S-registers as one because they are parts of
> the same Q-register is way too inefficient.
> 
> 5. Add a new memory addressing mode to the MMU code for large accesses and
> create the needed helpers. Only 128-bit vectors have been handled for now.
> 
> 6. Create TCG opcodes for vector operations. Only addition has been handled
> in this series. Each operation has a wrapper that checks whether the
> backend supports the corresponding operation. If it does, the vector opcode
> is generated; otherwise the operation is emulated with scalar
> operations. The emulation code is generated inline for performance reasons
> (there is a huge performance difference between inline generation
> and calling a helper). As a positive side effect, this will eventually allow
> merging similar emulation code for vector instructions from different
> frontends into a target-independent implementation.
> 
> 7. Use the new operations in the frontend (ARM was used in this series).
> 
> 8. Support the new operations in the backend (x86_64 was used in this series).
> 
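To make the wrapper in item 6 concrete, here is a minimal standalone sketch of the dispatch. It is not the code from the series: Ctx, Op, emit() and gen_add_i32x4() are hypothetical names, and the "opcodes" are just records appended to a buffer. Operands are indices of 32-bit slots, so a 128-bit value occupies four consecutive slots.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical opcodes: one 4 x 32-bit vector add, one scalar 32-bit add. */
typedef enum { OP_ADD_I32X4, OP_ADD_I32 } Opcode;

typedef struct {
    Opcode opc;
    int dst, a, b;          /* indices of 32-bit slots */
} Op;

/* Hypothetical translation context: an op buffer plus a capability flag. */
typedef struct {
    Op ops[16];
    int n;
    bool have_v128_add;     /* assumed to be set from a backend query */
} Ctx;

static void emit(Ctx *s, Opcode opc, int dst, int a, int b)
{
    assert(s->n < 16);
    s->ops[s->n++] = (Op){ opc, dst, a, b };
}

/*
 * Wrapper: emit one vector opcode if the backend supports it,
 * otherwise emit the scalar emulation inline (no helper call).
 */
static void gen_add_i32x4(Ctx *s, int dst, int a, int b)
{
    if (s->have_v128_add) {
        emit(s, OP_ADD_I32X4, dst, a, b);
    } else {
        for (int i = 0; i < 4; i++) {
            emit(s, OP_ADD_I32, dst + i, a + i, b + i);
        }
    }
}
```

With the flag set the wrapper emits a single op; with it clear it emits four scalar adds, one per lane. The frontend calls the same wrapper in both cases.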
> For experiments I used an ARM guest on an x86_64 host. I wanted a pair of
> different architectures that both have vector extensions, and ARM and
> x86_64 fit well.
> 
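Going back to items 3 and 4: the core of a conservative overlap check can be sketched as below. This is only an illustration under assumptions, not the analysis from the patches. GlobalRange, ranges_overlap() and find_affected() are hypothetical names, and each global is modelled as a byte range at some offset inside the CPU state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical description of a global: a byte range in the CPU state. */
typedef struct {
    const char *name;
    size_t offset;
    size_t size;
} GlobalRange;

/* Two half-open ranges [off, off + size) overlap iff each starts
 * before the other ends. */
static bool ranges_overlap(size_t off_a, size_t size_a,
                           size_t off_b, size_t size_b)
{
    return off_a < off_b + size_b && off_b < off_a + size_a;
}

/*
 * Mark every global touched by a memory access of `size` bytes at `off`.
 * Returns the number of affected globals; hit[i] is set for each of them.
 * An allocator could spill/reload exactly those globals.
 */
static size_t find_affected(const GlobalRange *g, size_t n,
                            size_t off, size_t size, bool *hit)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        hit[i] = ranges_overlap(g[i].offset, g[i].size, off, size);
        if (hit[i]) {
            count++;
        }
    }
    return count;
}
```

With one 16-byte Q-register at offset 0, its two D halves and its S quarters as separate globals, a 4-byte store at offset 12 affects only the Q-register, the upper D-register and the last S-register. The other S-registers do not need to be spilled.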
> v1 -> v2:
>  - represent v128 type with smaller types when it is not supported by the host
>  - detect AVX support and use AVX instructions when available
>  - tcg/README updated
>  - generate two v64 adds instead of one v128 when applicable
>  - rebased to newer master
>  - overlap detection for temps added (it needs to be explicitly called from
>    <arch>_translate_init)
>  - the stack is used to temporarily store 128-bit variables in memory
>    (instead of the TCGContext field)
> 
> v2 -> v2.1
>  - automatic build failure fixed
> 
> Outstanding issues:
>  - qemu_ld_v128 and qemu_st_v128 do not generate fallback code if the host
>    does not support 128-bit registers. The reason is that I do not know how
>    to handle differing host/guest endianness (do we swap only the bytes
>    within each element, or the whole vector?). Different targets seem to
>    have different ideas on how this should be done.
>
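To illustrate the endianness question, here is a small sketch of the two candidate behaviours on a 16-byte vector. The function names are hypothetical and nothing here comes from the series; it only shows that the two choices give different byte layouts.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reverse the bytes inside each elem_size-byte element of a 16-byte
 * vector. elem_size must divide 16 (1, 2, 4, 8 or 16). */
static void bswap_elements(uint8_t v[16], size_t elem_size)
{
    assert(elem_size > 0 && 16 % elem_size == 0);
    for (size_t base = 0; base < 16; base += elem_size) {
        for (size_t i = 0; i < elem_size / 2; i++) {
            uint8_t tmp = v[base + i];
            v[base + i] = v[base + elem_size - 1 - i];
            v[base + elem_size - 1 - i] = tmp;
        }
    }
}

/* Reverse the whole vector: the same as treating it as one
 * 16-byte element. */
static void bswap_whole(uint8_t v[16])
{
    bswap_elements(v, 16);
}
```

For input bytes 0..15 and 4-byte elements, the per-element swap yields 3 2 1 0 7 6 5 4 and so on, keeping element order, while the whole-vector swap yields 15 14 ... 0, which also reverses the order of the elements.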

Ping?

-- 
Kirill
