From: Peter Crosthwaite
Subject: Re: [Qemu-devel] [RFC PATCH 00/34] Multi Architecture System Emulation
Date: Mon, 11 May 2015 01:21:01 -0700

On Mon, May 11, 2015 at 12:13 AM, Peter Maydell
<address@hidden> wrote:
> On 11 May 2015 at 07:29, Peter Crosthwaite <address@hidden> wrote:
>> This is target-multi, a system-mode build that can support multiple
>> cpu-types. Patches 1-3 are the main infrastructure. The hard part
>> is the per-target changes needed to get each arch into an includable
>> state.
>
> Interesting. This is something I'd thought we were still some way
> from being able to do :-)
>
>> The hardest part is what to do about bootloading. Currently each arch
>> has its own architecture-specific bootloading, which may assume a
>> single architecture. I have applied some hacks to at least get this
>> RFC testable using a -kernel/-firmware split, but going forward, being
>> able to associate an elf/image with a cpu explicitly needs to be
>> solved.
>
> My first thought would be to leave the -kernel/-firmware stuff as
> legacy (or at least with semantics defined by the board model in use)
> and have per-CPU QOM properties for setting up images for genuinely
> multi-CPU configs.
>

OK
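
A rough sketch of what such a per-CPU property could look like (the
"firmware-image" name and the firmware_image field are invented here for
illustration; they are not an existing QEMU interface):

    #include "hw/qdev-properties.h"

    /* Hypothetical per-CPU string property so a board (or the user) can
     * attach an image to one specific CPU instead of relying on the
     * global -kernel/-firmware options. */
    static Property cpu_image_properties[] = {
        DEFINE_PROP_STRING("firmware-image", ARMCPU, firmware_image),
        DEFINE_PROP_END_OF_LIST(),
    };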

>> For the implementation of this series, the trickiest part is cpu.h
>> inclusion management. There is now more than one cpu.h, and different
>> parts of the tree need a different include scheme. target-multi defines
>> its own cpu.h, which is the bare minimum defs needed by core code only.
>> The target-foo/cpu.h's are mostly the same but refactored to reuse common
>> code (with target-multi/cpu-head.h). The inclusion scheme goes something
>> like this (for the multi-arch build):
>>
>> 1: All obj-y modules include target-multi/cpu.h
>> 2: Core code includes no other cpu.h's
>> 3: target-foo/ implementation code includes target-foo/cpu.h
>> 4: System level code (e.g. mach models) can use multiple target-foo/cpu.h's
>>
>> Point 4 means that the cpu.h's need to be refactored to be able to include
>> one after the other. The interrupts for ARM and MB needed to be renamed to
>> avoid namespace collision. A few other defs needed multiple include guards,
>> and a few defs which were only for user mode are compiled out or relocated.
>> No attempt at support for multi-arch linux-user mode (if that even makes
>> sense?).
>
> I don't think it does make much sense -- our linux-user code hardwires
> a lot of ABI details like size of 'long' and struct layouts. In any
> case we should probably leave it for later.
>
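
To illustrate the include layering in the scheme quoted above, the files end
up shaped roughly like this (a sketch only; exact contents differ in the
actual patches):

    /* target-multi/cpu.h: arch-agnostic defs only, included by all obj-y
     * core code (points 1 and 2 above). */
    #include "target-multi/cpu-head.h"

    /* target-foo/cpu.h: pulls in the same shared header, then adds the
     * arch-specific state on top (point 3). */
    #include "target-multi/cpu-head.h"
    /* ... CPUARMState / CPUMBState definitions follow ... */

    /* Board/machine code for a mixed system can include several of these
     * once they no longer collide (point 4). */
    #include "target-arm/cpu.h"
    #include "target-microblaze/cpu.h"
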
>> The env as handled by common code now needs to be architecture-agnostic.
>> The MB and ARM envs are refactored to have CPU_COMMON as the first field(s),
>> allowing QOM-style pointer casts to/from a generic env which contains only
>> CPU_COMMON. Might need to lock down some struct packing for that, but it
>> works for me so far.
>
> Have you managed to retain the "generated code passes around a pointer
> to an env which starts with the CPU specific fields"? We give the env
> structs the layout we do because it's a performance hit if the registers
> aren't a short distance away from the pointer...
>

OK, I knew there had to be a reason. So I guess the simplest
alternative is to pad the env out so the arch-specific env sections are
the same length, followed by a CPU_COMMON. A bit of union { struct {} }
stuff might just do the trick, although there will be some earthworks
on cpu.h.
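
A sketch of the shape I have in mind (everything except CPU_COMMON is a
placeholder name, just to show the layout):

    /* Pad every arch-specific env section to one size so that CPU_COMMON
     * lands at a fixed offset, while the per-arch registers stay at the
     * front, close to the env pointer, for fast access from generated
     * code. MAX_ARCH_ENV_SIZE and the *SpecificState structs are
     * placeholders for illustration. */
    typedef struct CPUMultiState {
        union {
            CPUARMSpecificState arm;        /* arch-private fields first    */
            CPUMBSpecificState mb;
            uint8_t pad[MAX_ARCH_ENV_SIZE]; /* force a uniform section size */
        } u;
        CPU_COMMON                          /* common fields, fixed offset  */
    } CPUMultiState;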

>> The helper function namespace is going to be tricky. I haven't tackled the
>> problem just yet, but I'm looking for ideas on how we can avoid link-time
>> collisions (multiple arches use the same helper names) without prefixing
>> every helper with an arch prefix.
>>
>> A lowest-common-denominator approach is taken on architecture specifics.
>> E.g. TARGET_LONG is 64-bit, and the address space sizes and NUM_MMU_MODES
>> are set to the maximum of all the supported arches.
>
> ...speaking of performance hits.
>
> I'm not sure you can do lowest-common-denominator for TARGET_PAGE_SIZE,
> incidentally. At minimum it will result in a perf hit for the CPUs with
> larger pages (because we end up taking the hugepage support paths in the
> cputlb.c code), and at worst TLB flushing in the target's helper routines
> might not take out the right pages. (I think ARM has some theoretical
> bugs here which we don't hit in practice; ARM already has to cope with
> a TARGET_PAGE_SIZE smaller than its usual pagesize, though.)
>

So I have gone for TARGET_PAGE_BITS = 12 (4 KiB pages) as the only
initially supported config. This will go a long way while we figure out
mixing page sizes at the core level. I chose to ignore the ARM 1k page
size thing, as the code comment suggests it's a legacy thing anyway.
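
In target-multi terms the lowest-common-denominator settings amount to
something like the following (the NUM_MMU_MODES value is illustrative; it
would be whatever the largest included arch needs):

    /* Lowest-common-denominator target defines for the multi-arch build. */
    #define TARGET_LONG_BITS  64   /* widest word size among included arches */
    #define TARGET_PAGE_BITS  12   /* 4 KiB pages only, for now              */
    #define NUM_MMU_MODES     7    /* max over the included arches           */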

>> The remaining globally defined interfaces between core code and CPUs are
>> QOMified per-cpu (P2)
>>
>> Microblaze translation needs a pattern of changes to allow conversion to
>> 64-bit TARGET_LONG. Uses of TCGv need to be removed and made explicitly
>> 32-bit.
>
> Yeah, this will be a tedious job for the other targets (I had to do it
> for ARM when I added the AArch64 support).
>

It's very scriptable. I had it to a point where I could use vim s//cg
mode to turn it into an interactive conversion.
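
For anyone repeating this on another target, the substitutions are mostly of
this shape (microblaze's cpu_R array shown as an example; register and field
names vary per target):

    /* Before: TCGv tracks TARGET_LONG, which becomes 64-bit in the
     * multi-arch build. */
    static TCGv cpu_R[32];
    tcg_gen_add_tl(cpu_R[dc->rd], cpu_R[dc->ra], cpu_R[dc->rb]);

    /* After: explicitly 32-bit, independent of TARGET_LONG. */
    static TCGv_i32 cpu_R[32];
    tcg_gen_add_i32(cpu_R[dc->rd], cpu_R[dc->ra], cpu_R[dc->rb]);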

>> This RFC will serve as a reference as I send bits and pieces to the
>> respective maintainers (many major subsystems are patched).
>>
>> No support for KVM; I'm not sure if a mix of TCG and KVM is supported even
>> for a single arch (which would be a prerequisite for multi-arch KVM).
>
> You can build a single binary which supports both TCG and KVM for a
> particular architecture. You just can't swap back and forth between
> TCG and KVM at runtime. We should probably start by supporting KVM
> only on boards with a single CPU architecture. I don't think it's
> in-principle impossible to get a setup with 4 KVM CPUs and one
> TCG-emulated CPU to work, but it probably needs to wait until we've
> got multi-threaded TCG working before we even think about it.
>

OK.

>> Depends (not heavily) on my on-list disas QOMification. Test instructions
>> available on request. I have tested ARM & MB ELFs handshaking through shared
>> memory and both printfing to the same UART (verifying system-level
>> connectivity). -d in_asm works with the mix of disas arches coming out.
>
> Did you do any benchmarking to see whether the performance hits are
> noticeable in practice?
>

No, do you have any recommendations?

> Do you give each CPU its own codegen buffer? (I'm thinking that some
> of this might also be more easily done once multithreaded TCG is
> complete, since that will properly split the data structures.)
>

No, the approach taken here is that everything is exactly the same as
existing SMP. My logic is that we already have the core support, in that
AArch64 SMP lets us mix-and-match arches at runtime. E.g. there's nothing
stopping the bootloader from putting one core in AArch32 and the other in
AArch64, leading to basically multi-arch. I just extend that across
target-foo boundaries with some code rearrangement.

Regards,
Peter

> thanks
> -- PMM
>


