Re: [Qemu-devel] Enabling DLPAR capacity on QEMU pSeries


From: Erlon Cruz
Subject: Re: [Qemu-devel] Enabling DLPAR capacity on QEMU pSeries
Date: Fri, 5 Oct 2012 11:08:40 -0300

Hi all,

Resuming the discussion from some days ago: after talking it over here and considering the suggestions in this thread, we have three main ideas for DLPAR on QEMU/KVM.

1 - introduce a new device/driver that will be used to communicate changes in the DT to the guest
   + makes things cleaner, as no action has to be taken in the guest to have anything added
   + no userspace daemon needed
   - we would have to add a new driver. Not sure whether this is a bad thing

2 - create a kernel task to listen for host changes
   + makes things cleaner, as no action has to be taken in the guest to have anything added
   + no userspace daemon needed
   - that might require some changes to the kernel core

3 - use a model as close as possible to the one provided by PHYP: the idea here is to be able to use the same proprietary daemons used in the guest. As this daemon calls an open tool (drmgr) to trigger DR events in the kernel, it will also be possible to use qemu-ga as an alternative to the daemon.
   + this makes things easier, as we don't have to make any changes to the guest kernel
   + supported PowerVM tools will be able to work with this model
   - this model (DynamicRM) will probably be deprecated in the near future

So far we are going with the third option. If you have any concerns about that, let us know.

Kind Regards,
Erlon


On Thu, Sep 13, 2012 at 6:45 PM, Benjamin Herrenschmidt <address@hidden> wrote:
On Thu, 2012-09-13 at 12:15 -0300, Erlon Cruz wrote:

> >> > lack of experience in the internals of the arch we would like you guys
> >> > to give us some design directions
> >> > and confirm if we are going in the right direction. Our first idea is:
> >> >
> >> >      1 - to patch 'spapr.c' so it can dynamically insert/remove basic
> >> > items into the device tree.
> >>
> >> What exactly would you like to patch into it? We already do have support
> >> for dynamic dt creation with the spapr target.
> >
> > No we don't. We don't have the necessary bits and pieces to pass the DT
> > updates down to the guest. PAPR defines a mechanism using RTAS calls
> > which we need to implement, but there are some issues remaining:
>
> Do we need any patching on SLOF to make this possible?

For adding the RTAS calls, no. Under qemu, RTAS is currently entirely
provided by qemu itself (the RTAS blob that is carried around through
SLOF etc... is just a little 5-instruction wrapper that calls a hidden
hcall).

> >  - We don't have a way to "initiate" a DLPAR operation. This is
> > currently done by proprietary tools that communicate with the HMC. We
> > want to invent some kind of hotplug "interrupt" (using existing RTAS
> > event facilities). All it needs to do is indicate the DT path (ie.
> > connector) where something is to be plugged to or unplugged, which can
> > then trigger the relevant configure-connector calls to retrieve the DT
> > bits.
>
> I think that a device/guest driver will work for this purpose. It will
> get interrupted when something in the DT changes and trigger the kernel
> routines that in the current implementation are called by those
> proprietary tools.

Right. My idea was to remain generally consistent with PAPR and use some
kind of existing RTAS event interrupt facility and extend it. I should
try to poke some of the IBM folks in charge of PAPR to see if they are
interested in actually architecting such a mechanism.
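
The trigger-then-fetch flow described above can be sketched as a toy model. This is only an illustration under stated assumptions, not how qemu or the kernel implement it: the step names are loosely modeled on the traversal results of the configure-connector call, and the connector path, node names, property values, and both function names are hypothetical.

```python
from enum import Enum, auto


class Step(Enum):
    """Traversal results, loosely modeled on configure-connector (assumed)."""
    NEXT_CHILD = auto()
    NEXT_SIBLING = auto()
    NEXT_PROPERTY = auto()
    PREV_PARENT = auto()
    COMPLETE = auto()


def host_configure_connector(connector):
    """Hypothetical host side: hand out one (step, name, value) per call."""
    # Made-up subtree for a virtual SCSI adapter plugged under the connector.
    yield Step.NEXT_CHILD, "v-scsi@2000", None
    yield Step.NEXT_PROPERTY, "compatible", "IBM,v-scsi"
    yield Step.NEXT_PROPERTY, "reg", 0x2000
    yield Step.NEXT_CHILD, "disk", None
    yield Step.NEXT_PROPERTY, "device_type", "block"
    yield Step.NEXT_SIBLING, "tape", None
    yield Step.PREV_PARENT, None, None
    yield Step.COMPLETE, None, None


def new_node(name):
    return {"name": name, "props": {}, "children": []}


def guest_handle_hotplug_event(connector):
    """Guest side: the event names a connector, then the subtree is pulled."""
    root = new_node(connector)
    stack = [root]  # stack[-1] is the node currently being filled in
    for step, name, value in host_configure_connector(connector):
        if step is Step.COMPLETE:
            break
        if step is Step.NEXT_CHILD:
            node = new_node(name)
            stack[-1]["children"].append(node)
            stack.append(node)
        elif step is Step.NEXT_SIBLING:
            node = new_node(name)
            stack.pop()  # leave the current node, attach next to it
            stack[-1]["children"].append(node)
            stack.append(node)
        elif step is Step.NEXT_PROPERTY:
            stack[-1]["props"][name] = value
        elif step is Step.PREV_PARENT:
            stack.pop()
    return root
```

Calling `guest_handle_hotplug_event("/vdevice")` returns the reconstructed subtree as nested dictionaries. The point of the sketch is that the event itself only needs to carry the connector path; everything else is fetched by the guest step by step.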

> >  - We have a problem with PCI. Currently, the content of the PCI
> > bus(ses) is discovered by SLOF running inside the guest. Not by qemu.
> > It's SLOF that assigns the BARs and creates the device-tree nodes for the
> > various PCI devices. However, with hotplug, the guest expects to get
> > fully populated DT nodes for hotplugged PCI devices and fully assigned
> > BARs. Under pHyp that works because under the hood, RTAS contains an OFW
> > implementation which does all the assignment before passing the stuff to
> > the OS, but under qemu, RTAS is actually in qemu. This means we'll
> > probably have to move back the PCI device node creation and resource
> > assignment to qemu (like it was in the very early versions of the spapr
> > support).
>
> AFAIK in the first versions of spapr there was no PCI support, right?

Well, we added PCI pretty quickly.

> So I'm guessing that you are referring to the first implementation of PCI.
> Would we have to remove the PCI discovery functions from SLOF? I
> have no idea how to write code for SLOF.

We would at least have to change them, as we still want SLOF to do the
driver matching part. I can help with SLOF and we have some
folks assigned to it as well in IBM, so don't worry too much about that
part. I think the first step is to get a proof of concept using PAPR VIO,
which doesn't have that problem. Then we can look at the PCI
issues.
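
For the resource-assignment part of the PCI issue, here is a minimal sketch of what assigning BARs on the host side involves, assuming a single memory window and the usual PCI rule that each BAR has a power-of-two size and is aligned to that size. The function names, BAR labels, and window values are hypothetical; this is neither SLOF's nor qemu's allocator.

```python
def align_up(addr, alignment):
    """Round addr up to the next multiple of alignment (a power of two)."""
    return (addr + alignment - 1) & ~(alignment - 1)


def assign_bars(window_base, window_size, bar_sizes):
    """Assign a naturally aligned address to each BAR inside one window.

    bar_sizes maps a hypothetical BAR label to its size in bytes.
    """
    for label, size in bar_sizes.items():
        if size <= 0 or size & (size - 1):
            raise ValueError(f"{label}: size {size:#x} is not a power of two")

    assigned = {}
    cursor = window_base
    # Place the largest BARs first to limit padding lost to alignment.
    for label, size in sorted(bar_sizes.items(), key=lambda kv: -kv[1]):
        addr = align_up(cursor, size)
        if addr + size > window_base + window_size:
            raise RuntimeError(f"{label}: memory window exhausted")
        assigned[label] = addr
        cursor = addr + size
    return assigned
```

For example, `assign_bars(0x80000000, 0x100000, {"dev0.bar0": 0x1000, "dev0.bar1": 0x10000})` places the 64 KiB BAR at the window base and the 4 KiB BAR right after it. With hotplug, the guest would expect addresses like these to be already present in the device-tree node it receives.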

Cheers,
Ben.

> >> >      2 - create a host side device that will be used with a guest side
> >> > driver to perform guest side operations and communicate changes from
> >> > host to the guest (like DynamicRM does in PowerVM LPARs). We are not
> >>
> >> Why not just use hypercalls?
> >
> > Actually there are existing RTAS calls to use for the actual passing of
> > the device-tree bits, the problem is purely how to "initiate" an
> > operation to trigger the guest code that will then perform the
> > appropriate calls.
> > qemu-ga is an option. But I was thinking more along the lines of adding
> > some new RTAS events, maybe EPOW style, a bit like ACPI does.
> >
> >> > planning to use powerpc-tools and want to make resource management
> >> > transparent (i.e. no need to run daemons or userspace programs in the
> >> > guest, only this kernel driver).
> >> >      3 - create bindings to support adding/removing ibmvscsi devices
> >> >      4 - create bindings to support adding/removing ibmveth devices
> >> >      5 - create bindings to support adding/removing PCI devices
> >> >      6 - create bindings to support adding/removing memory
> >
> > There are already large parts of the necessary bits using RTAS in the
> > kernel (in recent kernels that is, older stuff really needed it all done
> > in userspace). Mostly the trigger is missing.
> >> This is going to be the hardest part. I don't think QEMU supports memory
> >> hotplug yet.
> >
> > Missing from the above list is also CPU hotplug.
> >
> >> >          - Do we need to do this the way PowerVM does? We have tested
> >> > virtio ballooning and it works with a few endianness corrections.
> >>
> >> I don't know how PowerVM works. But if normal ballooning is all you
> >> need, you should certainly just enable virtio-balloon.
> >
> > Does virtio-balloon need endian fixes? We thought it was just working!
> > Feel free to submit patches :)
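
The thread does not say which endianness corrections were needed. As a purely illustrative sketch of the class of problem, assume the balloon config space holds two 32-bit little-endian fields (num_pages, actual) and that a big-endian guest reads them in native byte order; the helper names below are hypothetical.

```python
import struct

# Assumed layout: two 32-bit little-endian fields (num_pages, actual).
BALLOON_CONFIG_LE = struct.Struct("<II")
# What a big-endian guest effectively uses if it reads the fields natively.
NAIVE_BIG_ENDIAN = struct.Struct(">II")


def read_balloon_config(raw):
    """Decode the config bytes explicitly as little-endian."""
    num_pages, actual = BALLOON_CONFIG_LE.unpack(raw)
    return {"num_pages": num_pages, "actual": actual}


def read_balloon_config_naive_be(raw):
    """Decode the same bytes without swapping, as a big-endian CPU would."""
    num_pages, actual = NAIVE_BIG_ENDIAN.unpack(raw)
    return {"num_pages": num_pages, "actual": actual}


# The host asks for 256 pages; the bytes on the wire are 00 01 00 00.
raw = BALLOON_CONFIG_LE.pack(256, 0)
```

With these bytes, the explicit little-endian read yields 256 pages, while the unswapped big-endian read yields 65536, which is the kind of mismatch an endianness fix in the guest driver would remove.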
> >
> >> >      7 - create bindings to support adding/removing CPUs
> >> >          - is SMP supported already? I tried to run SMP on an x86 host
> >> > and the guest got stuck when SMP was enabled
> >>
> >> SMP should work just fine, yes. Where exactly does it get stuck?
> >
> > Right, it works fine as far as I can tell.
> >
> >> >          - would it be possible to work on this without a P7 baremetal
> >> > machine?
> >>
> >> At least for device hotplug, it should be perfectly possible to use an
> >> old G5 with PR KVM. I haven't gotten around to patching all the pieces of
> >> the puzzle to make -M pseries work with PR KVM when it's running on top
> >> of pHyp yet, so that won't work.
> >>
> >> > We have a P7 8205-E6B, is it possible to kick PHYP out?
> >>
> >> Ben?
> >
> > Probably not. You need a 7R2.
> >
> >> > Any idea of how much effort (time/people) the whole thing would take?
> >> > Any consideration about this is much appreciated :)
> >>
> >> Phew. It's hard to tell. Depends heavily on how good your people are :).
> >>
> >
> > Cheers,
> > Ben.
> >
> >
> >
>
> Cheers,
> Erlon




