


From: Eduardo Habkost
Subject: Re: [Qemu-devel] [PATCH] fix guest physical bits to match host, to go beyond 1TB guests
Date: Wed, 17 Jul 2013 10:39:50 -0300
User-agent: Mutt/1.5.21 (2010-09-15)

On Wed, Jul 17, 2013 at 10:09:01AM +0200, Paolo Bonzini wrote:
> Il 16/07/2013 21:42, Eduardo Habkost ha scritto:
> > On Tue, Jul 16, 2013 at 09:24:30PM +0200, Paolo Bonzini wrote:
> >> Il 16/07/2013 20:11, Eduardo Habkost ha scritto:
> >>> For physical bit size, what about extending it in a backwards-compatible
> >>> way? Something like this:
> >>>
> >>>     *eax = 0x00003000; /* 48 bits virtual (bits 15:8) */
> >>>     if (ram_size < 1TB) {
> >>>         physical_size = 40; /* Keeping backwards compatibility */
> >>>     } else if (ram_size < 4TB) {
> >>>         physical_size = 42;
> >>
> >> Why not go straight up to 44?
> > 
> > I simply trusted the comment saying: "The physical address space is
> > limited to 42 bits in exec.c", and assumed we had a 42-bit limit
> > somewhere else.
> 
> Yeah, that's obsolete.  We now can go up to 64 (but actually only
> support 52 because that's what Intel says will be the limit -- 4PB RAM
> should be good for everyone, as Bill Gates used to say).
> 
> So far Intel has been upgrading the physical address size in x16 steps
> (MAXPHYADDR was 36, then 40, then 44).  MAXPHYADDR is Intel's name for
> what you wrote as physical_size.

Then, if ram_size is too large, we could round the resulting MAXPHYADDR
up to the next multiple of 4?

> 
> >     if (ram_size < 1TB) {
> >         physical_size = 40; /* Keeping backwards compatibility */
> >     } else {
> >         physical_size = msb(ram_size);
> >     }
> >     if (supported_host_physical_size() < physical_size) {
> >         abort();
> >     }
> 
> Not enough because there are processors with 36.  So perhaps, putting
> together both of your ideas:
> 
>      if (supported_host_physical_size() < msb(ram_size)) {
>          abort();
>      }

What if the host max physical size is smaller than the MAXPHYADDR we're
setting for the VM (but still larger than msb(ram_size))? Will the CPU
or KVM complain, or will it work?

In other words, do we need a check for
  (supported_host_physical_size() < physical_size)
below, as well?

>      if (ram_size < 64GB && !some_compat_prop) {
>          physical_size = 36;
>      } else if (ram_size < 1TB) {
>          physical_size = 40;
>      } else {
>          physical_size = 44;
>      }
> 
> What do you think?

Why stop at 44? What if ram_size is larger than (1 << 44)?

We must never start a VM with physical_size < msb(ram_size), right?

But I am not sure whether we should simply increase physical_size
automatically, or abort in the cases where physical_size ends up being
too small. (I believe we should simply increase it automatically.)

> 
> >> This makes sense too.  Though the best would be of course to use CPUID
> >> values coming from the real processors, and only using 40 for backwards
> >> compatibility.
> > 
> > We can't use the values coming from the real processors directly, or
> > we will break live migration.
> 
> I said real processors, not host processors. :)

So, you mean setting per-cpu-model values?

> 
> So a Core 2 has MAXPHYADDR=36, Nehalem has IIRC 40, Sandy Bridge has 44,
> and so on.

That would work as well (and on pc-1.5 and older we could keep
physical_size=40 or use the above algorithm). But I wonder whether it
would be usable. If somebody is using "-cpu Nehalem -m 8T", I believe
it would make sense to increase the physical address size automatically
instead of aborting. What do you think?

> 
> > If we sent those CPUID bits as part of the migration stream, it would
> > make it a little safer, but then it would be impossible for libvirt to
> > tell if it is really possible to migrate from one host to another.
> 
> The libvirt problem still remains, since libvirt currently doesn't know
> the MAXPHYADDRs and would have to learn them.
> 
> I guess the above "artificial" computation of MAXPHYADDR is enough.

If the VM is already broken and has MAXPHYADDR < msb(ram_size), I don't
worry about ABI stability, because those VMs are not even supposed to be
running properly. If somebody has such a broken VM running, it seems
better to let the MAXPHYADDR bits suddenly change to a reasonable value
(so that MAXPHYADDR >= msb(ram_size)) during live migration than to
abort migration or keep the broken value.

So if we use an algorithm that always increases MAXPHYADDR
automatically (breaking the ABI only in cases where the current value
is already broken), we will not abort migration unless the host really
can't run our guest (which seems a less serious problem than aborting
because the VM configuration is broken and needs manual adjustment).

-- 
Eduardo


